I'm trying to figure out what's going wrong with https://github.com/mit-plv/fiat-crypto/pull/1350. src/PushButtonSynthesis/WordByWordMontgomeryReificationCache.v takes about twice as long to compile than it did before. Almost all the time is spent in Ltac that constructs constrs, so the Ltac profiler is not much help. I'm not really sure how to make sense of the data from
perf record. (The change I made was to use
transparent abstract to deduplicate reification work, so I'm guessing that either it's spending more time checking the cache (overhead would be underneath
transparent_abstract), or looking up things in the cache (overhead would be underneath constr equality checking /
match term with term), or maybe it's spending more time typechecking (overhead in
constr:(...)) or creating / unifying / resolving evars. I don't know how to read this off the perf data, though...)
The slower version looks like
while the faster version looks like
(cc @Pierre-Marie Pédrot )
I think the second screenshot is not a
perf report, is it? Is that a mistake?
Oops, fixed, thanks @Janno
It's hard to say what is going on in that perf profile with all these unknown functions. I am not sure what the reason for this is. The profile should have useful names for all functions in it.
Maybe this version of Coq or OCaml was somehow compiled without complete debug info?
objdump --syms $(which coqc) lists a bunch of things, but there are no symbols for ocamlc / ocamlopt
(This is on Arch)
This looks like bogus stack traces. Did you forget to pass
Oh, I wasn't aware that I need to pass
--call-graph=dwarf. I was just doing
perf record -g
--call-graph=dwarf does not help
@Jason Gross you need to use a version of the OCaml compiler with the fp flag for the callstack to be meaningful.
I personally use ocaml-variants.4.09.1+fp on a daily basis.
Last updated: Dec 01 2023 at 07:01 UTC