I'm trying to figure out what's going wrong with https://github.com/mit-plv/fiat-crypto/pull/1350. src/PushButtonSynthesis/WordByWordMontgomeryReificationCache.v takes about twice as long to compile than it did before. Almost all the time is spent in Ltac that constructs constrs, so the Ltac profiler is not much help. I'm not really sure how to make sense of the data from perf record
. (The change I made was to use transparent abstract
to deduplicate reification work, so I'm guessing that either it's spending more time checking the cache (overhead would be underneath abstract
/transparent_abstract
), or looking up things in the cache (overhead would be underneath constr equality checking / match term with term
), or maybe it's spending more time typechecking (overhead in constr:(...)
) or creating / unifying / resolving evars. I don't know how to read this off the perf data, though...)
The slower version looks like
image.png
while the faster version looks like
image.png
(cc @Pierre-Marie Pédrot )
I think the second screenshot is not a perf report
, is it? Is that a mistake?
Oops, fixed, thanks @Janno
It's hard to say what is going on in that perf profile with all these unknown functions. I am not sure what the reason for this is. The profile should have useful names for all functions in it.
Maybe this version of Coq or OCaml was somehow compiled without complete debug info?
objdump --syms $(which coqc)
lists a bunch of things, but there are no symbols for ocamlc / ocamlopt
(This is on Arch)
This looks like bogus stack traces. Did you forget to pass --call-graph=dwarf
to perf record
?
Oh, I wasn't aware that I need to pass --call-graph=dwarf
. I was just doing perf record -g
--call-graph=dwarf
does not help
image.png
@Jason Gross you need to use a version of the OCaml compiler with the fp flag for the callstack to be meaningful.
I personally use ocaml-variants.4.09.1+fp on a daily basis.
Last updated: Dec 01 2023 at 07:01 UTC