@Rodolphe Lepigre and I have been debugging the performance regression in https://github.com/coq/coq/pull/17839 and we are pretty much out of ideas at this point. There are some obvious candidates that could responsible:
knht
needs to find applications of the new reduction primitives before their arguments are pushed to the stack. This is done in knht_app
. We already optimized this a lot by reducing the lookup to a hashconsed comparison on Constant.t
. That cut down the initial slowdown by half. The remaining slowdown in knht
/knht_app
does not seem to be due to the extra work being performed.fterm
and stack_member
constructors carrying a new mode
value (indicating how much reduction needs to be performed).to_constr
may end up calling the reduction machine on some subterms. One consequence is that to_constr
needs to carry around info
and tab
and that blows up memory usage from the lazy to_constr
calls in comp_subs
since the closures now have a bigger enviroment (by a factor of two). We could combine info
and tab
to slightly reduce the overhead if there's no better way. There's also an additional allocation for remembering mode
s that are needed when we call the reduction machine.@Pierre-Marie Pédrot, @Gaëtan Gilbert, do you have any more ideas? Would anyone be willing to take a closer look? We can generate various kinds of profiles and flamegraphs and memory allocation reports..
For profiling purposes we've added test-suite/success/bench_to_constr.v
and bench_reduction.v
which try to stress to_constr
and the reduction machine. Both show a measurable overhead.
maybe worth trying to do individual changes to test hypotheses
for instance
There is memory overhead from some fterm and stack_member constructors carrying a new mode value (indicating how much reduction needs to be performed).
you can try doing a commit on top of master with just a dummy unit
mode, or maybe bool
always constructed with true
and every place where you would check it you assert false
when it's false
(I haven't really looked at the diff for that PR yet btw)
mode
is already an integer so that by itself doesn't allocate but the fconstr
and stack_member
nodes will be a little wider
But we haven't tried adding things on top of master. That's a good idea
Last updated: Oct 13 2024 at 01:02 UTC