(Sorry for the click-bait topic title!) I have two seemingly identical proof terms, one generated through Ltac2 and one through Mtac2, that have very different Qed
performance behavior: the one from Mtac2 (let's call that term M
) exhibits quadratic scaling (w.r.t. to my work factor) whereas the term generated via Ltac2 (let's call that L
) scales linearly (or close to it). They print identically. How do I figure out what the actual difference is?
Here are some things that might help people point me in the right direction:
Optimize Proof
reports a linear number of evars for M
but only a constant number of evars for L
. However, calling Optimize Proof
does not make Qed
any faster for M
. Not sure what that indicates. (I have very similar examples where Qed
of Mtac2 proof terms is exactly as fast as for Ltac2 despite the same difference in evars.)Qed
look very similar in percentages except that Qed
on M
spends significant amount of time (13%) in sweep_slice
and correspondingly a bit less in every other function.IIRC Set Printing All does not print the type of variables bound by match branches. Each branch starts with as many lambdas as constructor arguments. I would try to print all match branches in isolation to rule out this.
Interesting. Do I have to write my own printing code to print those branches or is there a function for that somewhere?
What about the usual suspects, i.e. universes?
They might get printed the same but the underlying representation can be different if they haven't been normalized upfront
sweep_slice
is just the GC btw, the relevant data would be the callers of that function
Oh, I had forgotten about universes. Indeed, they use different universe assignments and (as usual) Mtac2 generates a gazillion universes, of which almost none end up in the proof term. In fact, Mtac2 re-uses a bunch of global universes in the proof term whereas Ltac2 generates fresh ones. Sadly, universe-related functions basically don't show up in the profile.
If you give here the callers of sweep_slice
we might get an idea of the culprit(s)
I am trying but OCaml is defeating me.. this is the best I can do: https://share.firefox.dev/30I6j1m
I don't think there is anything useful in that flame chart.. The stack samples are almost all broken because the stack is far too large for the maximum stack size of perf
:(
it's not loading :/
Yeah, it's rather big.. In any case, it seems the firefox profiler discards broken samples so this won't help much
Let's see what I can do locally
Oh, this explains a lot: sweep_slice
only shows up in my initial profile but it doesn't show up any more :(
Yeah, depending on how I backtrack through the file either the Ltac2 or the Mtac2 version ends up with a sizeable GC item in perf
(mark_slice
or sweep_slice
). Is it worth investigating this?
I am still trying to learn anything from these (broken) perf profiles. One thing that seems interesting is that the Mtac2 version of the proof term spends a lot of time in Typeops.execute_is_type
and Typeops.check_cast
whereas the Ltac2 version doesn't seem to be calling either function enough for it to show up in the perf samples. Does this point to anything?
Neither version of the term has any casts as far as I can tell from printing it with Set Printing All
. (They really look 100% identical when printed except for universes.)
check_cast
is also used for let-ins and primitive types. Do you have let-ins holding a term in Type? (I just discovered the type of that term is not always printed)
I mean the term contructor for let holds a type. if its sort is Type, then Coq pretty printer may hide it.
I think I there is not a single let binding in that proof term
Or, rather, in its type
grep for yourself https://github.com/coq/coq/blob/master/kernel/typeops.ml check_cast
;-)
Yeah it's only casts and let-ins :(
I could see how casts could sneak in with one tactic language versus the other.. but let bindings? in a term that prints identically for both tactic languages?
Ugh.. as usual, the problem is me. There was a difference between the proof terms (one innocent little '
) and I just didn't see it yesterday. I'll report back if the difference persists after I fix this. Thanks for your help!
Okay, so the proof terms were different but fixing the difference doesn't change anything. In fact even if I just skip Ltac2 altogether and simply use refine M
(M
being the proof term generated by Mtac2) in a fresh Goal
I observe fast Qed
performance (as with Ltac2). But in the goal where I take M
from (via Show Proof
) Qed
is slow.
Even if I execute the Mtac2 tactic on a freshly generated evar (i.e. not on the current goal) to litter the proof state with universes and then solve the goal with refine M
I get fast Qed
..
does Show Universes
say there are a lot of universes just before Qed?
Yes, both when calling the Mtac2 tactic without using the result and when calling it on the actual goal.
@Enrico Tassi was maybe right then, it could well be some hidden stuff in the branches. Do you use inductive types with let-bindings in them?
Pattern-matching branches are highly redundant so if you're unlucky enough to reduce the type of their arguments you're going to have a bad time.
(see e.g. https://github.com/coq/coq/pull/9710)
No let bindings as far as I can tell; neither in the proof term nor its type nor in the (inductive) types it uses.
Not even in the return type of a pattern-matching?
I've had a similar problem once and it's infuriating, because there is literally no way to see the difference from user land
you have to dump the term in OCaml
Oh, there actually is a let binding hidden in a fun '{|..|}
.
should I write that out manually as a match to make sure the match is annotated in the simplest way possible?
You may definitely try.
No difference.. I suppose OCaml it is then.
(No difference as in: there is still a difference.)
just load the debugger and set a breakpoint in Typeops
I don't think I have a coqtop.byte
available in this switch :(
(deleted)
Okay, I used Termops.Internal.debug_print_constr_env
to get a slightly more detailed view at the proof term and it really is 100% identical to what I feed to refine
. I think that means the slowdown has to come from something that happens during Mtac2 tactic execution but only when the result of the tactic is actually used to solve the goal.
Although that doesn't explain difference in the perf profiles (if those still exist; I need to make new profiles).
That's the printer that outputs the horrendous stuff, right?
Yes :)
Like, bound variables are written #0 and the like?
(This printer can be overloaded IIRC, so you better be sure it's giving you full debug info)
I don't know about bound variables but it outputs Constr(Coq.Init.Datatypes.nat,0,2)
instead of S
.
Yes, looks like it.
Ah right I see the variables now. Indeed it uses #
.
Oh, actually, maybe I do see something in the debug output. I thought it had printed the type of the term but maybe there is actually a cast in there. casts don't have nodes like Constr
, do they?
Okay, the code tells me that casts are printed as (_ : _)
. It seems there really is a cast hiding in my proof term.
That's good news.. or, rather, it is at least consistent with the perf
profile
Yes, this debug printer is a bit too unstructured
It'd be nice to belch sexprs for diffing
I think I was just too careless.. "big thing at the end of the term that follows a :
? must be the type of the whole thing!"
I suppose I am not very good a counting parentheses :D
The casts are gone and the proof term is now checked as fast as it should be! :) Thanks so much for everyone's help!
Last updated: Sep 09 2024 at 05:02 UTC