(Sorry for the click-bait topic title!) I have two seemingly identical proof terms, one generated through Ltac2 and one through Mtac2, that have very different
Qed performance behavior: the one from Mtac2 (let's call that term
M) exhibits quadratic scaling (w.r.t. to my work factor) whereas the term generated via Ltac2 (let's call that
L) scales linearly (or close to it). They print identically. How do I figure out what the actual difference is?
Here are some things that might help people point me in the right direction:
Optimize Proofreports a linear number of evars for
Mbut only a constant number of evars for
L. However, calling
Optimize Proofdoes not make
Qedany faster for
M. Not sure what that indicates. (I have very similar examples where
Qedof Mtac2 proof terms is exactly as fast as for Ltac2 despite the same difference in evars.)
Qedlook very similar in percentages except that
Mspends significant amount of time (13%) in
sweep_sliceand correspondingly a bit less in every other function.
IIRC Set Printing All does not print the type of variables bound by match branches. Each branch starts with as many lambdas as constructor arguments. I would try to print all match branches in isolation to rule out this.
Interesting. Do I have to write my own printing code to print those branches or is there a function for that somewhere?
What about the usual suspects, i.e. universes?
They might get printed the same but the underlying representation can be different if they haven't been normalized upfront
sweep_slice is just the GC btw, the relevant data would be the callers of that function
Oh, I had forgotten about universes. Indeed, they use different universe assignments and (as usual) Mtac2 generates a gazillion universes, of which almost none end up in the proof term. In fact, Mtac2 re-uses a bunch of global universes in the proof term whereas Ltac2 generates fresh ones. Sadly, universe-related functions basically don't show up in the profile.
If you give here the callers of
sweep_slice we might get an idea of the culprit(s)
I am trying but OCaml is defeating me.. this is the best I can do: https://share.firefox.dev/30I6j1m
I don't think there is anything useful in that flame chart.. The stack samples are almost all broken because the stack is far too large for the maximum stack size of
it's not loading :/
Yeah, it's rather big.. In any case, it seems the firefox profiler discards broken samples so this won't help much
Let's see what I can do locally
Oh, this explains a lot:
sweep_slice only shows up in my initial profile but it doesn't show up any more :(
Yeah, depending on how I backtrack through the file either the Ltac2 or the Mtac2 version ends up with a sizeable GC item in
sweep_slice). Is it worth investigating this?
I am still trying to learn anything from these (broken) perf profiles. One thing that seems interesting is that the Mtac2 version of the proof term spends a lot of time in
Typeops.check_cast whereas the Ltac2 version doesn't seem to be calling either function enough for it to show up in the perf samples. Does this point to anything?
Neither version of the term has any casts as far as I can tell from printing it with
Set Printing All. (They really look 100% identical when printed except for universes.)
check_cast is also used for let-ins and primitive types. Do you have let-ins holding a term in Type? (I just discovered the type of that term is not always printed)
I mean the term contructor for let holds a type. if its sort is Type, then Coq pretty printer may hide it.
I think I there is not a single let binding in that proof term
Or, rather, in its type
grep for yourself https://github.com/coq/coq/blob/master/kernel/typeops.ml
Yeah it's only casts and let-ins :(
I could see how casts could sneak in with one tactic language versus the other.. but let bindings? in a term that prints identically for both tactic languages?
Ugh.. as usual, the problem is me. There was a difference between the proof terms (one innocent little
') and I just didn't see it yesterday. I'll report back if the difference persists after I fix this. Thanks for your help!
Okay, so the proof terms were different but fixing the difference doesn't change anything. In fact even if I just skip Ltac2 altogether and simply use
refine M (
M being the proof term generated by Mtac2) in a fresh
Goal I observe fast
Qed performance (as with Ltac2). But in the goal where I take
M from (via
Qed is slow.
Even if I execute the Mtac2 tactic on a freshly generated evar (i.e. not on the current goal) to litter the proof state with universes and then solve the goal with
refine M I get fast
Show Universes say there are a lot of universes just before Qed?
Yes, both when calling the Mtac2 tactic without using the result and when calling it on the actual goal.
@Enrico Tassi was maybe right then, it could well be some hidden stuff in the branches. Do you use inductive types with let-bindings in them?
Pattern-matching branches are highly redundant so if you're unlucky enough to reduce the type of their arguments you're going to have a bad time.
(see e.g. https://github.com/coq/coq/pull/9710)
No let bindings as far as I can tell; neither in the proof term nor its type nor in the (inductive) types it uses.
Not even in the return type of a pattern-matching?
I've had a similar problem once and it's infuriating, because there is literally no way to see the difference from user land
you have to dump the term in OCaml
Oh, there actually is a let binding hidden in a
should I write that out manually as a match to make sure the match is annotated in the simplest way possible?
You may definitely try.
No difference.. I suppose OCaml it is then.
(No difference as in: there is still a difference.)
just load the debugger and set a breakpoint in Typeops
I don't think I have a
coqtop.byte available in this switch :(
Okay, I used
Termops.Internal.debug_print_constr_env to get a slightly more detailed view at the proof term and it really is 100% identical to what I feed to
refine. I think that means the slowdown has to come from something that happens during Mtac2 tactic execution but only when the result of the tactic is actually used to solve the goal.
Although that doesn't explain difference in the perf profiles (if those still exist; I need to make new profiles).
That's the printer that outputs the horrendous stuff, right?
Like, bound variables are written #0 and the like?
(This printer can be overloaded IIRC, so you better be sure it's giving you full debug info)
I don't know about bound variables but it outputs
Constr(Coq.Init.Datatypes.nat,0,2) instead of
Yes, looks like it.
Ah right I see the variables now. Indeed it uses
Oh, actually, maybe I do see something in the debug output. I thought it had printed the type of the term but maybe there is actually a cast in there. casts don't have nodes like
Constr, do they?
Okay, the code tells me that casts are printed as
(_ : _). It seems there really is a cast hiding in my proof term.
That's good news.. or, rather, it is at least consistent with the
Yes, this debug printer is a bit too unstructured
It'd be nice to belch sexprs for diffing
I think I was just too careless.. "big thing at the end of the term that follows a
:? must be the type of the whole thing!"
I suppose I am not very good a counting parentheses :D
The casts are gone and the proof term is now checked as fast as it should be! :) Thanks so much for everyone's help!
Last updated: Dec 07 2023 at 06:38 UTC