Finally, I also came across a case of slow QED I can't easily handle. I am verifying with VST a few C functions which essentially compute an approximation of a function over different ranges, with larger ranges requiring more complex implementations. The functions are all plain forward processing with calls to abstractions of machine intrinsics: no loops, no ifs, no complicated separation logic, no complicated dependent types. Everything which is not just pure C forward-processing logic I have already moved into separate lemmas.

My timings are:

```
simple
assignments = 20
calls = 8
QED = 9s
medium
assignments = 34
calls = 13
QED = 170s
complex
assignments = 48
calls = 19
QED > 11h
```

The time it takes to construct the proof term is fairly proportional to the C complexity. But for QED I get more than 20x the time for less than 2x the complexity - maybe this is 5th order or more (I need to wait for the third QED to finish ...). I have done much more complex stuff which was pretty instant in QED, so I am a bit at a loss to understand where this highly non-proportional increase in QED complexity comes from.

The resulting terms get a bit lengthy, and in the end I rewrite with a separate lemma which proves the equality between the computed term and the term from the spec. The size of these terms is above average but not that bad (say half an A4 page printed). Could this equality be the issue? As I said, the equality itself is proven in a separate lemma and I just apply it.

What are the strategies for debugging this? Does it make sense to dump the proof term before QED and gather statistics on it?

What processes in type checking the proof term are strongly non-proportional, and which proof constructs lead to them?

The most common functions in a few thousand stack dumps are:

```
Total number in stack (recursive counted multiple, when >=5):
44959 camlReduction__ccnv_5036 (in coqc) + 266 [0x1102ea2ca]
24817 camlCArray__fold_1302 (in coqc) + 181 [0x110386805]
20351 camlReduction__eqappr_5037 (in coqc) + 2324 [0x1102eabf4]
19899 camlReduction__cmp_rec_5570 (in coqc) + 58 [0x1102edcaa]
19640 camlReduction__cmp_rec_5570 (in coqc) + 762 [0x1102edf6a]
19634 camlCArray__fold_1331 (in coqc) + 249 [0x110386d09]
16527 camlTypeops__execute_5551 (in coqc) + 1540 [0x11032b534]
2072 camlTypeops__execute_5551 (in coqc) + 490 [0x11032b11a]
1396 camlTypeops__fun_7619 (in coqc) + 29 [0x11032c69d]
1395 camlCArray__map_i_2680 (in coqc) + 192 [0x1103890f0]
1394 camlTypeops__execute_array_5554 (in coqc) + 163 [0x11032c633]
1393 camlTypeops__execute_5551 (in coqc) + 1769 [0x11032b619]
812 camlCClosure__whd_stack_7172 (in coqc) + 316 [0x1102e59dc]
695 camlReduction__eqappr_5037 (in coqc) + 181 [0x1102ea395]
615 camlReduction__eqappr_5037 (in coqc) + 236 [0x1102ea3cc]
365 camlCClosure__whd_stack_7172 (in coqc) + 295 [0x1102e59c7]
328 camlTypeops__execute_5551 (in coqc) + 1140 [0x11032b3a4]
262 camlReduction__cmp_rec_5570 (in coqc) + 686 [0x1102edf1e]
183 camlReduction__eqappr_5037 (in coqc) + 8125 [0x1102ec29d]
171 camlCClosure__knit_6935 (in coqc) + 27 [0x1102e432b]
145 camlCClosure__knr_6933 (in coqc) + 1600 [0x1102e3a20]
143 camlCClosure__drop_parameters_6548 (in coqc) + 78 [0x1102e00de]
136 camlCClosure__knh_6861 (in coqc) + 0 [0x1102e2770]
131 camlCClosure__knht_6862 (in coqc) + 748 [0x1102e2dfc]
131 camlCClosure__knr_6933 (in coqc) + 1403 [0x1102e395b]
129 camlCClosure__knr_6933 (in coqc) + 0 [0x1102e33e0]
124 camlCClosure__knr_6933 (in coqc) + 1961 [0x1102e3b89]
118 camlReduction__eqappr_5037 (in coqc) + 0 [0x1102ea2e0]
111 camlCClosure__contract_fix_vect_6606 (in coqc) + 617 [0x1102e0619]
105 caml_make_vect (in coqc) + 153 [0x1104360b9] array.c:310
103 caml_page_table_lookup (in coqc) + 0 [0x11042f670] memory.c:95
```

This is a clear case of conversion going wrong.

It's quite hard to debug, but you can try to wrap your subproofs into opaque definitions if you're courageous enough.

I am still lacking a bit of CS jargon (physicist ...). Is conversion checking what change, cbv and/or simplification do in the proof script?

When there is some lambda-term `t` of type `A` which is plugged in a place where a term of type `B` is expected, a convertibility check is performed to verify that types `A` and `B` are compatible. Tactics `change`, `cbv`, etc., replace the current goal with a compatible one, so the kernel will indeed need to check for convertibility at `Qed` time. But that is not specific to these tactics. Basically, everything boils down to a convertibility check at `Qed` time: `reflexivity`, `rewrite`, `apply`, etc. (That is why Coq's kernel is believed to be a small piece of code, as it is supposed to implement just a convertibility checker.)

Is conversion checking what change, ...

Yes, but the kernel faces slightly different conversion problems and has slightly different heuristics than the upper layers.

We had to fight this many times in mathcomp, but it is still a black magic thing. If the files are public, I can give it a look. Otherwise, calling `abstract` here and there to see how the timing changes may be a way to reduce the problem.

One observation regarding this: my terms are all about Z and a mix of concrete large numbers (up to 2^64) and variables. As is known, the Prop comparisons on Z compute, e.g.:

```
Z.le = fun x y : Z => (x ?= y) <> Gt
```

E.g. `z <= 100` computes (with cbv) to:

```
match z with
| 100 => Eq
| 0 | 95 | 63 | 79 | 47 | 31 | 87 | 55 | 71 | 39 | 23 | 15 |
91 | 59 | 75 | 43 | 27 | 83 | 51 | 99 | 67 | 35 | 19 | 11 |
7 | 93 | 61 | 77 | 45 | 29 | 85 | 53 | 69 | 37 | 21 | 13 |
89 | 57 | 73 | 41 | 25 | 81 | 49 | 97 | 65 | 33 | 17 | 9 |
5 | 3 | 94 | 62 | 78 | 46 | 30 | 86 | 54 | 70 | 38 | 22 |
14 | 90 | 58 | 74 | 42 | 26 | 82 | 50 | 98 | 66 | 34 | 18 |
10 | 6 | 92 | 60 | 76 | 44 | 28 | 84 | 52 | 68 | 36 | 20 |
12 | 88 | 56 | 72 | 40 | 24 | 80 | 48 | 96 | 64 | 32 | 16 |
8 | 4 | 2 | 1 | Z.neg _ => Lt
| _ => Gt
end = Gt -> False
```

If one has 2^64 on the right-hand side, this is game over. So essentially I cannot ever call cbv except with very careful delta specs. For anything which involves simplifying Z arithmetic, I mirror the computation e.g. with Ltac2 and change the result (the change step is a bit hard in Ltac2 ...). cbn/simpl sometimes work, but tend to take a long time.

Up to now I was assuming that what QED does is not affected by this. In the end "+" also expands if the left side is a literal, although I am not sure the internal term size grows with the size of the literal. The printed representation of, say, `2^n + x` grows exponentially with n, but I think it has a lot of reuse of nested terms, unlike what happens with `z <= 2^n`, which expands to a match over all numbers which are less than 2^n.

So might the presence of `z <= large literal` in the terms be a problem?

Btw.: the growth with C code statement count is about `s^10`.

The root of all evil, as shown in your trace, is the handling of application: when the system faces `f x y z = f x' y' z'`, it tries to compare arguments pairwise, e.g. `x = x'`, ..., and if one fails it unfolds `f` and retries. It also tries to record the reduced forms, to avoid reducing the same terms again, but if the terms are large, this may be counterproductive.

a mix of concrete large numbers (up to 2^64) and variables

well, this is exactly where the thing may derail

A "quick" test would be to "lock" the operator that computes. One way is to define a module and module type which expose a bodyless symbol and an equation revealing the real body. You may need to use the equation from time to time, but the kernel will not use it autonomously.
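A minimal sketch of this module-locking trick, with a hypothetical operator `add` (the names `LockedAddSig`, `Locked`, `addE` are illustrative, not from the files being discussed):

```coq
Require Import ZArith.

(* Hypothetical locked addition: the signature exposes a bodyless
   [add] plus an equation revealing the real body. *)
Module Type LockedAddSig.
  Parameter add : Z -> Z -> Z.
  Axiom addE : add = Z.add.
End LockedAddSig.

Module Locked : LockedAddSig.
  Definition add := Z.add.
  Lemma addE : add = Z.add.
  Proof. reflexivity. Qed.
End Locked.

(* [Locked.add] never reduces on its own; one must rewrite with the
   equation to unlock it. *)
Goal Locked.add 2 2 = 4.
Proof. rewrite Locked.addE. reflexivity. Qed.
```

Because the module is sealed by its signature, this opacity is final: unlike `Opaque` or `Strategy`, the kernel cannot look through `Locked.add` even if it wants to.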

The first example which comes to mind: https://github.com/math-comp/math-comp/blob/8e859e628404c50c8191f30489abf4e799f46f8d/mathcomp/ssreflect/bigop.v#L534-L542

Indeed in mathcomp $\sum_{i<j}$ does not compute (even if it is just a fold over a list) because of this locking. One has to unlock it (we have a generic mechanism/lemmas to do that, see the canonical structure declaration just after the locking module).

The most astonishing side of it is that a really super messy equality proof between two largish Z terms (converting 32-bit modulo arithmetic to Z arithmetic), involving about 30 Coq Interval and 25 Gappa proofs, type checks all in one go in 2 seconds. So Master Guillaume: teach us!

The C proof doesn't really do any manipulations of the Z terms, it just sticks together terms. So the really super hard work takes 2s to type check and the easy part doesn't finish in 11 hours.

If we are right in the diagnosis, then it is really a heuristic which derails; it can become exponential for no good reason.

I think the tricky part is that Z is also used in the internal logic of VST, e.g. for manipulating symbol -> value mappings.

I never expected `z <= N` to reduce to a term whose size grows as `N`. That is mindblowing.

So, if Coq needs to check whether `z <= N` and `z <= M` are convertible and it unfolds `<=` too early for some reason, then this might blow up.

Let me think about what I can do. The issue is that the range proofs I do to remove all the `mod 2^32` one gets from C arithmetic only work with the concrete numbers. I wonder if I could abstract all literal numbers in such a way that I can still apply the equality proof for the concrete numbers.

You can try playing with the `Strategy` command. For example, if Coq is unfolding `<=` or `<` too early, you can reduce their priority with `Strategy 1000 [Z.le Z.lt].`
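In context, this is just a vernacular command issued at top level before the proofs (a sketch; the level 1000 is the value suggested in the message above):

```coq
Require Import ZArith.

(* Assign a high level to Z.le and Z.lt: as discussed below, the
   higher the level, the less eagerly the kernel unfolds these
   constants during conversion checking. *)
Strategy 1000 [Z.le Z.lt].
```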

One thought: I try to make my proofs modular. I first have a proof that a C function computes some specific (possibly largish) term in Z. This term is part of the spec of the C function and thus in the type of the proof goal. I then later prove that this term has certain properties, say that it approximates some real function to a certain precision.

I wonder if it would help to prove a more abstract goal right away, which would not involve any largish Z term, but instead some real-number statement.

For example, the Gappa plugin contains `Strategy 1000 [Generic_fmt.round]`, because Coq should only unfold `round` if it gets desperate.

You can try playing with the Strategy command. For example, if Coq is unfolding <= or < too early, you can reduce their priority with Strategy 1000 [Z.le Z.lt].

Is this something the kernel takes into account during QED?

All the complicated Z stuff I put into a separate lemma. So in essence the C proof is just a forward recording of what the C code computes, then I apply an equality lemma to the specified Z term and QED.

Yes, it is taken into account by the kernel. When it needs to check that `f x` and `g y` are convertible, it will choose which of `f` or `g` to unfold depending on the level passed to `Strategy`. (I do not remember how the default levels are computed, but the higher the level, the less likely the constant is to be unfolded, if memory serves me right.)

That said, your idea of abstracting away all the `2^32` constants is a good one.

By default it unfolds on the right; there is no precomputation of priorities based on, say, logical dependencies.

Wasn't there a precomputation based on arity or something?

Yes, it is taken into account by the kernel.

Interesting - I didn't expect this.

That said, your idea of abstracting away all the 2^32 constants is a good one.

So in essence I could abstract all literals and hide them behind some opaque ID function with a very high strategy value?

No need for `Strategy` in that case. Just using opaque terms might be sufficient to steer the kernel away from looking at them.
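For instance, a literal could be hidden behind a named constant that is revealed only through an explicit equation (a hypothetical sketch; `c32` and `c32_eq` are made-up names, not from the development discussed here):

```coq
Require Import ZArith.

(* Name the literal once; [Opaque] is only a hint, but it discourages
   the kernel from looking inside the definition. *)
Definition c32 : Z := 4294967296.  (* i.e. 2^32 *)

(* The equation reveals the value when the proof script needs it. *)
Lemma c32_eq : c32 = 2^32.
Proof. reflexivity. Qed.

Opaque c32.
```

As noted below, `Opaque` remains a hint: for a guarantee, the module-sealing trick mentioned earlier is the final word.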

Yes but beware that Strategy or Opaque are hints, the kernel may unfold anyway. The module opacity I talked about is not a hint, it is final.

No need for Strategy in that case. Just using opaque terms might be sufficient to steer the kernel away from looking at them.

I see. So I abstract all numbers as soon as they appear (say, when they move from C statements into the variable context) - or possibly even in the C code before I start to look at it. Then I apply the equality lemma, which unfolds the definitions since there is no other way, but that wouldn't hurt then since it is a 1:1 correspondence of terms. And then I QED?

Yes but beware that Strategy or Opaque are hints, the kernel may unfold anyway. The module opacity I talked about is not a hint, it is final.

Yes, this might also work - completely abstract away all operations instead of the numbers. Then I can convert the terms with equality lemmas.

Another question: does it make a difference for the kernel if I rewrite with an equality, compared to converting the goal such that it becomes the equality I have and applying it? Imagine I have `f(x) = f(y)` and a proof of `x = y`. I could rewrite with the equality, or I could apply f_equal and then the equality.

Does this make a difference for the workload during QED?

One more piece of symptomatic information: the stack depth is typically around 500, though from time to time it goes to level 10 or so.

Michael Soegtrop said:

Another question: does it make a difference for the kernel if I rewrite with an equality compared to converting the goal such that it becomes the equality I have and apply it? Imagine I have `f(x) = f(y)` and a proof for `x=y`. I could rewrite with the equality or I could apply f_equal and then the equality. Does this make a difference for the workload during QED?

Not really.

But beware that when we say "convert" we mean by change/simpl, not by apply/rewrite.

But beware that when we say "convert" we mean by change/simpl, not by apply/rewrite.

Yes, but applying lemmas or rewriting in the end also requires a conversion to check it. I can imagine that this should not be hard for `exact`, but I am not so sure about `rewrite` in largish terms.

Anyway, I will try the things we discussed above (first abstracting the numbers (should be easy), then abstracting the operations) and see if this helps.

I will also think about how I can get towards a reflective proof. If I understand this right, the idea behind Guillaume's tools and also the core idea of ssreflect is to apply a lemma which converts the goal into something which completely computes, e.g. to true=true, so that the kernel's job is straightforward computation, which doesn't need any heuristics that could possibly go wrong.
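A toy instance of that reflective pattern, using only a standard library lemma (nothing from the VST development discussed here):

```coq
Require Import Arith.

(* The lemma converts the goal into a closed boolean computation,
   which the kernel then checks by plain reduction. *)
Goal 2 + 2 <= 5.
Proof.
  apply Nat.leb_le.  (* goal becomes (2 + 2 <=? 5) = true *)
  reflexivity.       (* closed term: the check is pure computation *)
Qed.
```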

To be clear, there is still a heuristic and it can still go awfully wrong. But since the heuristic is "reduce arguments before unfolding functions", it can be implemented in a very efficient way (that is the main motivation) and it is predictable (that is a bonus).

@Michael Soegtrop You describe precisely what reflection is, and cbv/vm_compute/native_compute do run fast!

To be precise, *small-scale* reflection is (also) used on open terms, e.g. proving "(S x <= S y) = true" by applying H : "(x <= y) = true". Note that in this case the LHS does not compute to true, since it contains variables. These are still proof steps taken care of by computation, but typically the length of the computation is very different.
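Enrico's open-term example can be replayed almost literally with `Nat.leb` (a sketch):

```coq
Require Import Arith.

(* [S x <=? S y] is convertible to [x <=? y] even though x and y are
   variables, so the hypothesis closes the goal by computation alone. *)
Goal forall x y, (x <=? y) = true -> (S x <=? S y) = true.
Proof. intros x y H. exact H. Qed.
```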

Regarding `Z.le`, it boils down to the following testcase:

```
Time Check (fun z => fun H : Pos.compare_cont Eq z 1000000 = Gt =>
  H : Pos.compare_cont Eq (xO z) 2000000 = Gt).
(* 6 seconds *)
```

I guess Coq does not know if it should start by reducing the first or second occurrence of `compare_cont`, chooses the first instead of the second, and then explodes.

That said, this is a bit strange, because after a step of reducing the first occurrence, it should switch to the second one, since the first one now starts with a blocked match. There is no reason for Coq to keep reducing the first occurrence.

So, there might be an actual bug in Coq.

remark: the kernel compares applications right-to-left

Here the order does not matter because the arguments are always in normal form, so their comparison is immediate, unless I am missing something.

```
Require Import ZArith.
Axiom z : positive.
Goal True.
Time unify
  (Pos.compare_cont Eq z 1000000)
  ((fix compare_cont (r : comparison) (x y : positive) {struct y} : comparison :=
      match x with
      | (p~1)%positive =>
        match y with
        | (q~1)%positive => compare_cont r p q
        | (q~0)%positive => compare_cont Gt p q
        | 1%positive => Gt
        end
      | (p~0)%positive =>
        match y with
        | (q~1)%positive => compare_cont Lt p q
        | (q~0)%positive => compare_cont r p q
        | 1%positive => Gt
        end
      | 1%positive => match y with
                      | 1%positive => r
                      | _ => Lt
                      end
      end)
     Eq z 1000000%positive).
```

This is what you get after the unfold you expect, so we should look at what the conversion does in the case `Const | args` vs `Fix | args`.

I guess it does not do the pairwise comparison which would be immediate...

Enrico Tassi said:

Note that in this case the LHS does not compute to true, since it contains variables.

But isn't it so that the term one has at the point of Qed fully computes to e.g. true=true? It is clear that the intermediate terms one has in the construction of this final term are usually open.

Michael Soegtrop said:

But isn't it so that the term one has at the point of Qed fully computes to e.g. true=true? It is clear that the intermediate terms one has in the construction of this final term are usually open.

No, not for proofs by reflection. When we get to the goal `foo = true`, `foo` is usually already closed. Indeed, if there are still some variables at this point, it means that they do not participate in the computation.

You may get this incorrect intuition from the fact that many theorems about type theory say that *in an empty context* ... For example a term of type nat is either O or S, but only in an empty context.

I meant the process of constructing the term `foo` and relating it to whatever goal one has. During this process one sees all kinds of terms.

Enrico Tassi said:

I guess it does not do the pairwise comparison which would be immediate...

That is interesting. Also, I am a bit surprised that the code does not unfold the definitions on both sides if they are the same symbol. Keeping one folded is just delaying the inevitable.

Right now I don't have time to dig, but I guess it pushes the fix on the stack and puts the literal in head position, then fires an iota, obtaining a stuck match on z, then pushes the match on the stack and leaves z in head position. Then it must go left and obtain the same thing. So my memory tricks me, because something more costly must happen in between.

And we should not forget that it has to compare the unfolded fix with y substituted in, maybe it is then that something goes wrong

given that y is concrete, it can always unfold the fix, even if x (z) is stuck

what is hard to see is why the two terms don't "align" sooner than later

If Coq chooses to reduce the `fix` instead of unfolding the symbol, they can never align until the `fix` is fully expanded. At that point, Coq has created a huge `match` on `x` and it has to check for convertibility on all the branches, hence the exponential behavior. That is my current theory.

My recollection is that there are checks to (quickly) compare the stacks (the context under which the comparison is happening), but they just say "can't be equal", while here one would like to see "maybe better to reduce the other side, since here we already have so many matches!"

Hum, no, that can't work here... sorry for the brainstorming.

I tried modifying the conversion checker so that it unfolds on both sides when the symbol is the same. Unfortunately, `Int63.v` no longer goes through. The calls to `auto with zarith` no longer terminate then. That is strange.

In this case the good heuristic seems to be to consider both x and y recargs (which would weaken the conversion relation, but not that much)

(That does not fix the issue anyway.)

Int63 is *extremely* sensitive to unfolding heuristics.

I can't recount the number of hours I have lost trying to tweak the conversion algorithm, just to face an explosion in Int63.

For good or bad reasons?

we can always rewrite that file, if it is just that file...

There is essentially a fixpoint that unfolds to a 2^32 case analysis.

I believe this can be worked around, but I'd be concerned about end-users facing the same kind of issues.

(and thus breaking backwards compat)

I think it would be much better if "a fixpoint that unfolds to a 2^32 case analysis" would *never* work, forcing users to find a better way. What is a pain is that it *sometimes* works.

But I don't understand why unfolding the same head symbol on both sides could cause such a divergence.

Because 2^32 is huge?

@Enrico Tassi if you want to implement a workaround, I think you can try changing the L2R comparison order and compiling Int63.

Loop guaranteed, 100% satisfaction.

What I mean is that I modified the checker so that, when it encounters `f x = f y`, it unfolds `f` on both sides rather than on one side only. How could unfolding only one `f` succeed?

if you're in a partial fixpoint unfolding, I believe.

Imagine you have f := fix F := ..., and the LHS is the one-step unfolding of the RHS.

but unfolding the LHS leads to a huge fixpoint.

I can probably cook up an artificial example.

No, I am talking about the constant case. Both sides have to be unfolded, otherwise they can never match (right?).

@Guillaume Melquiond it eventually also unfolds the second f

I think your code is correct, but it changes the order in which things happen

so it may just be slower

Pierre-Marie Pédrot said:

but unfolding the LHS leads to a huge fixpoint.

Okay, I see. So, the fact that it currently works is completely improbable, because, if Coq ever decides to unfold the LHS before the RHS has reached the exact term of the unfolded LHS, then an exponential blowup occurs.

"Le salaire de la peur." ("The Wages of Fear.")

That said, I still do not understand why Enrico's example does not go through instantly. As far as I understand, the algorithm first puts the terms into a very weak hnf, which should give something like `compare_cont ...` vs `z | Case ...`. At this point, Coq unfolds the head of the LHS and puts it into hnf, i.e., `z | Case ...`. The heads match each other, so Coq moves on to checking the convertibility of the branches. But they should be syntactically equal at this point, so I do not understand where Coq is wasting time.

there's no syntactically equal fast path once you're reducing

Oh, right! We are no longer encountering a single symbol application, so there is nothing to prematurely stop the full hnf.

Maybe

"maybe better to reduce the other side, since here we already have so many match!"

was not so silly after all

once you have an irreducible match in your context, you may start reducing the other side until you find a match

Using the type `pos_bool = 1 + 2 * pos_bool` instead of `positive = 1 + positive + positive` fixes the exponential blowup, since it reduces the number of recursive branches in `compare_cont` from 2 to 1. Unfortunately, that is not a sustainable solution, since the conversion from `positive` to `pos_bool` is costly, allocation-wise.

Enrico Tassi said:

once you have an irreducible match in your context, you may start reducing the other side until you find a match

That is what happens, somehow, except that you now have two branches to compare, which in turn means that you have four branches to compare, and so on.

oh, I see, you are off by one match anyway so you have to continue

I have one more question for debugging this: does `abstract` do an immediate Qed - that is, is it technically the same as proving a lemma outside of the current lemma, Qed-ing it, and applying it in the current lemma?

Not quite, but as a first approximation it's reasonable to believe that.

There is complex machinery in the kernel to make a best effort to behave that way.
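For reference, the tactic under discussion, on a toy goal (a sketch, unrelated to the proofs above):

```coq
(* [abstract tac] proves the current goal with [tac] and hides the
   subproof behind an auxiliary constant; that subproof is checked
   immediately rather than at Qed time. *)
Goal 999 * 999 = 998001.
Proof. abstract reflexivity. Qed.
```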

Thanks!

Pierre-Marie Pédrot said:

There is complex machinery in the kernel to make a best effort to behave that way.

Last time I looked at "abstracted" terms, I thought I was going crazy. It took me a while to understand that abstracting was no longer creating an intermediate lemma unless the proof is terminated by `Defined`.

Indeed, but the kernel skips the type-checking of the corresponding subproof, so it behaves the same in terms of conversion problems and the like.

Indeed, but the kernel skips the type-checking of the corresponding subproof, so it behaves the same in terms of conversion problems and the like.

Now I am confused. Does it type check the sub proof right away or not?

Yes, it does.

Pierre-Marie's sentence should be understood as: The kernel will encounter the exact same conversion checks as if the subproof had been put in a separate lemma.

(Even if it is actually put in a separate lemma only if `Defined` is used.)

So, 1. the abstracted subproof is immediately checked, 2. the wrapping proof is checked at Qed time, 3. the abstracted subproof is inserted in the wrapping proof but the resulting proof is not checked at Qed time.

Thanks Guillaume!

For debugging this, I try to define parts of a proof as axioms. For this I need to cut and paste terms, which doesn't work in the presence of printing-only notations.

There was this PR merged in 2017 (https://github.com/coq/coq/pull/703) to do this. Did a similar feature remain? `Close Scope` does not seem to have the effect of disabling notations for printing.

Switching off all notations is not much of an option in the presence of large numbers.

Coming to this discussion late, but in my experience (as confirmed here), slow Qed is almost always either (1) conversion not recorded in the proof trace (using `simpl`/`cbn`, `vm_compute`/`native_compute` in the hypotheses rather than the goal, `change`, or more rarely using `cbv` or `lazy` with a blacklist/whitelist), (2) issues with too many universes, or (3) the guard checker being slow. The "quick" fix, once you've tracked down the offending `simpl` (or whatever) (which you can do by adding `Axiom admit : forall {T}, T` and then trying `all: exact admit. Unshelve. all: exact admit. Qed.` at various points in the proof, and seeing when it first gets slow), is to replace the conversion call such as `simpl` with something like `match goal with |- ?G => let G' := (eval simpl in G) in replace G with G' by reflexivity end` (replacing `reflexivity` with `abstract reflexivity` if you want).
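On a toy goal, that idiom looks like this (a sketch of the quick fix described above; the point is that Qed then re-checks a stated equality instead of replaying an unrecorded conversion):

```coq
(* Replace the goal by its simpl-normal form via an explicit
   (abstractable) equality, rather than converting in place. *)
Goal 2 + 2 = 4.
Proof.
  match goal with
  | |- ?G => let G' := (eval simpl in G) in replace G with G' by reflexivity
  end.
  (* The goal is now the simplified form. *)
  reflexivity.
Qed.
```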

(Is it worth sticking this explanation on some wiki somewhere?)

@Jason Gross I believe so

@Pierre-Marie Pédrot Do you have a suggestion of where to put it? Is there some wiki of "Coq Debugging Tips" or "My Coq is Too Slow, What Should I Do?"?

somewhere around https://github.com/coq/coq/wiki/Performance probably?

there is https://github.com/coq/coq/wiki/Troubleshooting also which mentions slow Qed

@Jason Gross : thanks a lot!

So if I got you right, any sort of conversion with change, cbv, cbn, ... in goals is OK, but in hypotheses it might be harmful? Does this also apply to the value part of local definitions, or only to the type part?

E.g. the VST automation abstracts certain parts of the goal into local definitions, mostly to keep the goal readable, and then does quite a bit of conversion in these local definitions, e.g. symbol -> value mapping management, but as far as I can tell mostly in the values of the definitions and not in the types. Could this possibly be harmful?

Another thought: afaik usually a change is more efficient in the goal than a rewrite. Does this also apply to hypotheses, or could the type checker make use of rewrite equalities to steer the process? In the end an equation would be more rigid than a conversion, even if the left- and right-hand sides are convertible. So might a "rewrite x with y in H by cbv" be better than a "change x with y in H"?

So if I got you right any sort of conversion with change, cbv, cbn, ... in goals is OK, but in hypothesis it might be harmful?

Not quite. `vm_compute` and `native_compute` in goals are okay. In hypotheses it is almost always wrong to use `vm_compute` and `native_compute`. `cbv` and `lazy` are fine (in all contexts) as long as you issue the right `Strategy` commands to cause kernel conversion to unfold your whitelists before anything else, and to unfold your blacklists after anything else. Hope will be lost if you use contradictory whitelists/blacklists in the same proof, even if you use `abstract` (at least if you want `coqchk` to handle your vo file; the checker sees `Strategy` as being per-top-level-proof). Finding the right strategy commands for `simpl`/`cbn` is a black art I have not even begun to master, so I suggest avoiding them entirely when performance matters. All conversion tactics are fine if the term you're converting is small enough.

E.g. the VST automation abstracts certain parts of the goal into local definitions, mostly to keep the goal readable and then does quite a bit of conversions in these local definitions, e.g. symbol -> value mapping management, but as far as I can tell mostly in the values of the definitions and not in the types. Could this possibly be harmful?

Yes, it's risky, though in general the overhead comes from unfolding recursive definitions in the wrong order, so as long as you don't have deep recursion in any of the definitions here (i.e., all of your maps are small), it should be fine.

Jason Gross said:

In hypotheses it is almost always wrong to use `vm_compute` and `native_compute`. `cbv` and `lazy` are fine (in all contexts)

I would say it differently. It is always (!) wrong to use `vm_compute` and `native_compute` in the context, and this is noticeable. Similarly, using `cbv` and `lazy` in the context is just as wrong (for the same reason: no transtyping ascriptions are put in the proof term for operations on the context), irrespective of `Strategy`. But more often than not, the issue is not noticeable.

Another though: afaik usually a change is more efficient in the goal than a rewrite. Does this also apply to hypothesis or could the type checker make use of rewrite equalities to steer the process? In the end an equation would be more rigid than a conversion, even if the left and right hand sides are convertible. So might a "rewrite x with y in H by cbv" be better than a "change x with y in H"?

It applies to hypotheses as well.

`change` is more efficient, until it slows down your `Qed`. At that point, you can either find out the right sequence of transformations that the kernel will compute efficiently, doing `refine (_ : new_goal_type)` repeatedly to leave evidence in the proof term, or else you can give up and use the sledgehammer of `rewrite`. Note, btw, that the kernel can be sensitive to which term is on the left and which is on the right, so if you're dabbling in this arcane art, the first trick to learn, IIRC, is that if your hypothesis has type `G` and you're converting it to be a thing of type `G'`, then `refine ((fun v : G' => (v : G)) _)` and `refine ((fun (f : G' -> G') => (f : G -> G')) (fun x => x) _)` can have drastically different performance because, IIRC, the first one will result in the conversion problem `G' ≡ G` while the second one will result in the conversion problem `G ≡ G'`. Example:

```
Fixpoint fact (n : nat) := match n with 0 => 1 | S n' => n * fact n' end.
Axiom P : nat -> Prop.
Check (fun (H : P (id fact 10))
       => (fun (f : P (fact 10) -> P (fact 10)) => (f : P (id fact 10) -> P (fact 10)))
            (fun x => x)
            H)
  : P (id fact 10) -> P (fact 10). (* fast *)
Check (fun (H : P (id fact 10))
       => (H : P (fact 10)))
  : P (id fact 10) -> P (fact 10). (* very slow *)
```

@Guillaume Melquiond

Similarly, using cbv and lazy in the context is just as wrong

You can drop `in the context` here. `cbv` and `lazy` do not insert transtyping ascriptions even when used in the goal.

Really? Then that is a bug. Because that is the only reason why `REVERTcast` exists in Coq, as far as I know.

I just tested with Coq 8.9. The type ascriptions are there, fortunately.

Hm, how can you tell? `cbv. Show Proof.` does not show any cast nodes in the proof term.

Did you try `Set Printing All`?

Ah, no, I was not aware that `Set Printing All` is needed, given that `DEFAULTcast` does not need `Set Printing All` and is printed identically to `REVERTcast` when `Set Printing All` is enabled.

Yes, having to disable notations just to see reverse casts is a bit painful.

It seems strange that `change` does not leave a `REVERTcast`.

Agreed.

I've written a fleshed out guide in https://github.com/coq/coq/wiki/Troubleshooting#what-can-i-do-when-qed-is-slow (comments, suggestions, and improvements welcome) and reported https://github.com/coq/coq/issues/13984 and https://github.com/coq/coq/issues/13985

@Guillaume Melquiond

But more often than not, the issue is not noticeable.

Is this so because vm_compute / native_compute usually do much more complicated conversions than cbv, so that it is harder for the kernel to type check later, or is this due to a technical difference?

The former. That is, if `vm_compute` is not performed at `Qed` time, you will notice it. But if `cbv` is done differently at `Qed` time, that is much harder to notice.

@Jason Gross : thanks for writing this! Very helpful, both the background and the concrete hints.

One note: the section (https://github.com/coq/coq/wiki/Troubleshooting#conversion-not-recorded-in-the-proof-trace) seems contradictory - I guess due to editing after the feedback from Guillaume. First you say that simpl & Co don't leave a cast and a few lines later you say they do.

@Michael Soegtrop Glad it's helpful! I've adjusted the text a bit, hopefully it's more clear now. In fact simpl & co do leave cast nodes, but, unlike vm and native, these nodes are not enough to reconstruct the exact reduction strategy used in the proof script, and hence the kernel will almost always take a different path. For lazy and cbv this is often not too bad, especially when augmented with Strategy commands, but for simpl and cbn this is really insufficient. Is this all clear from the text now?

@Jason Gross : yes, thanks! It is very clear now. Possibly the background knowledge you present should make its way into the reference manual.

FWIW, I recently also had a problem of slow Qed time when defining a couple of large fixpoints (also when defining them non-interactively, where I'm not sure we can still call it Qed time), which turned out to be neither conversion nor the guardedness checker, as `Time Guarded.` just before Qed was very fast.

But it was because I had not indicated the structurally decreasing argument anywhere, which apparently takes a long time to infer: adding the annotation reduced the compilation time of a pathological Coq file from more than an hour to 4 seconds.

Although I would have expected that inferring the decreasing argument is also the guardedness checker's job, since from my understanding this argument is the object of interest that is checked for being guarded, right?
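For concreteness, here is a minimal sketch of the kind of annotation being discussed (a hypothetical toy fixpoint, not the one from the pathological file): spelling out `{struct n}` names the structurally decreasing argument, so Coq does not have to search for it.

```coq
(* Without {struct n}, Coq must infer which argument decreases;
   making it explicit avoids that search. *)
Fixpoint add (n m : nat) {struct n} : nat :=
  match n with
  | 0 => m
  | S n' => S (add n' m)
  end.
```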

The wiki page linked by @Jason Gross seems to be publicly editable, so I will probably add that tidbit. And does anyone know how one could trigger and time the inference manually, so that this cause of a slowdown is easier to debug?

That's very interesting, and yes, please do add it to the section about fixpoints! Could you extract the example into a standalone file and maybe report it as a performance issue on the Coq bug tracker? I'm also curious whether or not extruding the body, as I suggest in the section about fixpoints, changes the timing. Also, does permuting the arguments change the time of the `Guarded` call? It seems plausible to me that `Guarded` picks a different order for guessing the structural argument than Qed. (Also, I'm guessing that there's some argument such that, if you declare it to be structural, Coq takes a long time to figure out that it's not?)

It seems Guarded does no guessing; instead it just uses the last argument:

```
Fixpoint F (x y : nat) : nat.
Proof.
  destruct x as [|x].
  - exact 0.
  - exact (F x y).
  Fail Guarded.
  Show Proof.
  (* (fix F (x y : nat) {struct y} : nat :=
        match x with
        | 0 => 0
        | S x0 => F x0 y
        end) *)
Qed.
```

IIRC we wanted to forbid tactic-mode Fixpoint without `struct`, which would ensure Guarded is only used when it works, but I don't know if any specific person has decided to make it happen.

@Jason Gross @Gaëtan Gilbert : can you please summarise the status of `change` after the changes made since this was discussed above? Is it safe (in terms of Qed time) to use `change` on the goal now, or should I still use e.g. `replace a with b by reflexivity` instead? I guess any direct use of reduction tactics on the type of a hypothesis is still a cause of Qed slowness.
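To make the contrast concrete, here is a hypothetical toy goal (not from the discussion above): `replace ... by reflexivity` goes through `rewrite`, so the equality proof is recorded in the proof term, whereas `change` relies on the kernel redoing the conversion at `Qed` time.

```coq
Axiom P : nat -> Prop.

Goal P (2 + 2) -> P 4.
Proof.
  intro H.
  (* With change, the kernel would redo the conversion at Qed time:
       change (2 + 2) with 4 in H.
     With replace, the equality proof stays in the term: *)
  replace (2 + 2) with 4 in H by reflexivity.
  exact H.
Qed.
```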

Is it safe (in terms of QED time) to use change now on the goal

Was it unsafe at some point?

@Gaëtan Gilbert : well, this is what the Coq wiki (section written by @Jason Gross) says:

The tactics simpl, cbn, lazy, cbv, compute, hnf, unfold, and red (but not change, see Issue #13984), when used on the goal only, will leave a special kind of cast in the proof term

@Gaëtan Gilbert : I saw your edit to the wiki. May I interpret this such that using `change` still does not leave a cast in the proof term, so that changing a term to a computed form of it in the goal can lead to Qed slowness?

https://github.com/coq/coq/issues/13984 is still open, but now there's a PR from Gaëtan: https://github.com/coq/coq/pull/17271

yup

After some experiments I found that the Qed slowness I am experiencing is due to `cbv` with an explicit delta list. If I set `Strategy expand` for exactly the same symbols as I have in the delta-only list, the Qed is fast. Note that using just `Strategy` without an explicit delta list does not work, since I would have to set the strategy opaque for all other symbols.
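A minimal sketch of the pattern described (hypothetical definitions; the real delta lists are much longer): the `Strategy expand` declaration lists exactly the same symbols as the `cbv` delta list, so the kernel unfolds them eagerly when rechecking at `Qed` time.

```coq
Definition step (n : nat) := n + 1.
Definition twice (n : nat) := step (step n).

(* Same symbols in the Strategy declaration as in the cbv delta list: *)
Strategy expand [step twice].

Lemma twice_ok : twice 0 = 2.
Proof.
  cbv [step twice].  (* cbv with an explicit delta list *)
  reflexivity.
Qed.
```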

I wonder what might be a reasonable way to handle this. I could use `with_strategy` around an `abstract` as recommended here; the only issue is that I frequently use computed delta lists, and as far as I can see neither Ltac2 nor Elpi has a `with_strategy` tactic which takes a symbol list.
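In Ltac1 the recommended combination looks roughly like this (hypothetical definition and goal; `with_strategy` has been available since Coq 8.12). Because `abstract` closes its subproof immediately, the kernel checks it while the expand flags are still in effect.

```coq
Definition f (n : nat) := n * 2.

Goal f 3 = 6.
Proof.
  (* Run the delta-list cbv under an expand strategy, and abstract the
     subproof so the kernel checks it under those flags: *)
  with_strategy expand [f] (abstract (cbv [f]; reflexivity)).
Qed.
```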

I wonder if we should have an `abstract` tactic which takes a reduction strategy and converts it to strategy expand flags for the kernel.

Isn't it redundant with `with_strategy`?

Not knowing how this is implemented, I am not sure what is efficient. Apparently I anyway need a redundancy between the delta lists I give to `cbv` and the strategy commands I give to the kernel. Since I have explicit delta lists, I don't need the strategy for `cbv`; I only need it for the kernel. So I thought it might be more efficient to give it to `abstract`.

I also wonder if we should have a strategy option for Qed.

In the end I want an effective way to get `cbv` with an explicit delta list through the kernel without having to specify my symbol lists several times.

Last updated: Jun 23 2024 at 23:01 UTC