I have some issues with VST where Qed time is in some cases exponential with the number of locals. It easily gets very long with sensible use cases. I looked into what the kernel does during Qed and the exponential essentially boils down to the following function, which removes a previous assignment value for a local from the list of local assignments:

```
Definition rlt_ident_eq := Pos.eqb. (* for convenience in selectively simplifying *)
Fixpoint remove_localdef_temp (i: ident) (l: list localdef) : list localdef :=
match l with
| nil => nil
| d :: l0 =>
let rest := remove_localdef_temp i l0 in
match d with
| temp j v =>
if rlt_ident_eq i j
then rest
else d :: rest
| _ => d :: rest
end
end.
```

I already modified this function a bit - the original took a `sumbool`

in the `if`

rather than a `bool`

and I also introduced the `let rest`

.

Now what takes long during Qed time looks like this:

```
(ocd) frame 48
#48 Reduction kernel/reduction.ml:369:73
369 try eqappr cv_pb l2r infos (lft1, (term1,[])) (lft2, (term2,[])) cuniv<|a|>
(ocd) print term2
term2: CClosure.fconstr =
(if rlt_ident_eq _x21 _x08
then
if rlt_ident_eq _x21 _x07
then
if rlt_ident_eq _x21 _x06
then
if rlt_ident_eq _x21 _x05
then
if rlt_ident_eq _x21 _x04
then
if rlt_ident_eq _x21 _x03
then
if rlt_ident_eq _x21 _x02
then
if rlt_ident_eq _x21 _x01
then
if rlt_ident_eq _x21 _x00
then
if rlt_ident_eq _x21 _val
then CSTR.list._0._1 localdef
else
CSTR.list._0._2 localdef
(CSTR.localdef._0._1 _val (CSTR.val._0._2 _UNBOUND_REL_75))
(CSTR.list._0._1 localdef)
else
```

The if is fully expanded to the number of locals and since the recursive term occurs twice in the function after zeta expansion (the 3rd occurrence is trivially removed), this means a 2^n exponential growth of term size and checking time.

The proof term size is modest (it growth 3rd power with the number of locals - not good but bearable). Non of the expansions occur in the proof term (also not in casts with `Set Printing All`

). The function `rlt_ident_eq`

does not appear in the proof term.

What I wonder is why the kernel does not expand `rlt_ident_eq`

, which is defined to be `Pos.eqb`

, so that it could decide which branch is active?

In the tactic I do the reduction of `remove_localdef_temp`

with:

```
Ltac simplify_remove_localdef_temp_3 :=
match goal with |- context [remove_localdef_temp ?i ?L] =>
let u := constr:(remove_localdef_temp i L) in
(* unfold remove_localdef_temp and do function and if/match conversion,
but do not expand the let in remove_localdef_temp, which would lead to exponential blow up *)
let u' := eval lazy delta [remove_localdef_temp] beta iota in u in
(* now fully simplify all terms with rlt_ident_eq as head symbol *)
let u' := eval simpl rlt_ident_eq in u' in
(* do another beta iota conversion to collapse all ifs *)
let u' := eval lazy beta iota in u' in
(* now all the correct branches have been selected and we can safely expand the lets *)
let u' := eval lazy zeta in u' in
(* leave an explicit type cast in the proof term before and after the change *)
cbv delta [dummy];
change u with u';
cbv delta [dummy]
end.
```

I have a VST based simple and easy to scale example I can share in case someone want's to take a hands on look. Since the sample can be scaled one can look at the full proof term for small N or easily break the debugger in the long Qed with a big N.

P.S.: A `Strategy expand [rlt_ident_eq Pos.eqb].`

doesn't help.

I think Coq is really eager to inline `let`

and to beta reduce, and you basically shouldn't expect either of them to control evaluation order except in vm_compte and native_compute. What happens if you make a definition Let_In and use that instead of let and set it's strategy to be positive?

is the breakpoint before entering eqappr or after returning from it? (fconstr is mutable so it matters)

what's term1 like?

Using `Strategy`

on `rlt_ident_eq`

is pointless here, I think, because Coq never gets into the situation where it needs to decide which of two symbols it needs to unfold. (Symbol `remove_localdef_temp`

was unfolded at the very start, so `rlt_ident_eq`

can never take over.) So, Jason's idea of hiding some parts of the computation behind dummy symbols seems good. For example, replacing `remove_localdef_temp`

by `almost_opaque_identity remove_localdef_temp`

in the body of the function might also help.

Gaëtan Gilbert said:

is the breakpoint before entering eqappr or after returning from it? (fconstr is mutable so it matters)

what's term1 like?

In the middle of it. I posted the value of term1 and term2 at the start and end of the same code location for a smaller sub-term here:

(https://gist.github.com/MSoegtropIMC/4f96106054fc6aac106879a4a5ea807e)

As you can see term1 and term2 before and after the call are all identical.

@Jason Gross @Guillaume Melquiond : thanks for the hints and explanation - I will try putting some half opaque definitions in between to control reduction.

Another thought: I wonder if I shouldn't change the tactic such that it creates a fine grain equality proof with one lemma application per list entry for the reduced term and the original term. I could prove lemmas for the various match cases and return a nested equality proof using such lemmas. Would such a nested equality proof be easier to follow for the kernel?

And one more thought: reflexivity between the reduced term and the original term is instantaneous. I wonder why the kernel takes forever when reflexivity can prove it in an instant.

I finally found the issue by looking at the proof terms of the N=2 example. There is another tactic in VST which does equality proofs of the reduction of `remove_localdef_temp`

which reads:

```
Ltac prove_delete_temp := match goal with |- ?A = _ =>
let Q := fresh "Q" in set (Q:=A); hnf in Q; subst Q; reflexivity
end.
```

The `hnf`

here doesn't do harm because it just unfolds one step of the recursion, but it **seems to inspire the kernel to do a full reduction during proof checking**. If I replace this with just `reflexivity`

it is slightly slower for simple cases but much faster (looks about linear at first glance) for complex cases. For a list length of `N=28`

the Qed time went from **450s down to 5.6s**. And my real case is more like N=50 ...

Thanks for the support, especially @Jason Gross for writing the nice "slow Qed" tutorial!

Last updated: Jun 16 2024 at 02:41 UTC