Stream: Coq users

Topic: Why would the kernel type checker do this ?


view this post on Zulip Michael Soegtrop (Apr 03 2021 at 10:23):

I have some issues with VST where Qed time is in some cases exponential with the number of locals. It easily gets very long with sensible use cases. I looked into what the kernel does during Qed and the exponential essentially boils down to the following function, which removes a previous assignment value for a local from the list of local assignments:

Definition rlt_ident_eq := Pos.eqb.  (* for convenience in selectively simplifying *)

Fixpoint remove_localdef_temp (i: ident) (l: list localdef) : list localdef :=
  match l with
  | nil => nil
  | d :: l0 =>
     let rest := remove_localdef_temp i l0 in
     match d with
     | temp j v =>
       if rlt_ident_eq i j
       then rest
       else d :: rest
     | _ => d :: rest
     end
  end.

I already modified this function a bit - the original took a sumbool in the if rather than a bool and I also introduced the let rest.

Now what takes long during Qed time looks like this:

(ocd) frame 48
#48 Reduction kernel/reduction.ml:369:73
369   try eqappr cv_pb l2r infos (lft1, (term1,[])) (lft2, (term2,[])) cuniv<|a|>
(ocd) print term2
term2: CClosure.fconstr =
  (if rlt_ident_eq _x21 _x08
   then
    if rlt_ident_eq _x21 _x07
    then
     if rlt_ident_eq _x21 _x06
     then
      if rlt_ident_eq _x21 _x05
      then
       if rlt_ident_eq _x21 _x04
       then
        if rlt_ident_eq _x21 _x03
        then
         if rlt_ident_eq _x21 _x02
         then
          if rlt_ident_eq _x21 _x01
          then
           if rlt_ident_eq _x21 _x00
           then
            if rlt_ident_eq _x21 _val
            then CSTR.list._0._1 localdef
            else
             CSTR.list._0._2 localdef
               (CSTR.localdef._0._1 _val (CSTR.val._0._2 _UNBOUND_REL_75))
               (CSTR.list._0._1 localdef)
           else

The if is fully expanded to the number of locals and since the recursive term occurs twice in the function after zeta expansion (the 3rd occurrence is trivially removed), this means a 2^n exponential growth of term size and checking time.

The proof term size is modest (it growth 3rd power with the number of locals - not good but bearable). Non of the expansions occur in the proof term (also not in casts with Set Printing All). The function rlt_ident_eqdoes not appear in the proof term.

What I wonder is why the kernel does not expand rlt_ident_eq, which is defined to be Pos.eqb, so that it could decide which branch is active?

In the tactic I do the reduction of remove_localdef_tempwith:

Ltac simplify_remove_localdef_temp_3 :=
  match goal with |- context [remove_localdef_temp ?i ?L]  =>
    let u := constr:(remove_localdef_temp i L) in
    (* unfold remove_localdef_temp and do function and if/match conversion,
      but do not expand the let in remove_localdef_temp, which would lead to exponential blow up *)
    let u' := eval lazy delta [remove_localdef_temp] beta iota in u in
    (* now fully simplify all terms with rlt_ident_eq as head symbol *)
    let u' := eval simpl rlt_ident_eq in u' in
    (* do another beta iota conversion to collapse all ifs *)
    let u' := eval lazy beta iota in u' in
    (* now all the correct branches have been selected and we can safely expand the lets *)
    let u' := eval lazy zeta in u' in
    (* leave an explicit type cast in the proof term before and after the change *)
    cbv delta [dummy];
    change u with u';
    cbv delta [dummy]
  end.

I have a VST based simple and easy to scale example I can share in case someone want's to take a hands on look. Since the sample can be scaled one can look at the full proof term for small N or easily break the debugger in the long Qed with a big N.

view this post on Zulip Michael Soegtrop (Apr 03 2021 at 10:33):

P.S.: A Strategy expand [rlt_ident_eq Pos.eqb]. doesn't help.

view this post on Zulip Jason Gross (Apr 03 2021 at 11:52):

I think Coq is really eager to inline let and to beta reduce, and you basically shouldn't expect either of them to control evaluation order except in vm_compte and native_compute. What happens if you make a definition Let_In and use that instead of let and set it's strategy to be positive?

view this post on Zulip Gaëtan Gilbert (Apr 03 2021 at 12:01):

is the breakpoint before entering eqappr or after returning from it? (fconstr is mutable so it matters)
what's term1 like?

view this post on Zulip Guillaume Melquiond (Apr 03 2021 at 12:53):

Using Strategy on rlt_ident_eq is pointless here, I think, because Coq never gets into the situation where it needs to decide which of two symbols it needs to unfold. (Symbol remove_localdef_temp was unfolded at the very start, so rlt_ident_eq can never take over.) So, Jason's idea of hiding some parts of the computation behind dummy symbols seems good. For example, replacing remove_localdef_temp by almost_opaque_identity remove_localdef_temp in the body of the function might also help.

view this post on Zulip Michael Soegtrop (Apr 03 2021 at 14:53):

Gaëtan Gilbert said:

is the breakpoint before entering eqappr or after returning from it? (fconstr is mutable so it matters)
what's term1 like?

In the middle of it. I posted the value of term1 and term2 at the start and end of the same code location for a smaller sub-term here:
(https://gist.github.com/MSoegtropIMC/4f96106054fc6aac106879a4a5ea807e)
As you can see term1 and term2 before and after the call are all identical.

view this post on Zulip Michael Soegtrop (Apr 03 2021 at 15:04):

@Jason Gross @Guillaume Melquiond : thanks for the hints and explanation - I will try putting some half opaque definitions in between to control reduction.

view this post on Zulip Michael Soegtrop (Apr 03 2021 at 15:04):

Another thought: I wonder if I shouldn't change the tactic such that it creates a fine grain equality proof with one lemma application per list entry for the reduced term and the original term. I could prove lemmas for the various match cases and return a nested equality proof using such lemmas. Would such a nested equality proof be easier to follow for the kernel?

And one more thought: reflexivity between the reduced term and the original term is instantaneous. I wonder why the kernel takes forever when reflexivity can prove it in an instant.

view this post on Zulip Michael Soegtrop (Apr 03 2021 at 17:26):

I finally found the issue by looking at the proof terms of the N=2 example. There is another tactic in VST which does equality proofs of the reduction of remove_localdef_temp which reads:

Ltac prove_delete_temp := match goal with |- ?A = _ =>
  let Q := fresh "Q" in set (Q:=A); hnf in Q; subst Q; reflexivity
end.

The hnf here doesn't do harm because it just unfolds one step of the recursion, but it seems to inspire the kernel to do a full reduction during proof checking. If I replace this with just reflexivity it is slightly slower for simple cases but much faster (looks about linear at first glance) for complex cases. For a list length of N=28 the Qed time went from 450s down to 5.6s. And my real case is more like N=50 ...

Thanks for the support, especially @Jason Gross for writing the nice "slow Qed" tutorial!


Last updated: Feb 01 2023 at 13:03 UTC