Stream: Ltac2

Topic: Constr.in_context perf issues


view this post on Zulip Notification Bot (May 28 2024 at 11:44):

3 messages were moved here from #Ltac2 > issue prioritization by Gaëtan Gilbert.

view this post on Zulip Gaëtan Gilbert (May 28 2024 at 11:50):

NB

Time Definition v10 := Eval cbv -[Nat.add] in nary_aux 10 0.
(* 0s *)
Time Definition v100 := Eval cbv -[Nat.add] in nary_aux 100 0.
(* 0.001s *)
Time Definition v200 := Eval cbv -[Nat.add] in nary_aux 200 0.
(* 0.004s *)
Time Definition v300 := Eval cbv -[Nat.add] in nary_aux 300 0.
(* 0.005s *)
Time Definition v1000 := Eval cbv -[Nat.add] in nary_aux 1000 0.
(* 0.025s *)
Time Definition v2000 := Eval cbv -[Nat.add] in nary_aux 2000 0.
(* 0.1s *)
Time Definition v10000 := Eval cbv -[Nat.add] in nary_aux 10000 0.
(* 4s *)

so qrebuild real times are

    10    -> 0s
    100   -> 0s
    200   -> 0s
    300   -> 0.001s
    1000  -> 0.002s
    2000  -> 0.002s
    10000 -> 0.01s

for the others the startup time is negligible

view this post on Zulip Gaëtan Gilbert (May 28 2024 at 12:40):

the exact $res at the end of the in_context closure argument essentially does 2 things:

it seems the typechecking is costly, if I expose a refine_no_check which is just refine with typecheck at https://github.com/coq/coq/blob/9d772b51a43d40505a8f14abe6c4ebae7af0f966/plugins/ltac2/tac2core.ml#L1130 flipped and

Ltac2 nf_evars c :=
  Constr.Pretype.pretype
    Constr.Pretype.Flags.constr_flags
    Constr.Pretype.expected_without_type_constraint
    preterm:($c).

and replace

             exact $res)

with

             refine_no_check (fun () => nf_evars res))

I get

    rebuild with immediate subst and refine_no_check + nf_evars:
    300   -> 0.05s (no nf_evars: 1.7s)
    1000  -> 0.5s  (no nf_evars: many s)
    2000  -> 2s
    10000 -> many s and eats lots of ram

(if we pretype with expected type the goal then exact_no_check it's a bit slower than refine_no_check + nf_evars; eg 2000 -> 3.4s)

view this post on Zulip Gaëtan Gilbert (May 28 2024 at 12:49):

furthermore returning the result of the in_context call through evars seems pretty costly. If I expose

Ltac2 @external in_context_alt : ident -> constr -> (unit -> constr) -> constr := "coq-core.plugins.ltac2" "constr_in_context_alt".
(** On a focused goal [Γ ⊢ A], [in_context id c tac] evaluates [tac]
   in a focused goal [Γ, id : c ⊢ unit] and returns [fun (id : c) => t]
   where [t] is the term returned by the tactic.
   The goal must be untouched by the tactic and is automatically solved using tt (we could leave solving it to the tactic instead, might be simpler)
*)

(in_context_alt ocaml implementation: https://gist.github.com/SkySkimmer/00d11fc0bd968997166a4a8966339775)

then instead of in_context x t (fun () => let bla := stuff in exact $bla) we do in_context_alt x t (fun () => let bla := stuff in bla) and get

    rebuild with eval cbv head beta and in_context_alt:
    10    -> 0s
    100   -> 0.008s
    200   -> 0.03s
    300   -> 0.07s
    1000  -> 1.7s
    2000  -> 12s
    10000 -> many s

    rebuild immediate subst and in_context_alt:
    200   -> 0.01s
    300   -> 0.02s
    1000  -> 0.2s
    2000  -> 0.8s
    10000 -> 20s and 4GB ram

    rebuild delayed subst and in_context_alt:
    10    -> 0s
    100   -> 0.002s
    200   -> 0.006s
    300   -> 0.015s
    1000  -> 0.13s
    2000  -> 0.5s
    10000 -> 13s (negligible ram)

view this post on Zulip Gaëtan Gilbert (May 28 2024 at 13:01):

I guess if we want to be even faster then we need n-ary in_context_alt so that we don't call subst_var n times on the growing result (quadratic behaviour), and modify the rebuild logic to take advantage of that
EDIT actually it seems that at this point we spend a lot of time in freshness operations according to perf

view this post on Zulip Gaëtan Gilbert (May 28 2024 at 13:04):

also note that we've gotten to the point where the startup time eval cbv -[Nat.add] in nary_aux ... is of similar magnitude as the rebuild delayed subst and in_context_alt time

view this post on Zulip Gaëtan Gilbert (May 28 2024 at 14:07):

to reduce freshness operations we may expose

Ltac2 @external next_id : ident -> ident := "coq-core.plugins.ltac2" "next_id".

which is Nameops.increment_subscript, then

Ltac2 rec rebuild_aux nextx f args :=
  let f :=
    match Unsafe.kind f with
    | Unsafe.Lambda _ _ => f
    | _ =>
        let f := Unsafe.substnl args 0 f in
        f
    end
  in
  match Unsafe.kind f with
  | Unsafe.Lambda bnd bdy =>
      let t := Binder.type bnd in
      let xc := Unsafe.make (Unsafe.Var nextx) in
      let res :=
        in_context_alt nextx t
          (fun () =>
             let res := rebuild_aux (next_id nextx) bdy (xc::args) in
             res)
      in
      res
  | _ => '(S $f)
  end.

Ltac2 rebuild f := rebuild_aux ident:(x) f [].

produces

    rebuild delayed subst + in_context_alt + next_id
    1000  -> 0.05s
    2000  -> 0.16s
    10000 -> 3.7s

we may go even further by having a

Ltac2 @external in_context_alt2 : ident -> constr -> (unit -> 'a) -> 'a := "coq-core.plugins.ltac2" "constr_in_context_alt2".

which does not wrap the result of the closure in a lambda and

Ltac2 @external subst_vars : ident list -> constr -> constr := "coq-core.plugins.ltac2" "constr_subst_vars".

which is EConstr.Vars.subst_vars (replaces the given vars with rels),
used as

(* args are all assumptions, such that [substnl args 0 f] and [substnl args 0 return_value] live in the current env *)
Ltac2 rec rebuild_aux nextx f vars args :=
  let f :=
    match Unsafe.kind f with
    | Unsafe.Lambda _ _ => f
    | _ =>
        let f := Unsafe.substnl args 0 f in
        f (* args are all variables so this isn't a Lambda *)
    end
  in
  match Unsafe.kind f with
  | Unsafe.Lambda bnd bdy =>
      let t := Binder.type bnd in
      let xc := Unsafe.make (Unsafe.Var nextx) in
      let res :=
        in_context_alt2 nextx t
          (fun () =>
             let res := rebuild_aux (next_id nextx) bdy (nextx :: vars) (xc::args) in
             res)
      in
      Unsafe.make (Unsafe.Lambda bnd res)
  | _ => subst_vars vars '(S $f) (* NB $f is correct in the current environment here! *)
  end.

Ltac2 rebuild f := rebuild_aux ident:(x) f [] [].

with timings

    rebuild delayed subst + in_context_alt2 + next_id
    1000  -> 0.015s
    2000  -> 0.03s
    10000 -> 0.2s

view this post on Zulip Gaëtan Gilbert (May 28 2024 at 14:22):

still 20x slower than the full unsafe qrebuild, but allows calling reductions & company on the subterms

(also NB: Ltac2 in_context_alt x t f := let res := in_context_alt2 x t f in make (Lambda (Binder.make x t) (subst_vars [x] res)))

view this post on Zulip Jason Gross (May 28 2024 at 15:41):

This is looking great, thank you!

view this post on Zulip Jason Gross (May 28 2024 at 15:42):

Does next_ident still do the right thing when, e.g., we have x2 in the context but not other x* variables? How does it know to skip over x2 from the context in particular?

view this post on Zulip Gaëtan Gilbert (May 28 2024 at 15:48):

no, next_id does not look at the env

view this post on Zulip Gaëtan Gilbert (May 29 2024 at 12:58):

if we use Nameops.Fresh https://github.com/coq/coq/blob/f2dcd551be045bcd314e03915325662812709582/engine/nameops.mli#L55-L68 instead of ident sets to provide Ltac2 Fresh.Free, with Ltac2 fresh2 being Nameops.Fresh.fresh, we get

  (* immediate subst + in_context_alt + fresh2:
     1000  -> 0.08s
     2000  -> 0.35s
     10000 -> 11s

     delayed + fresh2:
     1000 -> 1.7s
     2000 -> 8.2s

     delayed + in_context_alt + fresh2:
     1000  -> 0.05s
     2000  -> 0.16s
     10000 -> 3.6s
   *)

(where eg delayed + in_context_alt + fresh2 is

Ltac2 rec rebuild_aux avoid f args :=
  let f :=
    match Unsafe.kind f with
    | Unsafe.Lambda _ _ => f
    | _ => Unsafe.substnl args 0 f
    end
  in
  match Unsafe.kind f with
  | Unsafe.Lambda bnd bdy =>
      let x := Binder.name bnd in
      let t := Binder.type bnd in
      let x :=
        match x with
        | Some x => x
        | None => ident:(x)
        end
      in
      let (x, avoid) := Fresh.fresh2 avoid x in
      let xc := Unsafe.make (Unsafe.Var x) in
      let res :=
        in_context_alt x t
          (fun () =>
             let res := rebuild_aux avoid bdy (xc::args) in
             Std.exact_no_check 'tt;
             res)
      in
      res
  | _ => '(S $f)
  end.

Ltac2 rebuild f := rebuild_aux (Fresh.Free.of_goal()) f [].

This is equivalent to delayed + in_context_alt + next_id but more correct for name freshness

however Nameops.Fresh does not expose a union operation and I don't know if one can be provided efficiently so it's not a dropin replacement for ltac2 Fresh.Free

view this post on Zulip Gaëtan Gilbert (May 29 2024 at 13:02):

(and delayed + in_context_alt2 + fresh2:

     1000  -> 0.02s
     2000  -> 0.03s
     10000 -> 0.2s

)

view this post on Zulip Gaëtan Gilbert (May 30 2024 at 14:25):

in_context doesn't actually need to be implemented by a primitive, we can do

(** Called in a focused goal [Γ ⊢ A],
    runs tac in a fresh focused goal [Γ ⊢ c]
    with argument the term corresponding to that goal
    and returns the result. *)
Ltac2 with_goal c tac :=
  (* NB: open constr enforces that we are focused *)
  let v := '(_ :> $c) in
  let ev :=
    match Constr.Unsafe.kind v with
    | Constr.Unsafe.Evar ev _ => ev
    | _ => Control.throw Assertion_failure
    end
  in
  Control.new_goal ev;
  Control.focus 2 2 (fun () => tac v).

Local Ltac2 to_intro_pat id :=
  Std.IntroNaming (Std.IntroIdentifier id).

Ltac2 my_in_context id c (tac:unit -> unit) :=
  with_goal '(forall x:$c, _) (fun v =>
    Std.intros false [to_intro_pat id];
    tac ();
    v).

this is slightly slower:

    my_in_context + immediate substnl:
    1000 -> 2.4s (instead of 1.9s)
    2000 -> 10s (instead of 7.5s)

I guess because we have an extra evar involved per step (we start in goal forall x:t, _ then intros id instead of directly in goal id:t |- _)

view this post on Zulip Gaëtan Gilbert (May 30 2024 at 14:30):

we can do a adhoc nary in_context:

Ltac2 rec rebuild_aux avoid vars bnds f :=
  match Unsafe.kind f with
  | Unsafe.Lambda bnd bdy =>
      let x := Binder.name bnd in
      let x :=
        match x with
        | Some x => x
        | None => ident:(x)
        end
      in
      let x := Fresh.fresh avoid x in
      let avoid := Fresh.Free.union avoid (Fresh.Free.of_ids [x]) in
      rebuild_aux avoid (x::vars) (bnd::bnds) bdy
  | _ =>
      (* would need something more complicated if the type is dependent
         especially if we don't know what the output type should be,
         we probably need a preliminary with_goal to generate the evar. *)
      let g := List.fold_left (fun g bnd => Unsafe.make (Unsafe.Prod bnd g)) 'nat bnds in
      with_goal g (fun v =>
        Std.intros false (List.map to_intro_pat vars);
        let f := Unsafe.substnl (List.map (fun id => Unsafe.make (Unsafe.Var id)) vars) 0 f in
        (* here [f] should be valid in the focused context *)
        Std.exact_no_check '(S $f);
        v)
  end.

Ltac2 rebuild f := rebuild_aux (Fresh.Free.of_goal()) [] [] f.

then we get

    delayed with_goal + substnl:
    1000  -> 0.1s
    2000  -> 0.4s
    10000 -> 10s

(similar to delayed subst + in_context_alt)


Last updated: Oct 12 2024 at 13:01 UTC