3 messages were moved here from #Ltac2 > issue prioritization by Gaëtan Gilbert.

NB

```
Time Definition v10 := Eval cbv -[Nat.add] in nary_aux 10 0.
(* 0s *)
Time Definition v100 := Eval cbv -[Nat.add] in nary_aux 100 0.
(* 0.001s *)
Time Definition v200 := Eval cbv -[Nat.add] in nary_aux 200 0.
(* 0.004s *)
Time Definition v300 := Eval cbv -[Nat.add] in nary_aux 300 0.
(* 0.005s *)
Time Definition v1000 := Eval cbv -[Nat.add] in nary_aux 1000 0.
(* 0.025s *)
Time Definition v2000 := Eval cbv -[Nat.add] in nary_aux 2000 0.
(* 0.1s *)
Time Definition v10000 := Eval cbv -[Nat.add] in nary_aux 10000 0.
(* 4s *)
```

so qrebuild real times are

```
10 -> 0s
100 -> 0s
200 -> 0s
300 -> 0.001s
1000 -> 0.002s
2000 -> 0.002s
10000 -> 0.01s
```

for the others the startup time is negligible

the `exact $res`

at the end of the in_context closure argument essentially does 2 things:

- pretype $res with constr_flags and expected type the goal, which essentially does nf_evars and unifies the goal with the type of res
- call refine, which typechecks its argument

it seems the typechecking is costly, if I expose a refine_no_check which is just refine with typecheck at https://github.com/coq/coq/blob/9d772b51a43d40505a8f14abe6c4ebae7af0f966/plugins/ltac2/tac2core.ml#L1130 flipped and

```
Ltac2 nf_evars c :=
Constr.Pretype.pretype
Constr.Pretype.Flags.constr_flags
Constr.Pretype.expected_without_type_constraint
preterm:($c).
```

and replace

```
exact $res)
```

with

```
refine_no_check (fun () => nf_evars res))
```

I get

```
rebuild with immediate subst and refine_no_check + nf_evars:
300 -> 0.05s (no nf_evars: 1.7s)
1000 -> 0.5s (no nf_evars: many s)
2000 -> 2s
10000 -> many s and eats lots of ram
```

(if we pretype with expected type the goal then exact_no_check it's a bit slower than refine_no_check + nf_evars; eg 2000 -> 3.4s)

furthermore returning the result of the in_context call through evars seems pretty costly. If I expose

```
Ltac2 @external in_context_alt : ident -> constr -> (unit -> constr) -> constr := "coq-core.plugins.ltac2" "constr_in_context_alt".
(** On a focused goal [Γ ⊢ A], [in_context id c tac] evaluates [tac]
in a focused goal [Γ, id : c ⊢ unit] and returns [fun (id : c) => t]
where [t] is the term returned by the tactic.
The goal must be untouched by the tactic and is automatically solved using tt (we could leave solving it to the tactic instead, might be simpler)
*)
```

(in_context_alt ocaml implementation: https://gist.github.com/SkySkimmer/00d11fc0bd968997166a4a8966339775)

then instead of `in_context x t (fun () => let bla := stuff in exact $bla)`

we do `in_context_alt x t (fun () => let bla := stuff in bla)`

and get

```
rebuild with eval cbv head beta and in_context_alt:
10 -> 0s
100 -> 0.008s
200 -> 0.03s
300 -> 0.07s
1000 -> 1.7s
2000 -> 12s
10000 -> many s
rebuild immediate subst and in_context_alt:
200 -> 0.01s
300 -> 0.02s
1000 -> 0.2s
2000 -> 0.8s
10000 -> 20s and 4GB ram
rebuild delayed subst and in_context_alt:
10 -> 0s
100 -> 0.002s
200 -> 0.006s
300 -> 0.015s
1000 -> 0.13s
2000 -> 0.5s
10000 -> 13s (negligible ram)
```

I guess if we want to be even faster then we need n-ary in_context_alt so that we don't call subst_var n times on the growing result (quadratic behaviour), and modify the `rebuild`

logic to take advantage of that

EDIT actually it seems that at this point we spend a lot of time in freshness operations according to perf

also note that we've gotten to the point where the startup time `eval cbv -[Nat.add] in nary_aux ...`

is of similar magnitude as the `rebuild delayed subst and in_context_alt`

time

to reduce freshness operations we may expose

```
Ltac2 @external next_id : ident -> ident := "coq-core.plugins.ltac2" "next_id".
```

which is Nameops.increment_subscript, then

```
Ltac2 rec rebuild_aux nextx f args :=
let f :=
match Unsafe.kind f with
| Unsafe.Lambda _ _ => f
| _ =>
let f := Unsafe.substnl args 0 f in
f
end
in
match Unsafe.kind f with
| Unsafe.Lambda bnd bdy =>
let t := Binder.type bnd in
let xc := Unsafe.make (Unsafe.Var nextx) in
let res :=
in_context_alt nextx t
(fun () =>
let res := rebuild_aux (next_id nextx) bdy (xc::args) in
res)
in
res
| _ => '(S $f)
end.
Ltac2 rebuild f := rebuild_aux ident:(x) f [].
```

produces

```
rebuild delayed subst + in_context_alt + next_id
1000 -> 0.05s
2000 -> 0.16s
10000 -> 3.7s
```

we may go even further by having a

```
Ltac2 @external in_context_alt2 : ident -> constr -> (unit -> 'a) -> 'a := "coq-core.plugins.ltac2" "constr_in_context_alt2".
```

which does not wrap the result of the closure in a lambda and

```
Ltac2 @external subst_vars : ident list -> constr -> constr := "coq-core.plugins.ltac2" "constr_subst_vars".
```

which is EConstr.Vars.subst_vars (replaces the given vars with rels),

used as

```
(* args are all assumptions, such that [substnl args 0 f] and [substnl args 0 return_value] live in the current env *)
Ltac2 rec rebuild_aux nextx f vars args :=
let f :=
match Unsafe.kind f with
| Unsafe.Lambda _ _ => f
| _ =>
let f := Unsafe.substnl args 0 f in
f (* args are all variables so this isn't a Lambda *)
end
in
match Unsafe.kind f with
| Unsafe.Lambda bnd bdy =>
let t := Binder.type bnd in
let xc := Unsafe.make (Unsafe.Var nextx) in
let res :=
in_context_alt2 nextx t
(fun () =>
let res := rebuild_aux (next_id nextx) bdy (nextx :: vars) (xc::args) in
res)
in
Unsafe.make (Unsafe.Lambda bnd res)
| _ => subst_vars vars '(S $f) (* NB $f is correct in the current environment here! *)
end.
Ltac2 rebuild f := rebuild_aux ident:(x) f [] [].
```

with timings

```
rebuild delayed subst + in_context_alt2 + next_id
1000 -> 0.015s
2000 -> 0.03s
10000 -> 0.2s
```

still 20x slower than the full unsafe qrebuild, but allows calling reductions & company on the subterms

(also NB: `Ltac2 in_context_alt x t f := let res := in_context_alt2 x t f in make (Lambda (Binder.make x t) (subst_vars [x] res))`

)

This is looking great, thank you!

Does next_ident still do the right thing when, e.g., we have x2 in the context but not other x* variables? How does it know to skip over x2 from the context in particular?

no, next_id does not look at the env

if we use Nameops.Fresh https://github.com/coq/coq/blob/f2dcd551be045bcd314e03915325662812709582/engine/nameops.mli#L55-L68 instead of ident sets to provide Ltac2 Fresh.Free, with Ltac2 `fresh2`

being `Nameops.Fresh.fresh`

, we get

```
(* immediate subst + in_context_alt + fresh2:
1000 -> 0.08s
2000 -> 0.35s
10000 -> 11s
delayed + fresh2:
1000 -> 1.7s
2000 -> 8.2s
delayed + in_context_alt + fresh2:
1000 -> 0.05s
2000 -> 0.16s
10000 -> 3.6s
*)
```

(where eg delayed + in_context_alt + fresh2 is

```
Ltac2 rec rebuild_aux avoid f args :=
let f :=
match Unsafe.kind f with
| Unsafe.Lambda _ _ => f
| _ => Unsafe.substnl args 0 f
end
in
match Unsafe.kind f with
| Unsafe.Lambda bnd bdy =>
let x := Binder.name bnd in
let t := Binder.type bnd in
let x :=
match x with
| Some x => x
| None => ident:(x)
end
in
let (x, avoid) := Fresh.fresh2 avoid x in
let xc := Unsafe.make (Unsafe.Var x) in
let res :=
in_context_alt x t
(fun () =>
let res := rebuild_aux avoid bdy (xc::args) in
Std.exact_no_check 'tt;
res)
in
res
| _ => '(S $f)
end.
Ltac2 rebuild f := rebuild_aux (Fresh.Free.of_goal()) f [].
```

This is equivalent to delayed + in_context_alt + next_id but more correct for name freshness

however Nameops.Fresh does not expose a union operation and I don't know if one can be provided efficiently so it's not a dropin replacement for ltac2 Fresh.Free

(and delayed + in_context_alt2 + fresh2:

```
1000 -> 0.02s
2000 -> 0.03s
10000 -> 0.2s
```

)

in_context doesn't actually need to be implemented by a primitive, we can do

```
(** Called in a focused goal [Γ ⊢ A],
runs tac in a fresh focused goal [Γ ⊢ c]
with argument the term corresponding to that goal
and returns the result. *)
Ltac2 with_goal c tac :=
(* NB: open constr enforces that we are focused *)
let v := '(_ :> $c) in
let ev :=
match Constr.Unsafe.kind v with
| Constr.Unsafe.Evar ev _ => ev
| _ => Control.throw Assertion_failure
end
in
Control.new_goal ev;
Control.focus 2 2 (fun () => tac v).
Local Ltac2 to_intro_pat id :=
Std.IntroNaming (Std.IntroIdentifier id).
Ltac2 my_in_context id c (tac:unit -> unit) :=
with_goal '(forall x:$c, _) (fun v =>
Std.intros false [to_intro_pat id];
tac ();
v).
```

this is slightly slower:

```
my_in_context + immediate substnl:
1000 -> 2.4s (instead of 1.9s)
2000 -> 10s (instead of 7.5s)
```

I guess because we have an extra evar involved per step (we start in goal `forall x:t, _`

then `intros id`

instead of directly in goal `id:t |- _`

)

we can do a adhoc nary in_context:

```
Ltac2 rec rebuild_aux avoid vars bnds f :=
match Unsafe.kind f with
| Unsafe.Lambda bnd bdy =>
let x := Binder.name bnd in
let x :=
match x with
| Some x => x
| None => ident:(x)
end
in
let x := Fresh.fresh avoid x in
let avoid := Fresh.Free.union avoid (Fresh.Free.of_ids [x]) in
rebuild_aux avoid (x::vars) (bnd::bnds) bdy
| _ =>
(* would need something more complicated if the type is dependent
especially if we don't know what the output type should be,
we probably need a preliminary with_goal to generate the evar. *)
let g := List.fold_left (fun g bnd => Unsafe.make (Unsafe.Prod bnd g)) 'nat bnds in
with_goal g (fun v =>
Std.intros false (List.map to_intro_pat vars);
let f := Unsafe.substnl (List.map (fun id => Unsafe.make (Unsafe.Var id)) vars) 0 f in
(* here [f] should be valid in the focused context *)
Std.exact_no_check '(S $f);
v)
end.
Ltac2 rebuild f := rebuild_aux (Fresh.Free.of_goal()) [] [] f.
```

then we get

```
delayed with_goal + substnl:
1000 -> 0.1s
2000 -> 0.4s
10000 -> 10s
```

(similar to delayed subst + in_context_alt)

Last updated: Oct 12 2024 at 13:01 UTC