(cross-posting from https://github.com/coq/coq/issues/13807) How should a Coq bench package go about declaring a dependency on native_compute? (Currently it's unconditionally disabled, and bench packages that require native_compute to function just fail / take forever.)
See the discussion in the old bug tracker. I think that would require significant work in the bench infra if we want numbers that actually matter in many cases, as you need a way to filter out the time spent in the OCaml compiler when measuring .vo file compilation.
Can't we have support for ondemand native compilation? I have a bench job that stack overflows in vm_compute but not native_compute, and fiat-crypto has a couple of tests that are much, much, much faster in native_compute (ondemand) than with vm_compute.
Jason Gross said:
"I have a bench job that stack overflows in vm_compute but not native_compute"
That is a bit surprising, because the VM machinery itself does not use the stack (there is no function call, just a massive loop). The compilation of VM terms uses the stack, but nowhere near what the OCaml compiler uses. So, the issue would be when decompiling terms, but that could be the case only if the executed terms were not reducible in the first place.
Reification is not tail-recursive, so it can (and will) overflow on big terms.
Yes, but that is the same whether you use vm_compute or native_compute. I cannot think of a situation where vm_compute would use more stack than native_compute.
The reification code is not exactly the same, though; the difference might be enough for the stack to barely overflow or not?
I suppose so. nf_whd takes one more argument than nf_atom, so it could make a small difference.
Indeed, native_compute will stack overflow on just slightly larger terms than vm_compute: vm_compute stack overflows when my input size parameter for this particular bench is larger than about 110, and native_compute when the input size parameter is larger than about 130 (see https://github.com/coq-community/coq-performance-tests/commit/ae8385b9471409387d0f47f01e38b866ba70bda1); the size of both the input and the output is quadratic in the input size parameter.
Is it possible the stack overflow occurs in handling the number of binders? Here's the small test case for native_compute handling more than vm_compute:
Require Import Coq.ZArith.ZArith Coq.Lists.List.
Axiom Let_InAx : forall {A P} (x : A) (f : forall a : A, P a), P x.
Axiom ZAddAx : Z -> Z -> Z.
Definition map_double_cps {T} (ls : list Z) (k : list Z -> T) :=
list_rect
(fun _ => (list Z -> T) -> _)
(fun k => k nil)
(fun x xs rec k
=> Let_InAx (ZAddAx x x)
(fun y =>
rec (fun ys => k (y :: ys))))
ls
k.
Definition make_cps {T} (n : nat) (m : nat) (v : Z) (k : list Z -> T)
:= nat_rect
_
(fun k => k (List.repeat v n))
(fun _ rec k => rec (fun ls => map_double_cps ls k))
m
k.
Notation goal_cps n m := (forall v, make_cps n m v (fun x => x) = nil) (only parsing).
Notation goal_red_cps n m := (ltac:(let v := eval cbv [make_cps map_double_cps] in (goal_cps n m) in exact v)) (only parsing).
Notation hide := (_ = _).
(* native stack overflows with 186, succeeds with 185 *)
(* vm stack overflows with 168, succeeds with 167 *)
Goal goal_cps 168 168. cbv [make_cps map_double_cps]; intros. Time vm_compute.
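For symmetry, the native_compute side of the same check, a sketch using the sizes from the comments above (it should succeed at 185 and stack overflow at 186), would be:

```coq
(* Sketch of the native_compute variant of the goal above; the
   size 185 is the largest one reported to succeed. *)
Goal goal_cps 185 185. cbv [make_cps map_double_cps]; intros. Time native_compute. Abort.
```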
Interestingly, running Time Definition v := Eval native_compute in goal_red_cps x y. has different behavior, stack overflowing at 161 for both native_compute and vm_compute, and succeeding fine at 160 for both reduction strategies.
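Plugging in the concrete size reported to succeed, a sketch of that experiment (the definition names here are made up for illustration) looks like:

```coq
(* Per the message above, both strategies succeed at 160 and
   stack overflow at 161 when reducing the pre-unfolded goal. *)
Time Definition v_native := Eval native_compute in goal_red_cps 160 160.
Time Definition v_vm := Eval vm_compute in goal_red_cps 160 160.
```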
Last updated: Dec 05 2023 at 11:01 UTC