(cross-posting from https://github.com/coq/coq/issues/13807) How should a Coq bench package go about declaring a dependency on `native_compute`

? (Currently it's unconditionally disabled and bench packages that require `native_compute`

to function just fail / take forever)

See the discussion in the old bug tracker, I think that'd require significant work in the bench infra if we mean to have numbers that do matter in many cases, as you need a way to filter the time spent in the OCaml compiler when measuring `.vo`

file compilation

Can't we have support for ondemand native compilation? I have a bench job that stack overflows in vm_compute but not native_compute, and fiat-crypto has a couple of tests that are much, much, much faster in `native_compute`

(ondemand) than with `vm_compute`

.

Jason Gross said:

I have a bench job that stack overflows in

`vm_compute`

but not`native_compute`

That is a bit surprising, because the VM machinery itself does not use the stack (there is no function call, just a massive loop). The compilation of VM terms uses the stack, but nowhere near what the OCaml compiler uses. So, the issue would be when decompiling terms, but that could be the case only if the executed terms were not reducible in the first place.

Reification is not tailrec, so it can (and will) overflow on big terms.

Yes, but that is the same whether you use `vm_compute`

or `native_compute`

I cannot think of a situation where `vm_compute`

would use more stack than `native_compute`

.

The reification code is not exactly the same though, it might be enough to barely overflow or not the stack?

I suppose so. `nf_whd`

is taking one more argument than `nf_atom`

, so it could make a small difference.

Indeed, `native_compute`

will stack overflow on just slightly larger terms than `vm_compute`

(`vm_compute`

stack overflows when my input size parameter for this particular bench is larger than about `110`

, `native_compute`

when the input size parameter is larger than about `130`

(see https://github.com/coq-community/coq-performance-tests/commit/ae8385b9471409387d0f47f01e38b866ba70bda1), and the size of both the input and the output is quadratic in the input size parameter))

Is it possible the stack overflow occurs in handling the number of binders? Here's the small test-case for `native_compute`

handling more than `vm_compute`

:

```
Require Import Coq.ZArith.ZArith Coq.Lists.List.
Axiom Let_InAx : forall {A P} (x : A) (f : forall a : A, P a), P x.
Axiom ZAddAx : Z -> Z -> Z.
Definition map_double_cps {T} (ls : list Z) (k : list Z -> T) :=
list_rect
(fun _ => (list Z -> T) -> _)
(fun k => k nil)
(fun x xs rec k
=> Let_InAx (ZAddAx x x)
(fun y =>
rec (fun ys => k (y :: ys))))
ls
k.
Definition make_cps {T} (n : nat) (m : nat) (v : Z) (k : list Z -> T)
:= nat_rect
_
(fun k => k (List.repeat v n))
(fun _ rec k => rec (fun ls => map_double_cps ls k))
m
k.
Notation goal_cps n m := (forall v, make_cps n m v (fun x => x) = nil) (only parsing).
Notation goal_red_cps n m := (ltac:(let v := eval cbv [make_cps map_double_cps] in (goal_cps n m) in exact v))
(only parsing).
Notation hide := (_ = _).
(* native stack overflows with 186, succeeds with 185 *)
(* vm stack overflows with 168, succeeds with 167 *)
Goal goal_cps 168 168. cbv [make_cps map_double_cps]; intros. Time vm_compute.
```

Interestingly, running `Time Definition v := Eval native_compute in goal_red_cps x y.`

has different behavior, stack overflowing for 161 in both `native_compute`

and `vm_compute`

, and succeeding fine for 160 for both reduction strategies.

Last updated: Jul 13 2024 at 03:01 UTC