Given a value `x`

, is there a way to get the result of `Eval native_compute in x`

together with a proof that this result is equal to `x`

, without needing to run the native compiler twice? (I couldn't find a way, so made a feature request on the issue tracker)

What about `Definition resPair : { y | x = y } := Eval native_compute in ltac:(now exists x). Definition res := proj1_sig resPair.`

?

I haven't checked if the program produced by native_compute computes x once or twice or you need to insert a let, but I hope all the duplicates of x are added during readback: the program should just returns (normalized x, a unit)

Coq still needs to check that the proof term produced by `Eval native_compute in ltac:(now exists x)`

has indeed the type `{ y | x = y }`

, which presumably is incredibly slow (as you would not use `native_compute`

otherwise).

Hmm... I am really confused. Coq does not behave the way I think it should:

```
Definition foo := Eval vm_compute in (eq_refl (1 + 1)).
Print foo. (* foo = @eq_refl nat (1 + 1) : 1 + 1 = 1 + 1 *)
```

(Using `compute`

instead of `vm_compute`

gives a much saner result.)

@Guillaume Melquiond `vm_compute`

and `native_compute`

do not reduce the parameters of inductive constructors. This presumably buys performance when constructing values which have no closures except in parameters of inductive constructors.

Indeed, constructors are parameter-free during the first reduction pass. But that does not explain why the parameters are not being reduced when they are later reconstructed from the type of the expression.

Anyway, I had a hope that this "feature" would make @Paolo Giarrusso 's solution succeed, since the parameters of the `exists`

are thus ignored. But I just checked, it does not.

Even with the following variant that is twice as fast, it still incurs a full, painful, type check:

```
Definition resPair := Eval native_compute in exist (fun y => x = y) x (eq_refl x).
```

See https://github.com/coq/coq/pull/17882/files for a PR that normalizes parameters of inductive types in the VM.

There are still a few suspicious places were we don't normalize some terms, e.g. case nodes but I'm not sure it's critical.

As I mentioned in the issue, we can avoid the typecheck by running the VM/native compiler twice:

```
Set Implicit Arguments.
Inductive value_eq A : A -> Type := the_value (x : A) : value_eq x.
Definition extract {A x} (val : value_eq x) : A := match val with the_value x => x end.
Lemma extract_eq {A x} (val : @value_eq A x) : extract val = x.
Proof. destruct val; reflexivity. Qed.
Notation native_compute x := (match x return value_eq x with y => ltac:(native_compute; constructor) end)
(only parsing).
Notation vm_compute x := (match x return value_eq x with y => ltac:(vm_compute; constructor) end)
(only parsing).
From Coq Require Import NArith.
Fixpoint fib (n : nat) : N := match n with O => 1 | S O => 1 | S (S n'' as n') => fib n' + fib n'' end%N.
Time Eval cbv in fib 30. (* 2.9s *)
Time Eval vm_compute in fib 30. (* 0.125 s *)
Time Definition bar := vm_compute (fib 30). (* 0.27 s *)
Time Definition bar_val := Eval hnf in extract bar. (* 0s *)
Time Definition bar_eq : bar_val = fib 30 := extract_eq bar. (* 0s *)
```

This is still pretty bad if the reduction takes 10 minutes, though in the case of the native compiler I think we are saved by the cache.

It is a bit cumbersome, but you can always trick the system by using cofixpoints:

```
Require Import Arith.
Definition x := let y := fact 11 in y - y + (y - y).
CoInductive lazy {T} := Lazy (x : T).
CoFixpoint lazy_x := Lazy x.
Definition pair_x : { y | x = y }.
Proof.
exists (match lazy_x with Lazy x => x end).
cbv iota delta [lazy_x].
reflexivity.
Defined.
Time Eval vm_compute in proj1_sig pair_x. (* 3.297s *)
Lemma spec_x : x = 0.
Proof. vm_cast_no_check (proj2_sig pair_x). Time Qed. (* 0s *)
```

And just to be clear, `Time Eval vm_compute in x`

takes more than 3 seconds too. So, the reduced value has indeed been computed exactly once, despite multiple type checks taking place.

How do you get `0`

without running the VM? Or are you saying that `vm_cast_no_check`

and `Qed`

both take 0s? (Which seems to be the implication, but also seems impossible, if neither Qed nor the vm_cast spend time reducing x?) Or are you saying that coinductives somehow lead to lasting side effects from running the VM? (I haven't tried any of this code out yet)

Is the vm caching cofixpoints? Or doing some side effect fanciness here? The second time I run `Eval vm_compute`

, it's instantaneous.

On the other hand, this trick does not work with `native_compute`

, giving "Error: Dynlink error: execution of module initializers in the shared library failed: Stack overflow", and `vm_compute`

is too slow for my purposes

native_compute produces the stack overflow with just `Eval native_compute in fact 11.`

(on my machine)

so if it works in your real world example, wrapping it in cofix may also work

Jason Gross said:

Is the vm caching cofixpoints?

Yes, that is the whole point. The normal form of a cofixpoint is in the general case an infinitely sized value, so it cannot be computed eagerly by the VM (or native). So, Coq uses a lazy scheme to normalize it (think Haskell). You can use this trick to make sure a value is evaluated only once. The only caveat is that the cofixpoint value used as a cache needs to have a global identity/name, so the trick cannot be performed as part of a tactic inside a proof. It needs to be initiated at toplevel. (And just to be clear, there is no difference between VM and native, as both use a lazy scheme, though it is sometimes a bit more tricky to guarantee global identity with native.)

with native every global get wrapped in lazy unless it's a syntactic function AFAICT so the cofix adds nothing

Good point. But then the same is true for `vm_compute`

too. And indeed, it works fine without `CoFixpoint`

, as long as a global name is used.

```
Require Import Arith.
Definition x := let y := fact 11 in y - y + (y - y).
Definition pair_x : { y | x = y } := exist _ x eq_refl.
Time Eval vm_compute in proj1_sig pair_x. (* 2.879s *)
Lemma spec_x : x = 0.
Proof. vm_cast_no_check (proj2_sig pair_x). Time Qed. (* 0s *)
```

Why is the compilation phase of `native_compute`

not cached?

isn't it?

It does not seem to be:

```
From Coq Require Import Arith.
Set NativeCompute Timing.
Definition x := Eval cbn beta iota delta [Nat.add Nat.mul] in
let y := fact 10 in 40 * (y - y).
Time Eval native_compute in x.
(* native_compute: Conversion to native code done in 0.00018
native_compute: Compilation done in 0.43090
native_compute: Evaluation done in 0.52650
native_compute: Reification done in 0.00005
= 0
: nat
Finished transaction in 0.958 secs (0.53u,0.s) (successful)
*)
Time Eval native_compute in x.
(* native_compute: Conversion to native code done in 0.00001
native_compute: Compilation done in 0.36689
native_compute: Evaluation done in 0.00104
native_compute: Reification done in 0.00004
= 0
: nat
Finished transaction in 0.368 secs (0.003u,0.s) (successful)
*)
```

What is the point of caching it? Why add a whole cache system just to deal with a user who calls `Eval native_compute in foo`

twice in a row? If you had evaluated a different term `bar`

the second time, the cache would have been useless. (Note that `vm_compute`

does not have a cache either. If you do `Eval vm_compute in foo`

twice in a row, `foo`

will be compiled to bytecode twice.)

I sometimes thought about compiling a function once and reusing it with different arguments (ground terms), so that caching would make sense, but I guess one can't easily make this distinction in Coq - in the end one calls compute on a term.

compilation of toplevel definitions is cached

when you do `Eval native_compute in x`

twice, the first file looks like

```
(* compile constant x *)
let const_x = lazy (some stuff)
(* compile and run the term, which here is just x *)
let t1 = Lazy.force const_x
(* communicate back to the caller *)
let _ = rt1 := t1
```

but the second is just

```
let t1 = Lazy.force Otherfile.const_x
let _ = rt1 := t1
```

I would not even call it a cache. Once a compiled symbol has been loaded into Coq's address space by the dynamic linker, it is there to stay. Why I am saying it is not a cache is that, if you backtrack before the compilation of a symbol, then it will be compiled again from scratch the next time it is encountered (and linked into memory with a different name, which means that memory use keeps growing without any hope of being freed).

So is 0.37 seconds the constant startup cost of running ocamlopt or something?

I guess so

possibly including the cost of loading the precompiled files for Arith and dependencies

Could this be avoided by linking some version of ocamlopt with Coq, rather than calling it as an external tool, if this is not already done? So that we don't have to reload the precompiled files for ever invocation of the native compiler?

Eugh, I now have a case where, after doing `Definition foo := Eval native_compute in bar.`

, doing `Time Definition baz := Eval native_compute in foo.`

spends 10--15 minutes in ocamlopt compilation. I don't suppose there's a way to improve the situation here? (e.g., could `Definition foo := Eval native_compute in ...`

be special cased to have the native code for `foo`

directly alias the already-linked computation, rather than emitting fresh code for it?

Unfortunately it is not possible yet to link the OCaml compiler into Coq @Jason Gross

Plans to make that possible are welcome upstream in OCaml, but there require significant work.

What are the roadblocks @Emilio Jesús Gallego Arias ?

Last updated: Nov 29 2023 at 22:01 UTC