Stream: Coq devs & plugin devs

Topic: Avoiding running the vm / native compiler twice


view this post on Zulip Jason Gross (Jul 18 2023 at 21:35):

Given a value x, is there a way to get the result of Eval native_compute in x together with a proof that this result is equal to x, without needing to run the native compiler twice? (I couldn't find a way, so made a feature request on the issue tracker)

view this post on Zulip Paolo Giarrusso (Jul 19 2023 at 10:45):

What about Definition resPair : { y | x = y } := Eval native_compute in ltac:(now exists x). Definition res := proj1_sig resPair.?

view this post on Zulip Paolo Giarrusso (Jul 19 2023 at 10:50):

I haven't checked if the program produced by native_compute computes x once or twice or you need to insert a let, but I hope all the duplicates of x are added during readback: the program should just returns (normalized x, a unit)

view this post on Zulip Guillaume Melquiond (Jul 19 2023 at 11:28):

Coq still needs to check that the proof term produced by Eval native_compute in ltac:(now exists x) has indeed the type { y | x = y }, which presumably is incredibly slow (as you would not use native_compute otherwise).

view this post on Zulip Guillaume Melquiond (Jul 19 2023 at 11:34):

Hmm... I am really confused. Coq does not behave the way I think it should:

Definition foo := Eval vm_compute in (eq_refl (1 + 1)).
Print foo. (* foo = @eq_refl nat (1 + 1) : 1 + 1 = 1 + 1 *)

(Using compute instead of vm_compute gives a much saner result.)

view this post on Zulip Jason Gross (Jul 20 2023 at 19:46):

@Guillaume Melquiond vm_compute and native_compute do not reduce the parameters of inductive constructors. This presumably buys performance when constructing values which have no closures except in parameters of inductive constructors.

view this post on Zulip Guillaume Melquiond (Jul 21 2023 at 06:22):

Indeed, constructors are parameter-free during the first reduction pass. But that does not explain why the parameters are not being reduced when they are later reconstructed from the type of the expression.

view this post on Zulip Guillaume Melquiond (Jul 21 2023 at 06:22):

Anyway, I had a hope that this "feature" would make @Paolo Giarrusso 's solution succeed, since the parameters of the exists are thus ignored. But I just checked, it does not.

view this post on Zulip Guillaume Melquiond (Jul 21 2023 at 06:23):

Even with the following variant that is twice as fast, it still incurs a full, painful, type check:

Definition resPair := Eval native_compute in exist (fun y => x = y) x (eq_refl x).

view this post on Zulip Pierre-Marie Pédrot (Jul 21 2023 at 11:20):

See https://github.com/coq/coq/pull/17882/files for a PR that normalizes parameters of inductive types in the VM.

view this post on Zulip Pierre-Marie Pédrot (Jul 21 2023 at 11:21):

There are still a few suspicious places were we don't normalize some terms, e.g. case nodes but I'm not sure it's critical.

view this post on Zulip Jason Gross (Jul 21 2023 at 13:46):

As I mentioned in the issue, we can avoid the typecheck by running the VM/native compiler twice:

Set Implicit Arguments.
Inductive value_eq A : A -> Type := the_value (x : A) : value_eq x.
Definition extract {A x} (val : value_eq x) : A := match val with the_value x => x end.
Lemma extract_eq {A x} (val : @value_eq A x) : extract val = x.
Proof. destruct val; reflexivity. Qed.
Notation native_compute x := (match x return value_eq x with y => ltac:(native_compute; constructor) end)
                               (only parsing).
Notation vm_compute x := (match x return value_eq x with y => ltac:(vm_compute; constructor) end)
                           (only parsing).


From Coq Require Import NArith.
Fixpoint fib (n : nat) : N := match n with O => 1 | S O => 1 | S (S n'' as n') => fib n' + fib n'' end%N.
Time Eval cbv in fib 30. (* 2.9s *)
Time Eval vm_compute in fib 30. (* 0.125 s *)
Time Definition bar := vm_compute (fib 30). (* 0.27 s *)
Time Definition bar_val := Eval hnf in extract bar. (* 0s *)
Time Definition bar_eq : bar_val = fib 30 := extract_eq bar. (* 0s *)

This is still pretty bad if the reduction takes 10 minutes, though in the case of the native compiler I think we are saved by the cache.

view this post on Zulip Guillaume Melquiond (Jul 21 2023 at 14:06):

It is a bit cumbersome, but you can always trick the system by using cofixpoints:

Require Import Arith.
Definition x := let y := fact 11 in y - y + (y - y).

CoInductive lazy {T} := Lazy (x : T).
CoFixpoint lazy_x := Lazy x.
Definition pair_x : { y | x = y }.
Proof.
  exists (match lazy_x with Lazy x => x end).
  cbv iota delta [lazy_x].
  reflexivity.
Defined.
Time Eval vm_compute in proj1_sig pair_x. (* 3.297s *)
Lemma spec_x : x = 0.
Proof. vm_cast_no_check (proj2_sig pair_x). Time Qed. (* 0s *)

view this post on Zulip Guillaume Melquiond (Jul 21 2023 at 14:12):

And just to be clear, Time Eval vm_compute in x takes more than 3 seconds too. So, the reduced value has indeed been computed exactly once, despite multiple type checks taking place.

view this post on Zulip Jason Gross (Jul 23 2023 at 18:27):

How do you get 0 without running the VM? Or are you saying that vm_cast_no_check and Qed both take 0s? (Which seems to be the implication, but also seems impossible, if neither Qed nor the vm_cast spend time reducing x?) Or are you saying that coinductives somehow lead to lasting side effects from running the VM? (I haven't tried any of this code out yet)

view this post on Zulip Jason Gross (Jul 23 2023 at 18:47):

Is the vm caching cofixpoints? Or doing some side effect fanciness here? The second time I run Eval vm_compute, it's instantaneous.

view this post on Zulip Jason Gross (Jul 23 2023 at 18:48):

On the other hand, this trick does not work with native_compute, giving "Error: Dynlink error: execution of module initializers in the shared library failed: Stack overflow", and vm_compute is too slow for my purposes

view this post on Zulip Gaëtan Gilbert (Jul 23 2023 at 19:18):

native_compute produces the stack overflow with just Eval native_compute in fact 11. (on my machine)

view this post on Zulip Gaëtan Gilbert (Jul 23 2023 at 19:19):

so if it works in your real world example, wrapping it in cofix may also work

view this post on Zulip Guillaume Melquiond (Jul 24 2023 at 04:13):

Jason Gross said:

Is the vm caching cofixpoints?

Yes, that is the whole point. The normal form of a cofixpoint is in the general case an infinitely sized value, so it cannot be computed eagerly by the VM (or native). So, Coq uses a lazy scheme to normalize it (think Haskell). You can use this trick to make sure a value is evaluated only once. The only caveat is that the cofixpoint value used as a cache needs to have a global identity/name, so the trick cannot be performed as part of a tactic inside a proof. It needs to be initiated at toplevel. (And just to be clear, there is no difference between VM and native, as both use a lazy scheme, though it is sometimes a bit more tricky to guarantee global identity with native.)

view this post on Zulip Gaëtan Gilbert (Jul 24 2023 at 08:38):

with native every global get wrapped in lazy unless it's a syntactic function AFAICT so the cofix adds nothing

view this post on Zulip Guillaume Melquiond (Jul 24 2023 at 08:52):

Good point. But then the same is true for vm_compute too. And indeed, it works fine without CoFixpoint, as long as a global name is used.

view this post on Zulip Guillaume Melquiond (Jul 24 2023 at 08:55):

Require Import Arith.
Definition x := let y := fact 11 in y - y + (y - y).

Definition pair_x : { y | x = y } := exist _ x eq_refl.
Time Eval vm_compute in proj1_sig pair_x. (* 2.879s *)
Lemma spec_x : x = 0.
Proof. vm_cast_no_check (proj2_sig pair_x). Time Qed. (* 0s *)

view this post on Zulip Jason Gross (Jul 24 2023 at 19:48):

Why is the compilation phase of native_compute not cached?

view this post on Zulip Gaëtan Gilbert (Jul 24 2023 at 20:13):

isn't it?

view this post on Zulip Jason Gross (Jul 24 2023 at 21:54):

It does not seem to be:

From Coq Require Import Arith.

Set NativeCompute Timing.
Definition x := Eval cbn beta iota delta [Nat.add Nat.mul] in
    let y := fact 10 in 40 * (y - y).
Time Eval native_compute in x.
(* native_compute: Conversion to native code done in 0.00018
native_compute: Compilation done in 0.43090
native_compute: Evaluation done in 0.52650
native_compute: Reification done in 0.00005

     = 0
     : nat
Finished transaction in 0.958 secs (0.53u,0.s) (successful)
*)
Time Eval native_compute in x.
(* native_compute: Conversion to native code done in 0.00001
native_compute: Compilation done in 0.36689
native_compute: Evaluation done in 0.00104
native_compute: Reification done in 0.00004

     = 0
     : nat
Finished transaction in 0.368 secs (0.003u,0.s) (successful)
 *)

view this post on Zulip Guillaume Melquiond (Jul 25 2023 at 05:02):

What is the point of caching it? Why add a whole cache system just to deal with a user who calls Eval native_compute in foo twice in a row? If you had evaluated a different term bar the second time, the cache would have been useless. (Note that vm_compute does not have a cache either. If you do Eval vm_compute in foo twice in a row, foo will be compiled to bytecode twice.)

view this post on Zulip Michael Soegtrop (Jul 25 2023 at 07:59):

I sometimes thought about compiling a function once and reusing it with different arguments (ground terms), so that caching would make sense, but I guess one can't easily make this distinction in Coq - in the end one calls compute on a term.

view this post on Zulip Gaëtan Gilbert (Jul 25 2023 at 08:03):

compilation of toplevel definitions is cached
when you do Eval native_compute in x twice, the first file looks like

(* compile constant x *)
let const_x = lazy (some stuff)

(* compile and run the term, which here is just x *)
let t1 = Lazy.force const_x

(* communicate back to the caller *)
let _ = rt1 := t1

but the second is just

let t1 = Lazy.force Otherfile.const_x

let _ = rt1 := t1

view this post on Zulip Guillaume Melquiond (Jul 25 2023 at 08:18):

I would not even call it a cache. Once a compiled symbol has been loaded into Coq's address space by the dynamic linker, it is there to stay. Why I am saying it is not a cache is that, if you backtrack before the compilation of a symbol, then it will be compiled again from scratch the next time it is encountered (and linked into memory with a different name, which means that memory use keeps growing without any hope of being freed).

view this post on Zulip Jason Gross (Jul 27 2023 at 21:50):

So is 0.37 seconds the constant startup cost of running ocamlopt or something?

view this post on Zulip Gaëtan Gilbert (Jul 27 2023 at 21:53):

I guess so
possibly including the cost of loading the precompiled files for Arith and dependencies

view this post on Zulip Jason Gross (Jul 27 2023 at 22:51):

Could this be avoided by linking some version of ocamlopt with Coq, rather than calling it as an external tool, if this is not already done? So that we don't have to reload the precompiled files for ever invocation of the native compiler?

view this post on Zulip Jason Gross (Jul 27 2023 at 23:06):

Eugh, I now have a case where, after doing Definition foo := Eval native_compute in bar., doing Time Definition baz := Eval native_compute in foo. spends 10--15 minutes in ocamlopt compilation. I don't suppose there's a way to improve the situation here? (e.g., could Definition foo := Eval native_compute in ... be special cased to have the native code for foo directly alias the already-linked computation, rather than emitting fresh code for it?

view this post on Zulip Emilio Jesús Gallego Arias (Jul 28 2023 at 15:33):

Unfortunately it is not possible yet to link the OCaml compiler into Coq @Jason Gross

view this post on Zulip Emilio Jesús Gallego Arias (Jul 28 2023 at 15:33):

Plans to make that possible are welcome upstream in OCaml, but there require significant work.

view this post on Zulip Jason Gross (Jul 28 2023 at 16:00):

What are the roadblocks @Emilio Jesús Gallego Arias ?


Last updated: Jun 17 2024 at 22:01 UTC