Hello, I am dreaming of a way to dump and reload massive amounts of data from a Coq session. It feels to me that persistent arrays would be the data structure of choice, or at least something like PArray of Uint63. But I would like to have a pair of toplevel commands with approximately the following documentation:
Eval <red_expr> to file "file name" in data.
data has to be of type PArray.array int.
Load_from_file "file name" Variable_name.
This declares a new value (like the Definition command), whose contents are obtained by a low-level read of the file at "file name". The type of this value should be option (PArray.array int). The option is meant to handle many kinds of errors: not all elements of the file can be cast to int (because of the missing bit), the file is too big to fit within the array size limitations, etc. Alternatively, we could have a bespoke Inductive type for file-reading results, with clean explanations for each class of errors. Or the command could simply fail. Does this already exist in the Coq universe?
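As a sketch, such a bespoke result type could look like this (the type and constructor names are hypothetical, not an existing Coq API):
From Coq Require Import PArray Uint63.
(* hypothetical result type for Load_from_file; one constructor per error class *)
Inductive load_result : Type :=
| LoadOk (data : array int)  (* the file was read successfully *)
| LoadErrOverflow            (* some element does not fit in a 63-bit int *)
| LoadErrTooLarge            (* the file exceeds the maximal array length *)
| LoadErrIO.                 (* the file could not be opened or read *)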
I think it is better to have such a bundling of reading and defining the value, instead of having the reading operation being added to the language of CIC expressions, which would be a consistency hole. I still foresee some problems with the way people would want to handle the "document model". The meaning of this expression makes a lot of sense if one has a sequential point of view on document execution (as in traditional proof-general, VSCoq, Coqide), but the incremental verification advocated in coq-lsp might need some adaptation to understand that even though the command does not change, one may want to re-execute it at a given time.
Isn't this a bit specific? I would prefer a more general approach. I am currently using Elpi File IO and it is reasonably efficient, but then I don't read a gigabyte of data. How much data do you want to read?
Q1: why do you want to use the Coq reduction machinery to perform the first item? I mean, why not OCaml (the data is an int array...)? I don't see how you want to prove stuff about this data anyway.
Q2: I guess you want a fast type-checking path for the second operation as well. Or not?
More generally, I don't know exactly what problem you are facing. Is it that the data is too large to be parsed by Coq? To be typed by Coq?
For example, suppose I would like to read and write a bitmap in Coq: how could this be done? (A PArray of int seems a reasonable way to store them in Coq, no?)
Having to fully load the data and keep it forever in memory doesn't seem like a great API. Maybe exposing it in the command by passing it a byte stream consumer or something similar is going to be more efficient memory-wise.
I agree that being able to peek and poke in a bitmap would be even better, but is simply loading and writing a small bitmap in memory via an array of int easy?
Easy, yes. Scalable, probably not.
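For concreteness, here is a minimal sketch of the small-bitmap representation via a primitive array (the record and field names are made up for illustration):
From Coq Require Import PArray Uint63.
Open Scope uint63_scope.
(* a width x height bitmap stored row-major in one primitive int array *)
Record bitmap := { width : int; height : int; pixels : array int }.
Definition pixel (b : bitmap) (x y : int) : int :=
  PArray.get (pixels b) (y * width b + x).
Definition set_pixel (b : bitmap) (x y v : int) : bitmap :=
  {| width := width b; height := height b;
     pixels := PArray.set (pixels b) (y * width b + x) v |}.
Loading and writing at this size is easy; the scalability concern is that each command invocation materializes the whole term.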
Out of curiosity about how fast it would be, I wrote an Elpi command which outputs a list of lists of Zs. With 1000x1000 Zs (in the range 1...2000) it takes 15s (on an ARM MacBook), so it is not that unreasonable.
From elpi Require Import elpi.
Require Import ZArith.
Require Import List.
Import ListNotations.
Elpi Command test_file_write.
Elpi Accumulate lp:{{
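% conversions from Coq numeral terms (positive, Z) to Elpi integers,
% followed by printing predicates that walk the list-of-lists term
% and write one row of space-separated numbers per line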
pred pos-to-int i:term o:int.
pos-to-int {{xH}} 1 :- !.
pos-to-int {{xO lp:{{T}} }} J :- pos-to-int T I, calc (2 * I) J, !.
pos-to-int {{xI lp:{{T}} }} J :- pos-to-int T I, calc (2 * I + 1) J, !.
pred z-to-int i:term o:int.
z-to-int {{Z0}} 0 :- !.
z-to-int {{Zpos lp:{{T}} }} I :- pos-to-int T I, !.
z-to-int {{Zneg lp:{{T}} }} J :- pos-to-int T I, calc (0 - I) J, !.
pred print-z i:out_stream i:term.
print-z OS Z :- z-to-int Z EI, output OS {calc (int_to_string EI)}.
pred print-z-list i:out_stream i:term.
print-z-list _ {{@nil _}} :- !.
print-z-list OS {{@cons _ lp:{{H}} lp:{{T}} }} :- print-z OS H, output OS " ", print-z-list OS T, !.
pred print-z-list-list i:out_stream i:term.
print-z-list-list _ {{@nil _}} :- !.
print-z-list-list OS {{@cons _ lp:{{H}} lp:{{T}} }} :- print-z-list OS H, output OS "\n", print-z-list-list OS T, !.
main [str FILENAME, trm TERM] :-
open_out FILENAME OSTREAM,
print-z-list-list OSTREAM TERM,
close_out OSTREAM.
}}.
Elpi Typecheck.
Fixpoint list_pos (n : positive) (s : positive): list positive := (
match n with
| xH => [s]
| xO n' => (list_pos n' s) ++ (list_pos n' (s+n'))
| xI n' => (list_pos n' s) ++ (list_pos n' (s+n')) ++ [s+n'+n']
end)%positive.
Definition list_list_pos (n m s : positive) := map (fun s => list_pos m s) (list_pos n s).
Time Elpi test_file_write "/Users/msoegtrop/Coq/Elpi/mytests/image1.txt" ([[1; 2; 3; 4]; [5; 6; 7; 8]]%Z).
Time Eval vm_compute in (fst (0, map (fun l => map Zpos l) (list_list_pos 1000%positive 1000%positive 1%positive))).
Time Elpi test_file_write "/Users/msoegtrop/Coq/Elpi/mytests/image2.txt" (ltac:(let l:=eval vm_compute in (map (fun l => map Zpos l) (list_list_pos 1000%positive 1000%positive 1%positive)) in exact l)).
@Enrico Tassi : do you see opportunity for substantial performance optimisations?
It would be interesting to do the same in Ltac2 - need to dig out my file IO PR.
Probably it would be faster to convert the Zs to Coq ints in the Coq VM, but I don't know how to get from Coq ints to Elpi ints. Would probably be a trivial extension and likely speed up things 5x or so.
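The Coq side of that conversion could look like this (a sketch; it assumes all values are non-negative and fit into 63 bits, and the Elpi side would still need the missing int builtin):
From Coq Require Import ZArith Uint63 List.
Import ListNotations.
(* turn the Z matrix into primitive ints inside Coq (e.g. with vm_compute),
   so the plugin only has to traverse flat int literals *)
Definition mat_to_ints (m : list (list Z)) : list (list int) :=
  map (map Uint63.of_Z) m.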
@Michael Soegtrop what about reading, is it as fast?
I don't see a good reason why it shouldn't be in the same ballpark. Maybe some little thing is missing for efficient parsing of large text, but usually Enrico adds such things fast on request. Let me have a quick look ...
As mentioned, what would likely make things much faster would be a direct conversion between Coq native ints and Elpi ints. Above, the Z to int conversion should eat up the majority of the time.
Maybe because of typechecking
Well the Eval compute above takes 0.3s or so.
Let's try it ...
@Laurent Théry : here is a reader. It is a tactic rather than a command, so that one can use it to instantiate a definition.
From elpi Require Import elpi.
Require Import ZArith.
Require Import List.
Import ListNotations.
Elpi Tactic test_file_read.
Elpi Accumulate lp:{{
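% conversions from Elpi integers back to Coq numeral terms (positive, Z),
% followed by parsing predicates that read the file line by line
% and fold each line of numbers into a Coq list term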
pred int-to-pos i:int o:term.
int-to-pos 1 {{xH}} :- !.
int-to-pos I {{xO lp:{{T}} }} :- calc (I mod 2) 0, int-to-pos {calc (I div 2)} T, !.
int-to-pos I {{xI lp:{{T}} }} :- calc (I mod 2) 1, int-to-pos {calc (I div 2)} T, !.
pred int-to-z i:int o:term.
int-to-z 0 {{Z0}} :- !.
int-to-z I {{Zpos lp:{{T}} }} :- I > 0, int-to-pos I T, !.
int-to-z I {{Zneg lp:{{T}} }} :- I < 0, int-to-pos {calc (0 - I)} T, !.
pred fold-int-str-Z-list i:string i:term i:term.
fold-int-str-Z-list S LI LO :- calc (string_to_int S) EI, int-to-z EI CZ, LO = {{ lp:{{CZ}} :: lp:{{LI}} }}, !.
pred read-line-as-Z-list i:in_stream o:term.
read-line-as-Z-list IS T :- input_line IS S, rex.split " +" S SLR, std.rev SLR SL, std.fold SL {{ @nil Z }} fold-int-str-Z-list T, !.
pred read-lines-as-Z-list-list i:in_stream o:term.
read-lines-as-Z-list-list IS T :- eof IS, T = {{ @nil (list Z) }}, !.
read-lines-as-Z-list-list IS T :- read-line-as-Z-list IS LH, read-lines-as-Z-list-list IS LT, T = {{ lp:{{LH}} :: lp:{{LT}} }}, !.
solve (goal _ _ _ _ [str FILENAME] as Goal) GoalReduced :-
open_in FILENAME ISTREAM,
read-lines-as-Z-list-list ISTREAM ZL,
close_in ISTREAM,
refine ZL Goal GoalReduced
.
}}.
Elpi Typecheck.
Time Definition Test : list (list Z) := ltac:(elpi test_file_read "/Users/msoegtrop/Coq/Elpi/mytests/image2.txt").
Print Test.
As you expected, it is slower, but also not out of the question. It takes about 30s for 400x400.
Maybe refine.no_check can save some time, since Definition is going to type the term again (and your term has no holes I guess).
@Michael Soegtrop so reading is 12 times slower than writing
Well no, you did not put the list Z implicit argument to ::, but that is easy to fix.
@Enrico Tassi : using refine.no_check seems to be much slower, and I can't follow your second comment. Please feel free to post an improved version (you can generate a suitable input file with the writer above).
typechecking a list literal is O(n) stack space (with n the length of the list), but typechecking an array literal should be O(1)
IDK at what length you would start seeing stack overflows
It is a list of lists, so stack is not that much of an issue. With 1D lists it starts to fail between 10k and 100k.
I guess the main slowness is checking the list again and again when constructing it. That is the LO = {{ lp:{{CZ}} :: lp:{{LI}} }}.
I'll play the speedup game later ;-)
I had a short discussion with Yves; there are many things at once.
First,
Definition x := Eval <s> in term.
is a bit dumb, since it never trusts <s>. Some of these reduction functions are part of the trusted code base; it is silly to re-typecheck the term, since we already trust they preserve the type of the input expression (which we type anyway).
Moreover, the normal form of term does not necessarily need to be stored in Coq's logical environment, nor computed within Coq. Of course it would be nice, but it is not needed. The normal form is used to implement an untrusted lookup, since the client is going to check that the resulting value is OK. So it is a bit of an accident that the performance problems are entangled with type checking, and the I/O is only there to circumvent them.
In some sense, the need is very close to the cache file for lia; making it available to gallina is a harder problem than making it available to some plugin (be it written in OCaml, Elpi, etc.).
Generating it via gallina, or via OCaml, or C... is yet another independent axis.
@Laurent Théry : yes, currently about a factor of 12.5 - let's see what magic Enrico can do. IMHO the performance is still well within my expectations for a straightforward program.
So, IMO, the sensible thing to do is to have decent serialization (and I/O) APIs in our beloved extension languages.
Well, not that much is missing. If we can, say, read and write 1000x1000 arrays in a few seconds, that would be quite sufficient IMHO.
@Laurent Théry : what is your speed target?
I like the flexibility of Elpi. Say one could write an Elpi program to output an XPM bitmap, and if one splits it properly between VM and Elpi, it should be reasonably fast.
@Enrico Tassi : following up on Gaëtan's comment: can I create a Coq native array or a native int with Elpi directly?
IIRC I support primitive integers/floats, but not arrays yet
Yes, here they are: https://github.com/LPCIC/coq-elpi#supported-features-of-gallina-core-calculus-of-coq
This is my go at it, but it is not much faster.
From elpi Require Import elpi.
Require Import ZArith.
Require Import List.
Import ListNotations.
Elpi Command test_file_write.
Elpi Accumulate lp:{{
pred pos-to-int i:term o:int.
pos-to-int {{xH}} 1 :- !.
pos-to-int {{xO lp:{{T}} }} J :- pos-to-int T I, calc (2 * I) J, !.
pos-to-int {{xI lp:{{T}} }} J :- pos-to-int T I, calc (2 * I + 1) J, !.
pred z-to-int i:term o:int.
z-to-int {{Z0}} 0 :- !.
z-to-int {{Zpos lp:{{T}} }} I :- pos-to-int T I, !.
z-to-int {{Zneg lp:{{T}} }} J :- pos-to-int T I, calc (0 - I) J, !.
pred print-z i:out_stream i:term.
print-z OS Z :- z-to-int Z EI, output OS {calc (int_to_string EI)}.
pred print-z-list i:out_stream i:term.
print-z-list _ {{@nil _}} :- !.
print-z-list OS {{@cons _ lp:{{H}} lp:{{T}} }} :- print-z OS H, output OS " ", print-z-list OS T, !.
pred print-z-list-list i:out_stream i:term.
print-z-list-list _ {{@nil _}} :- !.
print-z-list-list OS {{@cons _ lp:{{H}} lp:{{T}} }} :- print-z-list OS H, output OS "\n", print-z-list-list OS T, !.
main [str FILENAME, trm TERM] :-
open_out FILENAME OSTREAM,
std.time (print-z-list-list OSTREAM TERM) T,
coq.say "write" FILENAME T,
close_out OSTREAM.
}}.
Elpi Typecheck.
Fixpoint list_pos (n : positive) (s : positive): list positive := (
match n with
| xH => [s]
| xO n' => (list_pos n' s) ++ (list_pos n' (s+n'))
| xI n' => (list_pos n' s) ++ (list_pos n' (s+n')) ++ [s+n'+n']
end)%positive.
Definition list_list_pos (n m s : positive) := map (fun s => list_pos m s) (list_pos n s).
Time Elpi test_file_write "image1.txt" ([[1; 2; 3; 4]; [5; 6; 7; 8]]%Z).
Time Eval vm_compute in (fst (0, map (fun l => map Zpos l) (list_list_pos 1000%positive 1000%positive 1%positive))).
(*Time Elpi test_file_write "image2.txt" (ltac:(let l:=eval vm_compute in (map (fun l => map Zpos l) (list_list_pos 1000%positive 1000%positive 1%positive)) in exact l)).*)
Time Elpi test_file_write "image2.txt" (ltac:(let l:=eval vm_compute in (map (fun l => map Zpos l) (list_list_pos 400%positive 400%positive 1%positive)) in exact l)).
Elpi Tactic test_file_read.
Elpi Accumulate lp:{{
pred int-to-pos i:int o:term.
int-to-pos 1 {{xH}} :- !.
int-to-pos I R :- calc (I mod 2) 0, !, R = {{xO lp:{{T}} }}, int-to-pos {calc (I div 2)} T.
int-to-pos I R :- calc (I mod 2) 1, !, R = {{xI lp:{{T}} }}, int-to-pos {calc (I div 2)} T.
pred int-to-z i:int o:term.
int-to-z 0 {{Z0}} :- !.
int-to-z I R :- I > 0, !, R = {{Zpos lp:{{T}} }}, int-to-pos I T.
int-to-z I R :- I < 0, !, R = {{Zneg lp:{{T}} }}, int-to-pos {calc (0 - I)} T.
pred fold-int-str-Z-list i:string i:term i:term.
fold-int-str-Z-list S LI LO :- calc (string_to_int S) EI, int-to-z EI CZ, LO = {{ @cons Z lp:{{CZ}} lp:{{LI}} }}.
pred read-line-as-Z-list i:in_stream o:term.
read-line-as-Z-list IS T :- input_line IS S, rex.split " +" S SLR, std.rev SLR SL, std.fold SL {{ @nil Z }} fold-int-str-Z-list T.
pred read-lines-as-Z-list-list i:in_stream o:term.
read-lines-as-Z-list-list IS T :- eof IS, !, T = {{ @nil (list Z) }}.
read-lines-as-Z-list-list IS T :- read-line-as-Z-list IS LH, read-lines-as-Z-list-list IS LT, T = {{ @cons (list Z) lp:{{LH}} lp:{{LT}} }}.
solve (goal _ _ _ _ [str FILENAME] as Goal) GoalReduced :-
open_in FILENAME ISTREAM,
std.time (read-lines-as-Z-list-list ISTREAM ZL) T1,
coq.say "read" FILENAME T1,
close_in ISTREAM,
std.time (refine.no_check ZL Goal GoalReduced) T2,
coq.say "refine" T2
.
}}.
Elpi Typecheck.
Time Definition Test1 : list (list Z) := ltac:(elpi test_file_read "image1.txt").
Time Definition Test2 : list (list Z) := ltac:(elpi test_file_read "image2.txt").
Here you can see that most of the time is spent building a Coq term from its serialization. This is the kind of serialization API one can add easily to Elpi and implement efficiently. I mean, Elpi even has an API to dump/reload a term (any Elpi term), but I did not expose it in Coq-Elpi since Coq terms are too complex for the procedure to work reliably. But one can imagine an API tailored to literals and arrays/lists of literals... That would be fast.
Michael Soegtrop said:
I like the flexibility of Elpi. Say one could write an Elpi program to output an XPM bitmap, and if one splits it properly between VM and Elpi, it should be reasonably fast.
Funnily enough, @Bas Spitters once wrote such a plugin in OCaml, using constructive reals to vm_compute the points of some shape. Today we could have done it without touching the OCaml layer, and without using the notation system to format the text ;-)
@Enrico Tassi : so is there a function to convert between Coq native ints and Elpi ints and back?
Yes and no, since I forgot half of it.
Look for coq.uint and coq.float.
Ah, yes, coq.uint63. I always forget to look into both builtin files, the one for elpi and the one for coq-elpi. I am looking forward to seeing the other direction.
@Yves Bertot I think that as long as we reflect the effect of such commands in the command type, coq-lsp would be fine. That's anyway a good idea for many reasons.
The 0.2.x branch of coq-lsp already tracks this for Require: it knows that if the file referenced by the require has changed, then the require needs to be updated.
@Enrico Tassi the plugin is still available here:
https://github.com/coq-community/corn/tree/master/dump
looking forward to dump_elpi :wink:
@Emilio Jesús Gallego Arias, I confirm that I am interested in a use case corresponding to Require, which is about reloading a computation that already happened, more than about loading data that would need to be discarded. The latter would need to be handled using a stream, as @Pierre-Marie Pédrot suggested. But I would be happy to start with just the first use case.
@Yves Bertot , indeed the use case of Require is what you find in most build systems; there is a lot of research on this kind of incremental computation and the tradeoffs, which are many TTBOMK.
So in this case, I assume the ultimate "truth" about your data lives in the filesystem, right? So for example inotify or checking the hash would suffice to understand if the file has changed and the command needs to be re-run, correct?
About dependencies, we already have https://coq.inria.fr/refman/proof-engine/vernacular-commands.html?highlight=external#extra-dependencies which solves the problems a build system may have if a .v file depends on something else.
Yes, hashing on the file contents would probably suffice to ascertain the need for re-run.
Yves Bertot said:
Yes, hashing on the file contents would probably suffice to ascertain the need for re-run.
Then indeed coq-lsp's "file dependency" effect will handle that just right, as it is similar to loading .vo files.
The most common strategy in these cases is to just set up a watch on the file, though if coq-lsp (or another build system) knows how to update the file, that can be optimized.
An important problem, however, is making such schemes robust (that is to say, easily maintainable and bug-free): coq-lsp must know such commands and synthesize the right type for the incremental DAG, but that is quite fragile, as the Coq API doesn't reflect when a command reads a file. IMO it would be a nice improvement on Coq's side if we could generalize the vernac interpretation to an abstract notion of file system.
I made progress with this question, losing a little on efficiency but taking advantage of existing functionality. In my experiment, I have an FMapAVL table with more than 1 million entries. I can then exploit the generated table to produce small outputs (a typical output of the post-processing phase is a list of length less than 20).
Definition mytable := <some code>.
Definition postprocess := f <some input> mytable.
Working with this big piece of generated data is almost comfortable in Coq. The first time I want to compute an instance of postprocess, I have to wait about 13 min for the computation of mytable to complete, but later calls to postprocess are seemingly instantaneous.
Trying to save mytable in a .vo file does not work as I want, because if the table is defined as above, the .vo file only contains the definition, not the result of the computation, and the 13 min of waiting have to be suffered every time the .vo file is reloaded.
The first thing I tried was:
Definition mytable := Eval vm_compute in <some code>.
But here I ran into the problem that Coq wants to type-check the result of the computation, and I was never patient enough to let this type-checking terminate (after more than half an hour of waiting).
I was able to extract the code and run the computation as extracted code, and this time the computation only takes about 2 min. If I want to avoid paying this 2 min cost every time I need to call postprocess, I can actually have a first program that computes the big table and marshals it out to a file. This file is 15M in size. Reloading this file in another OCaml-based program is quick, and I can call my postprocess function and get my results.
With @Enrico Tassi , we chatted on the topic, and he rightly pointed out that my question about efficient raw input and output actually came from the fact that the definition with Eval vm_compute was impractical. Why does type-checking take so long? It should type-check <some code>, and then trust the result to have this type, not attempt to re-typecheck it. Enrico added that type-checking is long because the type FMapAVL is complex, but if you really produce an array of integers, even if it contains a million elements, the type-checking of this piece of data should be manageable.
So I wrote a function to convert an FMapAVL to a sorted table of integers (for my case study, I was able to combine both the key and the associated value in a single integer, because the keys are numbers < 2 ^ 26 and the values are numbers < 5). Looking up in such a table has the same complexity as looking up in an FMapAVL, thanks to search by dichotomy (binary search).
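A minimal sketch of such a dichotomy lookup over a sorted primitive array (the function names and the fuel-driven recursion are illustrative, not the actual development):
From Coq Require Import PArray Uint63.
Open Scope uint63_scope.
(* does x occur in the sorted segment t[lo..hi]? fuel bounds the recursion
   depth; 63 halving steps suffice for any 63-bit array length *)
Fixpoint mem_sorted (fuel : nat) (t : array int) (x lo hi : int) : bool :=
  match fuel with
  | O => false
  | S fuel =>
      if hi <? lo then false else
      let mid := lo + (hi - lo) / 2 in
      let v := PArray.get t mid in
      if x =? v then true
      else if x <? v then
        (if mid =? 0 then false else mem_sorted fuel t x lo (mid - 1))
      else mem_sorted fuel t x (mid + 1) hi
  end.
Definition mem (t : array int) (x : int) : bool :=
  if PArray.length t =? 0 then false
  else mem_sorted 63 t x 0 (PArray.length t - 1).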
The following code:
Definition myarray := Eval vm_compute in fmap_to_array <some code> array_size.
actually is manageable (still about 13 min), and I can then compile the file to obtain a .vo file that is 37M in size.
I can then require this file in another Coq session and exploit the table without having to pay the cost of recomputing the table again.
And in the end, I can avoid adding marshaling capabilities to Elpi ;-)
I had a similar experience with FMapAVL (used to implement multinomials, now in CoqEAL): they contain some proofs, so computing them is impractical, but computing any data (like your fmap_to_array) derived from them is fine.
Can you do this?
Require Import List.
Definition table := 1 :: 2 :: nil.
Definition f l := map (fun x => x * 2) l.
Definition mytable := ltac:(let u := (eval vm_compute in (f table)) in exact u).
If the table is an array, and not a list, then yes, it will stay efficient.
but it is really like Definition mytable := Eval vm_compute in f table.
@Enrico Tassi my bad
@Yves Bertot maybe this indicates you should directly use arrays, not AVL trees, in the first place
we have the red-black tree based map here: https://github.com/coq-community/coq-mmaps (maybe faster than AVL, at least)
We discussed this idea yesterday. The missing bit seems to be a sorting algorithm (insertion) for the array
@Karl Palmskog do they include proofs (as AVLs)?
I mean "the data structures embed proofs", not "did you prove them correct".
none of the core functions embed proofs, to my knowledge - I translated them easily to CakeML/HOL (no embedded proof support). But the design is similar to AVLs, so maybe there is some Prop somewhere (Pierre Letouzey can probably give exact details and design rationale)
@Yves Bertot : the problem of type checking might be solvable by some computation in the type of the term (rather than in the term itself) - I do this quite frequently. It can be a bit tricky to do it such that the kernel unification can take advantage of the computation, though. I can't say I understand why type checking of an FMapAVL.tree instance of numbers to numbers would be slow, but if you send me an example, I can have a look.
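One common shape of this trick is to put the computed value in the statement and let vm_compute discharge it; a toy sketch (with a made-up small computation, not the actual development):
From Coq Require Import NArith.
Definition big : N := (2 ^ 20)%N.
(* the computed value 1048576 appears in the type; vm_compute both performs
   the computation and leaves a VM cast in the proof term, so the kernel
   check at Qed also runs through the VM instead of re-type-checking a
   large normal form *)
Lemma big_val : big = 1048576%N.
Proof. vm_compute. reflexivity. Qed.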
Using AVLs to produce the sorted data is convenient: elements are added one by one quite efficiently thanks to the AVL structure. Once the data is complete, it is handy to use as a flat data structure.
@Michael Soegtrop the whole development is available at the following link:
https://github.com/ybertot/breadth_first_search/
Some of the unpleasant computation problems appear only in past commits, so it may be interesting to look at the log files of successive commits.
the project's README lacks a link to this "game" you are solving. Where is the fun without it?
@Yves Bertot : it is hard to derive your experience with computation from the commit log. Can you please point me to a specific computation in a specific file in a specific commit?
@Yves Bertot are you already using FMapAVL.Raw to avoid the embedded proofs?
@Paolo Giarrusso no I am not using it. I did not look enough into it, apparently.
Definition mytable := Eval vm_compute in <some code giving _raw_ AVL trees>.
should be interesting
The suggestion of @Paolo Giarrusso is right. The following code succeeds:
Definition db_of_result
(r : intmap.t int * nat + list (int * int) * intmap.t int) :=
match r with inl(db, _) => db | inr(_, db) => db end.
Definition all_solutions_raw :=
intmap.this (db_of_result (cube_explore 20)).
Time Definition all_solutions_computed := Eval vm_compute in all_solutions_raw.
in reasonable time: 13 min