Stream: Coq users

Topic: tool to parse Coq .vo files in Rust


view this post on Zulip Huỳnh Trần Khanh (Dec 17 2023 at 12:37):

https://github.com/digama0/coq-rs

recently created repo, not sure if it works or not

view this post on Zulip Paolo Giarrusso (Dec 17 2023 at 12:49):

Relevant, from author https://coq.zulipchat.com/#narrow/stream/237977-Coq-users/topic/Per-folder.20opam.20switch.3F/near/408251211

view this post on Zulip Paolo Giarrusso (Dec 17 2023 at 12:51):

It might be too early, but I wonder how it compares with votour @Mario Carneiro

view this post on Zulip Mario Carneiro (Dec 17 2023 at 12:53):

I'm not very familiar with votour (or most coq tools for that matter), so you will have to ask more specifically what you would like compared

view this post on Zulip Mario Carneiro (Dec 17 2023 at 12:54):

I recently added _CoqProject parsing and dependency resolution so now I can point coq-rs to a random file in metacoq and it will load all the libs without further guidance

view this post on Zulip Mario Carneiro (Dec 17 2023 at 12:56):

it takes about 3.3s to parse the 405 transitive dependencies of my example test case MetaCoq.PCUIC.PCUICAlpha

view this post on Zulip Mario Carneiro (Dec 17 2023 at 12:57):

unclear on how competitive this is with coqchk, coqc or votour

view this post on Zulip Mario Carneiro (Dec 17 2023 at 12:57):

I thought about doing it multithreaded but the dependencies between files make this not completely trivial so maybe later

view this post on Zulip Paolo Giarrusso (Dec 17 2023 at 13:06):

votour doesn't resolve dependencies I think, it's lower-level... You just point it to a .vo file and walk through the prompts

view this post on Zulip Pierre-Marie Pédrot (Dec 17 2023 at 13:08):

Relatedly, a few years ago I wrote a similar Rust library to maximally compact vo files: https://github.com/ppedrot/kravanenn

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:09):

yes I noticed that after having written this :)

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:16):

it is indeed quite similar, although I'm trending mostly in the direction of the feature set of coqchk so dependency resolution is important and votour-like capabilities less so. I like the serde implementation though, I was thinking of doing something similar to avoid having to parse the ocaml memory twice

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:17):

(the way that pointers are encoded in the file is very unfortunate, it makes it basically impossible to zero-copy use the input file)

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:18):

I think #7839 was on to something, calling for an actual serialization format rather than this ocaml dump which seems optimized for writing rather than reading

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:20):

alas I suppose your hands are tied by ocaml here since the parsing code is all in the runtime itself

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:22):

Pierre-Marie Pédrot said:

Relatedly, a few years ago I wrote a similar Rust library to maximally compact vo files: https://github.com/ppedrot/kravanenn

Oh, actually I didn't realize you were looking at writing and not just reading. I'm surprised coq doesn't do this "compaction" by default...?

view this post on Zulip Pierre-Marie Pédrot (Dec 17 2023 at 13:22):

Indeed, we're tied to OCaml for that.

view this post on Zulip Pierre-Marie Pédrot (Dec 17 2023 at 13:23):

The compaction part is partially performed by hashconsing, but due to separate compilation hashconsing is incomplete.

view this post on Zulip Pierre-Marie Pédrot (Dec 17 2023 at 13:23):

Rehashconsing at runtime would be extremely costly, we had a lot of benchmarks around this concept some years ago.

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:24):

do you mean rehashconsing on write or on read?

view this post on Zulip Pierre-Marie Pédrot (Dec 17 2023 at 13:24):

Rehashconsing on read. We do it on write already.

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:25):

doing anything on read looks to be extremely costly

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:25):

the fact that just reading one file costs 3 seconds is crazy to me

view this post on Zulip Pierre-Marie Pédrot (Dec 17 2023 at 13:26):

Also, I don't really buy the reimplementation of the serializer to a proper format is going to buy us a lot in term of perfs.

view this post on Zulip Pierre-Marie Pédrot (Dec 17 2023 at 13:26):

The OCaml marshalling algorithm does C trickery that is impossible to mimic in OCaml.

view this post on Zulip Pierre-Marie Pédrot (Dec 17 2023 at 13:26):

Notably it's very clever about subterm sharing.

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:27):

Lean's ace up its sleeve here is that it mmaps the file and uses the objects directly

view this post on Zulip Pierre-Marie Pédrot (Dec 17 2023 at 13:27):

There is a similar thing for OCaml though.

view this post on Zulip Pierre-Marie Pédrot (Dec 17 2023 at 13:27):

It's called ocaml-ancient, I had some attempts at using that at some point

view this post on Zulip Pierre-Marie Pédrot (Dec 17 2023 at 13:28):

(naive question: doesn't that mean that the Lean object format is going to produce big files?)

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:28):

Ooooh yes

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:28):

mathlib is pushing 4GB

view this post on Zulip Pierre-Marie Pédrot (Dec 17 2023 at 13:28):

eh.

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:29):

I wrote a format called .lgz which is basically the olean format but serialized smartly and it has a 20x size improvement

view this post on Zulip Pierre-Marie Pédrot (Dec 17 2023 at 13:29):

Ah, indeed.

view this post on Zulip Pierre-Marie Pédrot (Dec 17 2023 at 13:29):

Given the name, I guess you perform some compression?

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:30):

true, I can't claim all the credit

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:30):

it's 6x smaller before compression

view this post on Zulip Pierre-Marie Pédrot (Dec 17 2023 at 13:30):

I see. Do you cache the decompression phase at the file system level?

view this post on Zulip Pierre-Marie Pédrot (Dec 17 2023 at 13:31):

Or do you decompress at every require?

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:31):

yes, the compression / decompression is done as part of lake exe cache, which is mathlib's build caching infrastructure

view this post on Zulip Pierre-Marie Pédrot (Dec 17 2023 at 13:31):

Ah, it's not a language level support. :/

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:32):

users download .ltar files corresponding to build artifacts from mathlib into a global cache, and then there is a tool to unpack these into .olean files and that's what lean uses

view this post on Zulip Pierre-Marie Pédrot (Dec 17 2023 at 13:33):

Do you plan to backport that into Lean proper somehow?

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:34):

but there are plans to upstream cache as part of lake, so that everyone can have build caching, and I would also like to replace the olean format itself with the uncompressed version of .lgz, since you can still basically seek around in that

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:35):

the major constraint preventing easy wins on the .olean file format itself is that everything has to be a valid lean object, there is no deserialization step

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:36):

so for any replacement format it would have to go through some kind of indirection where you have pointers into a compressed blob rather than native objects

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:37):

I suspect that this is actually okay for the main purpose of olean / vo files which is to give access to upstream declarations, because you usually need to vivify them in some way when using them during typechecking (e.g. instantiating arguments)

view this post on Zulip Pierre-Marie Pédrot (Dec 17 2023 at 13:37):

Having a compact format for storage that is uncompressed in a cached way to some mmapable data structure looks reasonable to me.

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:38):

the other part that makes the file size hurt is that lake currently doesn't have any concept of global dependency installs like ocaml switches, so if you have 20 projects depending on mathlib then lake will give you 20 mathlibs taking up 80GB

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:39):

disk usage was clearly not a concern during the design process :upside_down:

view this post on Zulip Pierre-Marie Pédrot (Dec 17 2023 at 13:40):

Given the trend in the real world to have humongous VM or docker images, crazy AI models, I can't blame the people who made this choice...

view this post on Zulip Mario Carneiro (Dec 17 2023 at 13:41):

(Conversely, the coq/ocaml design seems to prefer smaller footprint even at the cost of config mismatches and build breakage)

view this post on Zulip Pierre-Marie Pédrot (Dec 17 2023 at 13:41):

Coq was designed to run on machines from the 80's originally, so...

view this post on Zulip Emilio Jesús Gallego Arias (Dec 17 2023 at 23:21):

Mario Carneiro said:

(Conversely, the coq/ocaml design seems to prefer smaller footprint even at the cost of config mismatches and build breakage)

This is not the case in more modern build infra that uses Dune or coq-lsp. Such builds are actually sound; however still we got quite a few todos on the table to reach 1.0, but that's been in use in production for some time

view this post on Zulip Paolo Giarrusso (Dec 18 2023 at 01:06):

@Emilio Jesús Gallego Arias I think that remark is in the context of https://coq.zulipchat.com/#narrow/stream/237977-Coq-users/topic/Per-folder.20opam.20switch.3F , something which dune by itself doesn't solve yet

view this post on Zulip Mario Carneiro (Dec 18 2023 at 01:10):

I was thinking specifically of https://coq.zulipchat.com/#narrow/stream/237977-Coq-users/topic/Per-folder.20opam.20switch.3F/near/408242789, does dune solve this issue?

view this post on Zulip Mario Carneiro (Dec 18 2023 at 01:11):

that is, installing packages can cause breakage in unrelated packages because they share a common resource - the switch

view this post on Zulip Emilio Jesús Gallego Arias (Dec 18 2023 at 10:44):

I see, I was thinking more of other problems. I think the problem you refer to is just wrong install instructions.

But indeed, opam has package locks, and dune is adding Cargo-like capabilties, both of these can be used to alleviate the problem.

view this post on Zulip Huỳnh Trần Khanh (Dec 18 2023 at 23:27):

@Mario Carneiro ig your tool could be quite useful, maybe put a license?


Last updated: Jun 23 2024 at 04:03 UTC