coqdep 8.15 slowdown, and per-library coqdep · Dune devs & users

Stream: Dune devs & users

Topic: coqdep 8.15 slowdown, and per-library coqdep

Paolo Giarrusso (Mar 02 2022 at 13:29):

coqdep 8.15 is slower so we should run it once per library like coq_makefile (and have it depend on all source files), correct?

Emilio Jesús Gallego Arias (Mar 02 2022 at 13:29):

Yes, that's a fixme too

Paolo Giarrusso (Mar 02 2022 at 13:29):

I've already found @Emilio Jesús Gallego Arias 's https://github.com/ejgallego/dune/tree/coq+coqdep_per_theory :-)

Emilio Jesús Gallego Arias (Mar 02 2022 at 13:29):

There is a tree on my dune repos calling it once per theory, hopefully we can finish it on the dune day

Paolo Giarrusso (Mar 02 2022 at 13:30):

wait, that's already implemented? I only found the mega refactoring

Emilio Jesús Gallego Arias (Mar 02 2022 at 13:30):

I guess what we need to do is just to put the dep map inside the monad, and pass it around

Emilio Jesús Gallego Arias (Mar 02 2022 at 13:30):

Need refresh that, can't do it today

Emilio Jesús Gallego Arias (Mar 02 2022 at 13:31):

What needs to be done, IIRC:

update coqdep parsing so it builds a map (the old coq_dune code had already this functionality)
put the map in the memo monad, pass it around
then the rule instead just forces the map, and it should be correctly shared

Emilio Jesús Gallego Arias (Mar 02 2022 at 13:31):

so nothing complicated in principle, but a lot of refactoring

Emilio Jesús Gallego Arias (Mar 02 2022 at 13:34):

https://github.com/coq/coq/blob/v8.10/tools/coq_dune.ml#L271

Rudi Grinberg (Mar 02 2022 at 17:34):

Why is running coqdep once per theory a good thing though?

Ali Caglayan (Mar 02 2022 at 20:33):

It is faster

Ali Caglayan (Mar 02 2022 at 20:34):

There is a significant performance increase if you coqdep all your files vs doing them one by one.

Ali Caglayan (Mar 02 2022 at 20:39):

As I understand it, coqdep goes through all the dependencies of a file and accumulates the visited file names, once it has all of them it computes the dependencies.

Ali Caglayan (Mar 02 2022 at 20:39):

So if you do this with all the files at once it is fast

Ali Caglayan (Mar 02 2022 at 20:39):

But if you do it with each file separately there will be a lot of repeated computations

Ali Caglayan (Mar 02 2022 at 20:40):

I guess a smart thing to do here would be to somehow cache these dependencies

Ali Caglayan (Mar 02 2022 at 20:58):

It's all happening in tools/coqdep_common.ml by the way. The overall structure looks very similar to ocamldep but there seems to be a bit more going on.

Rudi Grinberg (Mar 02 2022 at 21:14):

That's exactly the kind of optimization that I don't think is worth doing. Yes, it's faster, but the trade off is that you must re-run coqdep a lot more inputs. Sure, it could be worth it in practice, but in the long run it's the wrong way to go imo.

Rudi Grinberg (Mar 02 2022 at 21:14):

There's more interesting long term solutions that we plan to implement in dune that could be used for coqdep.

Rudi Grinberg (Mar 02 2022 at 21:17):

One solution is to address a "server mode" for actions. Dune could run a binary in the background, and then feed it requests that correspond to running actions. It would require the binary to speak a certain protocol, but could save a lot of processes spawned.

Ali Caglayan (Mar 02 2022 at 21:22):

@Rudi Grinberg Which kind of optimization are you referring to here? The batching optimization or caching dependencies between runs?

Rudi Grinberg (Mar 02 2022 at 21:22):

The server mode would address batching.

Enrico Tassi (Mar 02 2022 at 21:22):

Is there an ETA for this plan? It sounds nice, but if it won't happen soon, then the other one seems still nice to have while we wait.

Ali Caglayan (Mar 02 2022 at 21:26):

There is also the problem of dune not being the only user of coqdep. We also have coq_makefile which currently batches the calls for coqdep.

Ali Caglayan (Mar 02 2022 at 21:28):

But I guess you aren't really talking about changing the behavior of coqdep but rather how it is called by dune.

Rudi Grinberg (Mar 02 2022 at 21:33):

Nobody is working on it actively, but we do have many of the components ready. We have:
1) The dune action plugin which allows to run binaries in a special mode and communicate them with a protocol
2) The dune rpc library. Which allows rpc servers to be implemented.
It would require a little bit of elbow grease to combine the two, so I'd say it would take a committed dev around a month of full time to get something working?

Rudi Grinberg (Mar 02 2022 at 21:34):

We are unfortunately running into some performance problems with building ocaml code itself, so it's definitely on our radar. But for now, it's not the lowest hanging fruit.

Rodolphe Lepigre (Mar 02 2022 at 22:05):

Following a discussion I've had with @Paolo Giarrusso, my understanding is that running coqdep independently on each file (with Coq 8.15) is so slow that it makes a switch to dune almost unfeasible for a large project. So the proposed optimization is definitely worth doing in my opinion, even if that is not the right solution in the long run. We can always change the way dependencies are computed later, when Coq provides whatever tooling is needed to implement a more satisfactory solution.

Ali Caglayan (Mar 02 2022 at 22:29):

@Karl Palmskog might also be able to chip in since he has had a lot of experience porting projects to dune.

Ali Caglayan (Mar 02 2022 at 22:31):

Also coming back to the original point of this thread, coqdep 8.15 is doing a lot of checking to make sure that there are no duplicate hits for module names and that the module search it has predictable behavior, which was not so predictable before.

Paolo Giarrusso (Mar 02 2022 at 22:34):

yes, making coqdep correct seems worth it, even if it’s so much slower, given that in coq_makefile it takes negligible time.

Paolo Giarrusso (Mar 02 2022 at 22:37):

@Rudi Grinberg

That's exactly the kind of optimization that I don't think is worth doing. Yes, it's faster, but the trade off is that you must re-run coqdep a lot more inputs. Sure, it could be worth it in practice, but in the long run it's the wrong way to go imo.

First of all, let’s clarify that dune today seems entirely unusable for Coq, since changing one character requires rerunning coqdep on each file of the entire project — so it takes much longer than the actual build for small changes. I’ve had to go back to coq_makefile for actual development for now.

Paolo Giarrusso (Mar 02 2022 at 22:39):

Second, today the computation cost seems O(n) when running coqdep once vs O(n^2) when running coqdep for each file.

Paolo Giarrusso (Mar 02 2022 at 22:41):

where “seems” means “this is a rough approximation of what it feels as a user” — but it might also be rather accurate if the major factor is the filesystem scan as Emilio says (corrections welcome!)

Rudi Grinberg (Mar 02 2022 at 22:42):

That's because coqdep foo.v will read more than just foo.v, right?

Paolo Giarrusso (Mar 02 2022 at 22:44):

yes, to some extent Requires can’t be resolved without the entire directory listing

Paolo Giarrusso (Mar 02 2022 at 22:46):

re “one month for a committed developer”, I’ll be extremely happy if a name appears :-)

Paolo Giarrusso (Mar 02 2022 at 22:49):

but honestly my experience is that even things that Emilio describes as easy can take years — I guess because Emilio has _another_ full time job, so that’s perfectly fine :-)

Paolo Giarrusso (Mar 02 2022 at 22:50):

also IME, running coqdep once on all files (as in coq_makefile) hasn’t ever been noticeable, but I can collect some stats if you’d like

Rudi Grinberg (Mar 03 2022 at 01:56):

Yes, when I say one month I really mean full time work. The dune code is fairly well written and amendable to extension, but it's quite big and requires some upfront effort for sure.

Karl Palmskog (Mar 03 2022 at 06:29):

in the projects where I use Dune (small to medium size), we didn't get much slowdown from multiple coqdep calls. But due to composition of dependent projects during development, I can see that this may easily become a problem

Emilio Jesús Gallego Arias (Mar 03 2022 at 10:24):

There are no downsides to doing on coqdep per theory, and a big speedup, so we will implement it asap

Emilio Jesús Gallego Arias (Mar 03 2022 at 10:25):

we can go back to one coqdep per file, once we have something like -modules for coqdep

Emilio Jesús Gallego Arias (Mar 03 2022 at 10:25):

but that's tricky

Paolo Giarrusso (Jun 12 2022 at 06:05):

sorry for the ping, but any updates on coqdep-per-theory? I hope this has not been deferred because of the coqmod work…

Emilio Jesús Gallego Arias (Jun 12 2022 at 12:19):

Not a lot of updates, tho Dune 3.2 should be much better behaved w.r.t. coqdep

Emilio Jesús Gallego Arias (Jun 12 2022 at 12:30):

In the following sense: coqdep will be run at first N times for N files, but afterwards, it will only be run on the file that has changed.

Emilio Jesús Gallego Arias (Jun 12 2022 at 12:32):

I have pushed some refactorings tho that should help towards doing that once per theory tho. If someone wants to attack this TODO it should be easy:

All files now depend on theory.v.d , by a (memoized) rule that will parse it and build the dep map.

Emilio Jesús Gallego Arias (Jun 12 2022 at 12:32):

so actually after my latest PR that's pretty easy, just have deps_of take the map, and the map depend on the file

Last updated: Apr 16 2024 at 19:01 UTC