Coqdep feels slow these days, doesn't it? Could we make Dune call it only once for the whole stdlib? It would then be easier to profile and optimize it.
@Maxime Dénès have you seen the discussion of changing Dune to call coqdep once-and-for-all?
No, thanks for the pointer!
OK, this is then yet another independent confirmation that the Dune+coqdep issue will be rediscovered until fixed...
I don't feel like Coqdep got any slower. Do you have some specific test case?
I run dune build
on Coq's master ;)
Oh! So, that is not Coqdep, that is just plain Dune. If you perf
it, you will notice that all the time is wasted in computing the checksums of all the files of the repository.
It is doing that in between calls to coqdep
?
would be interesting if someone did a comparison when using https://github.com/ejgallego/dune/tree/coq+coqdep_per_theory --- but the fork may be too old
I slightly incorrectly expressed my "feeling" indeed: what feels slow is the batch of calls to coqdep
, maybe not each call. So if checksum computation is happening between calls, it could very well account for it.
@Guillaume Melquiond is this what you were saying it does?
Dune build does launch a lot of coqdep processes for sure...
Here is what I am saying:
$ dune build && echo crap >> theories/Reals/Reals.v && dune build && perf record dune build
$ perf report
43,79% dune dune [.] caml_MD5Transform
18,56% dune dune [.] mark_slice
6,12% dune dune [.] caml_page_table_lookup
but that doesn't account for the time spent in the coqdep processes though?
It does. I didn't pass --no-inherit
.
Here is the first coqdep
related line:
0,37% coqdep [unknown] [k] 0xffffffffa5a0405c
@Pierre-Marie Pédrot The time spent in coqdep
is an illusion IIUC. It is Dune being very slow moving from one coqdep call to the next.
but coqdep is IO-bound, so perf won't see that will it?
Indeed, but Dune reads the exact same files as Coqdep anyway. (And everything in the hot cache, so the time spent inside the kernel does not exceed 15%.)
s/inside the kernel/waiting for the kernel/
But then why do calls to ocamlc
/ocamlopt
look much faster?
Yeah, there is something very fishy there. I think I remember that @Emilio Jesús Gallego Arias mentioned that coqdep fetched the presence of all dependencies on the disk and that was the real issue.
But there was a change of behavior somewhere, no?
Could be just me not remembering how slow things were, but I doubt it.
dune build was always slow for me, this is not new
and the coqdep problem was already being discussed when the switch was performed AFAIR
But make was also calling coqdep
, does Dune do something special?
The discussion Karl pointed at starts with "coqdep 8.15 is slower so"
Makefile relies on the timestamps, whereas dune does something content-oriented
the makefile build is still fast for me, i.e. I don't have to wait ~5s before the v files start compiling
It is the usual confusion, if you just measure the time it takes to get a coqtop-prelude running, I'd say dune is OK. If you actually need to build the .vo, then dune is not great. If you change branch every ten minutes then dune thanks to the cache is blazing fast, if you cd ../another-work-tree
then it is slower ;-)
My use case was I was working on stuff in clib
and wanted to build stdlib.
No branching / switching worktrees involved.
Pierre-Marie Pédrot said:
Makefile relies on the timestamps, whereas dune does something content-oriented
Just because you are content-oriented does not mean that you have to be slow. For example, Git is content-oriented and it remembers the checksum of all the files. But it only recomputes the checksum of a given file if its modification time was changed since the last time it computed its checksum.
But make was also calling
coqdep
, does Dune do something special?
@Maxime Dénès make calls coqdep once, dune calls it for each file and has it depend on basically the entire source tree. And @Guillaume Melquiond 's analysis doesn't explain why this process is slower with coq 8.15 than 8.13 in clients...
Do you have actual measurements or is this just your feeling?
it’s the same kind of feeling as Maxime: the difference is obvious, the cause might be misattributed.
and I haven’t run perf
yet, because it doesn’t exist on MacOS.
but I’ll take your point that more analysis is needed!
OTOH: (1) I also upgraded to dune 3.0.2 recently, but that shouldn’t be relevant unless there’s a serious performance bug (2) I’m on MacOS not Linux, so relative performance might differ.
Some still unscientific benchmarks: So, with dune 2.9.3 and Coq 8.15 doing dune build theories/
in https://github.com/bedrocksystems/brick (with 138 source files), the phase where dune calls coqdep took ~18s, with dune taking ~5% CPUs and coqdep processes often showing up with 100% CPU usage. This was after echo '(* *)' >> theories/prelude/finite.v
. But these results won't be yet exactly reproducible.
Also that's 18s on an M1 Max with all 10 cores loaded, so at least 18*8 = 144s CPU time (ignoring the 2 efficiency cores). Next, I'll repeat this on Coq 8.13 to test if there's indeed a difference. [And probably move to Linux so I can _also_ collect perf
data]
Hum... I was trying to measure the slowdown on my computer. But I am not even able to compile Coq 8.13.2 and 8.14.1 with Dune, so as to compare. The builds fail with the following kind of errors (on fresh checkouts):
$ dune build
regen_readme doc/sphinx/README.gen.rst (exit 1)
(cd _build/default/doc/sphinx && ../tools/coqrst/regen_readme.py README.gen.rst)
Traceback (most recent call last):
File "../tools/coqrst/regen_readme.py", line 24, in <module>
import sphinx
ModuleNotFoundError: No module named 'sphinx'
File "topbin/coqtop_byte_bin.ml", line 22, characters 29-52:
22 | begin match Hashtbl.find Toploop.directive_table "rectypes" with
^^^^^^^^^^^^^^^^^^^^^^^
Error (alert deprecated): Toploop.directive_table
File "topbin/coqtop_byte_bin.ml", line 31, characters 42-59:
31 | { load_obj = (fun f -> if not (Topdirs.load_file ppf f)
^^^^^^^^^^^^^^^^^
Error (alert deprecated): Topdirs.load_file
maybe with --profile=release
?
FWIW https://dune.readthedocs.io/en/stable/advanced-topics.html?highlight=timing#profiling-dune should get dune's own profiling data (I'll add it to my experiments).
Maxime Dénès said:
maybe with
--profile=release
?
That helps. But now, for some reason, Dune has decided to run the whole testsuite.
A few things here:
dune build
is a super large target, it is not the fastest, dune build @check
should be the default, or the shim if you wanna launch coqtopAs for the default target for dune build
indeed, it includes all targets, the test-suite, the docs, etc... We have discussed restricting that, @Théo Zimmermann may know more about the status of it
@Guillaume Melquiond see Makefile.dune for better targets, you are trying to build the docs and it fails
I am pretty sure that, on master
, dune build
is not running the testsuite.
Yes, indeed Theo added this finally:
(alias
(name default)
(deps coq-core.install coq-stdlib.install coqide-server.install coqide.install))
but on older versions, you need to use these targets to compare
@Emilio Jesús Gallego Arias the other discussion was linked, but that perf
data asks a good question
@Paolo Giarrusso do you mind repeating the question?
Emilio Jesús Gallego Arias said:
dune build @check
should be the default
dune build @check
does not seem to build the theories. So, how do you use it to measure the time taken by Coqdep?
@Guillaume Melquiond I was referring to Maxime's use case for working on clib
if you wanna work on a particular .v file, then dune build foo.v && dune exec -- dev/shim/coqtop-prelude
is the best we have now, until dune coqtop
by @Rodolphe Lepigre lands in dune
which should happen soon thanks to him!
for coqdep there are 3 scenarios:
first and second require all the .v files to be in place, so it can be more expensive in some situations, for example if the generated .v files are expensive
2 is always better than 1 in most cases, but as I mentioned some time ago, i wanted to experiment with 1, but it was not a good idea
3 is the best, but we don't fully have a good spec for coqdep -modules
@Emilio Jesús Gallego Arias to repeat the question, Guillaume's perf
data suggests that (in his scenario) most time is spent in dune checksumming not on coqdep calls.
Paolo Giarrusso said:
Emilio Jesús Gallego Arias to repeat the question, Guillaume's
perf
data suggests that (in his scenario) most time is spent in dune checksumming not on coqdep calls.
I'm surprised to see that, my trace says something very different. hashing is only done when the heuristics to avoid it fail; there is a debug mode actually to see why a hash was triggerred
dune checks fstat etc... to avoid re-hashing, so while we have seen bugs in the past, a zero build should do no hashing
and that's in Coq builds, and my quick tests on a library blame coqdep again but they're much rougher
let me show the code that implements this check
yes, coqdep is the main bottleneck we have, even so with this dune is much faster than make in my experiments
it doesn't worry me a lot as this is an easy change to make, we just need to find some manpower
I'm still blocked until the 17th
:(
but I'll be happy to help , I think I already posted a blueprint on how to do it in the dune stream
your advising is useful, and as I keep saying, it's no problem even if you keep having 0 coding time on this for the foreseeable future, and the community needs to step up :-). Your mentorship will still be necessary.
here's the code that decides when to re-hash
@Paolo Giarrusso I suggest we continue the discussion about implementing the one coqdep per theory on the dune stream
So, here are the results on my computer with 8.13.2, 8.14.1, and master. The setting is the same as before: Coq compiles successfully, then Reals.v
is filled with crap, how long does it take to reach the error.
8.13.2:
$ \time dune build --profile=release coq.install coqide-server.install coqide.install
File "./theories/Reals/Reals.v", line 35, characters 0-4: Error: Syntax error: illegal begin of vernac.
Command exited with non-zero status 1
3.76user 0.61system 0:03.95elapsed 110%CPU (0avgtext+0avgdata 363908maxresident)k
0inputs+4800outputs (0major+190906minor)pagefaults 0swaps
8.14.1:
$ \time dune build --profile=release coq-core.install coq-stdlib.install coqide-server.install coqide.install
File "./theories/Reals/Reals.v", line 35, characters 0-4: Error: Syntax error: illegal begin of vernac.
Command exited with non-zero status 1
3.90user 0.52system 0:04.01elapsed 110%CPU (0avgtext+0avgdata 357320maxresident)k
0inputs+6472outputs (0major+191039minor)pagefaults 0swaps
master:
$ \time dune build --profile=release coq-core.install coq-stdlib.install coqide-server.install coqide.install
File "./theories/Reals/Reals.v", line 35, characters 0-4: Error: Syntax error: illegal begin of vernac.
Command exited with non-zero status 1
3.92user 0.53system 0:04.03elapsed 110%CPU (0avgtext+0avgdata 353312maxresident)k
0inputs+6272outputs (0major+190656minor)pagefaults 0swaps
So, in all the three cases, elapsed time is almost exactly 4 seconds. There is no difference between the versions of Coq.
Yes @Guillaume Melquiond , why there should be? I am sorry, I missed the original hypothesis you folks were testing
what's the dune version by the way
@Paolo Giarrusso said that it got much slower between 8.13 and master.
And this is Dune 2.9.1 for me.
there should be significant differences between 2.9.x and 3.0 for zero builds
Guillaume Melquiond said:
Paolo Giarrusso said that it got much slower between 8.13 and master.
And this is Dune 2.9.1 for me.
I see, thanks; likely due to our tweaks to targets, etc... still the workflow is in flux as we all know and suffer
last time I checked, doing one coqdep call instead of what we do today will save around 12 secs on a zero build for the stdlib
so that's pretty significant, in incremental builds not a lot, and in ml only build nothing
12 secs on my old machine, which I think it is pretty representative (4 cores / 8 threads power efficient mobile cpu)
@Guillaume Melquiond Thanks for your data, I'll keep it in mind and look more closely; but my claim was always on user libraries, among all the other differences in environments.
Were you composing many projects? (as in dune's slang)
@Enrico Tassi my project composes 4 theories, but the 18s test involves 2 theories. (EDIT: but this isn't real "dune composition" since there's a single dune-project
)
For those annoyed by repeated coqdep runs for all .v files in a theory, the issue should be fixed soon in master, turns out I had missed the right API
https://github.com/ocaml/dune/pull/5547
Guillaume Melquiond said:
Pierre-Marie Pédrot said:
Makefile relies on the timestamps, whereas dune does something content-oriented
Just because you are content-oriented does not mean that you have to be slow. For example, Git is content-oriented and it remembers the checksum of all the files. But it only recomputes the checksum of a given file if its modification time was changed since the last time it computed its checksum.
You might be pleased to know that dune already does that. In fact, it's the only thing dune ever did. If I had to guess, the checksumming in your trace is from comparing the digests computed from the stat calls. Doing a 0 build with all targets essentially requires stat'ing every single file in your tree, so it isn't all that cheap.
Emilio Jesús Gallego Arias said:
Note that these are the digests of the information derived from stat
. To see where the digest of the contents is computed, follow the callers of Cached_digest.build_file
Last updated: Jun 09 2023 at 07:01 UTC