For more efficient dune-based CI, https://gitlab.com/gasche/gitlab-ocaml-ci-example shows how to save and restore the `_build` folder*1 across CI runs. I wonder how far one could get; I’m hoping this is easier since AFAIK from the manual the cache data are immutable, but I’m not sure if there’s any cache metadata.
I guess this should work for `~/.cache/dune` as well, or would you expect issues?
Could one share the cache folder via NFS, to support _parallel_ builds? Dune couldn’t hard link files between the cache and the build tree; and one would need to disable the cache daemon (since it uses a Unix domain socket). Does the cache need any synchronization? File locking and NFS aren’t friends IIRC.
*1 In particular on Gitlab, with some workarounds for gitlab bugs — but similar ideas should apply elsewhere.
We have discussed a plan for our workers to keep a local cache, which could indeed speed up CI runs. Sharing the cache seems trickier, though; I guess that in that case the preferred method is to have the cache daemon act as a server.
I mean, I’ve seen some plans for distributed caching in https://dune.build/blog/dune-retreat-2020/, but that sounds like a much harder problem (and complex solution) than “have the server listen over TCP rather than Unix domain sockets”. And it’s necessary in certain scenarios.
But if I’m willing to, say, prefetch “everything” into a local cache (say with `rsync`), I can imagine simpler solutions on the dune side.
Indeed. All I know is that an improved cache is very much on the dune roadmap, as I'm pretty sure large industrial OCaml users need it in order to replace some of their tooling with Dune. But I am not an expert; I'd suggest discussing this in the Dune bug tracker to see how we can achieve it.
Coq for now will be more modest and just equip our own workers with a local cache.
By the way this is the cache implementation https://github.com/ocaml/dune/pull/4443
@Paolo Giarrusso my suggestion would be to wait until janestreet finishes their transition to dune internally. They have a distributed cache in jenga that they are porting to dune. Perhaps whatever they'll make will be reusable?
if you're curious, I can ask them for more details.
But your idea is on the right track. In fact, the github action for opam ci already enables and restores the dune cache for you to save build times.
@Rudi Grinberg I'm curious; in the absence of this feature, we're looking into using Bazel.
I already asked about it in fact. Here's what they said:
That's their general setup on Linux and it seems to work well. The client and the server are written using janestreet dependencies, so it's a little heavier to install them. The biggest caveat is that neither the client nor the server (though the server matters less) works on Windows. So if Windows support is important to you, the porting work would be on you.
No windows, only linux and macos. Do they happen to have an approximate timeline?
To be sure, this sounds great
Not sure, but this is an active project for them. They cannot switch to dune without the shared cache
I'll ask them when I get the chance.
Thanks! That's nice to have an idea!
Actually sharing the cache by NFS may work?
if your network is fast enough
I understand that NFS should support the required locking
how closely have you dug into it? I'm not an expert here, but NFS locking doesn't _just_ work.
https://unix.stackexchange.com/a/229680/28519 (that specific link might be outdated...)
It's been a while since I used NFS, but version 4 is supposed to support locking. Dunno, what's your network setup?
Worst case, you could have your devs rsync the cache nightly
depending on your use case it may work anywhere from very well to very badly
if I can just rsync the cache, great — I feared there would be a global cache index.
maybe we should try to run some experiments, instead of what I'm doing — ask enough about internals to "prove" it will or won't work.
I'm positive you can rsync the cache
cc @Abhishek Anand
Just be careful to do `dune cache-daemon stop` before, just in case
Code is pretty straightforward, see the implementation in
ah, that's main/3.0+alpha not 2.9.1
src/cache doesn't look much bigger)
The only issue I can see is hardlinks
not sure how that interacts with sync, indeed it complicates concurrent use a lot
but I think the db doesn't do hardlinks per-se
only the action that copies from the cache uses hardlinks, but beware if rsync is going to trim
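To make the hardlink point concrete, here is a minimal Python sketch of "restore a file from the cache by hard link, fall back to copying". It is not Dune's code; `restore_from_cache` and the paths are hypothetical names used only for illustration. The fallback branch is what you would expect to hit when the cache and the build tree live on different filesystems (for example, a cache on an NFS mount).

```python
import os
import shutil
import tempfile


def restore_from_cache(cache_entry: str, target: str) -> str:
    """Place a cached file at `target`; return "hardlink" or "copy".

    Hypothetical helper for illustration, not Dune's implementation.
    """
    if os.path.lexists(target):
        os.remove(target)
    try:
        # A hard link makes both paths name the same inode: no data is copied.
        os.link(cache_entry, target)
        return "hardlink"
    except OSError:
        # Cross-filesystem links (EXDEV) or filesystems without hard-link
        # support end up here; fall back to a plain copy.
        shutil.copy2(cache_entry, target)
        return "copy"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        entry = os.path.join(root, "cache-entry")
        target = os.path.join(root, "build-target")
        with open(entry, "w") as f:
            f.write("artifact")
        mode = restore_from_cache(entry, target)
        # After a successful hard link, the entry's link count goes up.
        print(mode, os.stat(entry).st_nlink)
```

Note that rsync only preserves hard links when run with `-H`/`--hard-links`, so a synced copy of a cache entry normally arrives as an independent file with a link count of 1.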
for people like me who work with multiple branches of ML projects, the cache saves a huge amount of time
Note that a difficulty you may have in sharing the cache is differing dev environments
for example, rules involving ocaml or coq won't be shared if the coq / ocaml versions don't match
dune tends to be super precise with tracking
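A simplified model of why mismatched toolchains defeat sharing: if a rule's cache key is a hash over the command and the digests of all its dependencies, including the compiler binary, then any difference in the compiler yields a different key and therefore a cache miss. The sketch below is an assumption-laden illustration, not Dune's actual key computation; `rule_key` and the digest strings are made up.

```python
import hashlib
from typing import Mapping


def rule_key(command: str, dep_digests: Mapping[str, str]) -> str:
    """Hash a command together with the digests of its dependencies.

    Hypothetical model for illustration; Dune's real scheme may differ.
    """
    h = hashlib.sha256()
    h.update(command.encode())
    # Sort paths so the key does not depend on dictionary ordering.
    for path in sorted(dep_digests):
        h.update(b"\0" + path.encode() + b"\0" + dep_digests[path].encode())
    return h.hexdigest()


if __name__ == "__main__":
    cmd = "ocamlc -c a.ml"
    # Same source file, two different (made-up) compiler digests.
    dev_a = rule_key(cmd, {"a.ml": "src-digest", "ocamlc": "compiler-digest-A"})
    dev_b = rule_key(cmd, {"a.ml": "src-digest", "ocamlc": "compiler-digest-B"})
    print("shared cache hit possible:", dev_a == dev_b)
```

Under this model, two developers only hit each other's entries when every tracked input, toolchain included, is byte-identical.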
Yes, you can just rsync the cache if you're all on the same platform. You can also disable hardlinks if those complicate things for you
@Paolo Giarrusso if the cache doesn't work properly, the `main` branch of Dune has nice facilities to debug why some target is being rebuilt
In fact, the cache in main had so many changes that I'd recommend experimenting with it first.
so all the above applies to “cache in `main` branch” not to 2.9, right?
No, not all. But it should be noted that the daemon has been removed to simplify things
Since the distributed part will be handled externally
Main is also much faster for zero builds, so you may already save a bit of time, depending on your use case
A good property of dune is that it is self-contained, just point to the right binary et voilà
you can use a dune built in one switch in any switch
= you can build dune in switch A and use it in switch B, for all A and B, right?
Unless something was broken recently
News: it seems all dune uses of POSIX file locking are dead code. Which is great, because POSIX file locking is broken beyond all hope even without NFS: https://github.com/ocaml/dune/pull/5501.
`write_atomically` just relies on `rename` atomicity, which at least should be safe outside NFS.
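For readers unfamiliar with the pattern, here is a minimal Python sketch of writing a file atomically through a rename. It borrows the name `write_atomically` from the discussion but is not Dune's implementation. The idea is to write to a temporary file in the same directory and then rename it over the destination, so that on a local POSIX filesystem readers see either the old file or the complete new one.

```python
import os
import tempfile


def write_atomically(path: str, data: bytes) -> None:
    """Write `data` to `path` via a temp file plus rename.

    Illustrative sketch only; not Dune's implementation.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem as the destination,
    # otherwise the final rename cannot be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # os.replace maps to rename(2) on POSIX and overwrites atomically.
        os.replace(tmp, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "entry")
        write_atomically(target, b"first")
        write_atomically(target, b"second")
        with open(target, "rb") as f:
            print(f.read())
```

No lock is taken: concurrent writers simply race on the rename, and the last one wins with a complete file.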
FWIW: rsync seems to be working pretty well. We’ll need to figure out how to expire the cache entries; I guess rsync, dune cache trim, rsync --delete should work, maybe modulo some locking
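The "rsync, dune cache trim, rsync --delete" sequence could be scripted roughly as below. This is a sketch under several assumptions: the direction of each sync (pull the shared cache, trim locally, push back with deletion) is one reading of the message above, the paths and host are hypothetical, and the trim arguments are passed through untouched, so check `dune cache trim --help` for the flags your Dune version accepts. It defaults to a dry run that only prints the commands, and it does no locking.

```python
import subprocess
from typing import List, Sequence


def sync_trim_sync_commands(
    local_cache: str, remote: str, trim_args: Sequence[str]
) -> List[List[str]]:
    """Build the three commands; nothing is executed here."""
    # Trailing slashes make rsync copy directory contents, not the directory.
    local = local_cache.rstrip("/") + "/"
    shared = remote.rstrip("/") + "/"
    return [
        # 1. Pull entries from the shared copy into the local cache.
        ["rsync", "-a", shared, local],
        # 2. Expire entries locally (flags are caller-supplied).
        ["dune", "cache", "trim", *trim_args],
        # 3. Push back, deleting remote entries that were trimmed locally.
        ["rsync", "-a", "--delete", local, shared],
    ]


def run(commands: Sequence[Sequence[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(list(cmd), check=True)


if __name__ == "__main__":
    # Hypothetical paths, host, and size flag (assumptions; verify locally).
    plan = sync_trim_sync_commands(
        "/home/me/.cache/dune", "cachehost:/srv/dune-cache", ["--size=10GB"]
    )
    run(plan, dry_run=True)
```

Because step 3 deletes anything missing locally, entries pushed by another machine between steps 1 and 3 would be lost, which is where the "modulo some locking" caveat comes in.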
Last updated: Jun 04 2023 at 23:30 UTC