Stream: Dune devs & users

Topic: Caching/sharing ~/.cache/dune on Gitlab/via NFS


Paolo Giarrusso (Apr 04 2021 at 16:41):

For more efficient dune-based CI, https://gitlab.com/gasche/gitlab-ocaml-ci-example shows how to save and restore the _build folder*1 across CI runs. I wonder how far one could get; I’m hoping this is easier since AFAIK from the manual the cache data are immutable, but I’m not sure if there’s any cache metadata.

*1 In particular on Gitlab, with some workarounds for gitlab bugs — but similar ideas should apply elsewhere.

Emilio Jesús Gallego Arias (Apr 04 2021 at 17:17):

We have discussed a plan for having our workers keep a local cache, which could indeed speed up CI runs. Sharing the cache seems trickier, though; I guess in that case the preferred method is to have the cache daemon act as a server.

Paolo Giarrusso (Apr 04 2021 at 23:53):

I mean, I’ve seen some plans for distributed caching in https://dune.build/blog/dune-retreat-2020/, but that sounds like a much harder problem (and complex solution) than “have the server listen over TCP rather than Unix domain sockets”. And it’s necessary in certain scenarios.

Paolo Giarrusso (Apr 04 2021 at 23:59):

But if I’m willing to, say, prefetch “everything” into a local cache (say with rsync), I can imagine simpler solutions on the dune side.

Emilio Jesús Gallego Arias (Apr 05 2021 at 15:55):

Indeed. All I know is that an improved cache is very much on the Dune roadmap, as I'm pretty sure large industrial OCaml users need it in order to replace some of their tooling with Dune. I'm not an expert, though; I'd suggest opening a discussion in the Dune bug tracker to see how we can achieve this.

Coq for now will be more modest and just equip our own workers with a local cache.

Emilio Jesús Gallego Arias (Apr 07 2021 at 17:17):

By the way this is the cache implementation https://github.com/ocaml/dune/pull/4443

Rudi Grinberg (Oct 21 2021 at 02:29):

@Paolo Giarrusso my suggestion would be to wait until janestreet finishes their transition to dune internally. They have a distributed cache in jenga that they are porting to dune. Perhaps whatever they'll make will be reusable?

if you're curious, I can ask them for more details.

Rudi Grinberg (Oct 21 2021 at 02:30):

But your idea is on the right track. In fact, the github action for opam ci already enables and restores the dune cache for you to save build times.

Paolo Giarrusso (Nov 03 2021 at 00:45):

@Rudi Grinberg I'm curious; in the absence of this feature, we're looking into using Bazel.

Rudi Grinberg (Nov 03 2021 at 01:25):

I already asked about it in fact. Here's what they said:

That's their general setup on Linux and it seems to work well. The client and the server are written using janestreet dependencies, so they're a little heavier to install. The biggest caveat is that neither the client nor the server (though the server matters less) works on Windows. So if Windows support is important to you, the porting work would be on you.

Paolo Giarrusso (Nov 03 2021 at 01:29):

No windows, only linux and macos. Do they happen to have an approximate timeline?

Paolo Giarrusso (Nov 03 2021 at 01:31):

To be sure, this sounds great

Rudi Grinberg (Nov 03 2021 at 02:16):

Not sure, but this is an active project for them. They cannot switch to dune without the shared cache

Rudi Grinberg (Nov 03 2021 at 02:16):

I'll ask them when I get the chance.

Rudi Grinberg (Nov 03 2021 at 16:39):

~6 months

Paolo Giarrusso (Nov 03 2021 at 17:34):

Thanks! It's nice to have an idea!

Emilio Jesús Gallego Arias (Nov 03 2021 at 17:36):

Actually sharing the cache by NFS may work?

Emilio Jesús Gallego Arias (Nov 03 2021 at 17:36):

if your network is fast enough

Emilio Jesús Gallego Arias (Nov 03 2021 at 17:36):

I understand that NFS should support the required locking

Paolo Giarrusso (Nov 03 2021 at 17:45):

How closely have you dug into it? I'm not an expert here, but NFS locking doesn't _just_ work.
https://unix.stackexchange.com/a/229680/28519 (that specific link might be outdated...)

Emilio Jesús Gallego Arias (Nov 03 2021 at 17:52):

It's been a while since I used NFS, but version 4 is supposed to support locking. Dunno, what's your network setup?

Emilio Jesús Gallego Arias (Nov 03 2021 at 17:52):

Worst case, you could have your devs rsync the cache nightly

Emilio Jesús Gallego Arias (Nov 03 2021 at 17:52):

depending on your use case, it may work anywhere from very well to very badly

Paolo Giarrusso (Nov 03 2021 at 17:53):

if I can just rsync the cache, great — I feared there would be a global cache index.

Paolo Giarrusso (Nov 03 2021 at 17:54):

maybe we should try to run some experiments, instead of what I'm doing — ask enough about internals to "prove" it will or won't work.

Emilio Jesús Gallego Arias (Nov 03 2021 at 17:55):

I'm positive you can rsync the cache

Paolo Giarrusso (Nov 03 2021 at 17:55):

cc @Abhishek Anand

Emilio Jesús Gallego Arias (Nov 03 2021 at 17:56):

Just be careful to run dune cache-daemon stop beforehand, just in case

Emilio Jesús Gallego Arias (Nov 03 2021 at 17:57):

The code is pretty straightforward, see the implementation in src/dune_cache_storage

Paolo Giarrusso (Nov 03 2021 at 18:00):

ah, that's main/3.0+alpha not 2.9.1

Paolo Giarrusso (Nov 03 2021 at 18:01):

(src/cache doesn't look much bigger)

Emilio Jesús Gallego Arias (Nov 03 2021 at 18:04):

The only issue I can see is hardlinks

Emilio Jesús Gallego Arias (Nov 03 2021 at 18:04):

not sure how that interacts with rsync; indeed, it complicates concurrent use a lot

Emilio Jesús Gallego Arias (Nov 03 2021 at 18:05):

but I think the db doesn't do hardlinks per-se

Emilio Jesús Gallego Arias (Nov 03 2021 at 18:05):

only the action to copy from the cache does use the hardlinks, but beware if rsync is gonna trim
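
To make the hardlink concern concrete, here is a minimal Python sketch of "restore from cache by hardlink, fall back to a copy". It is not dune's code: `restore_from_cache` and `link_count` are hypothetical names invented for illustration. The point is that a hardlinked artifact shares an inode with the cache entry, so the entry's link count rises above 1 while a build tree still references it.

```python
import os
import shutil


def restore_from_cache(cache_entry: str, target: str) -> str:
    """Materialize a cached file at `target`; return "hardlink" or "copy".

    Hypothetical helper for illustration only, not part of dune.
    """
    if os.path.lexists(target):
        os.remove(target)
    try:
        # A hard link shares the inode with the cache entry: no data is copied.
        os.link(cache_entry, target)
        return "hardlink"
    except OSError:
        # Different filesystem, or hard links unsupported: make a real copy.
        shutil.copy2(cache_entry, target)
        return "copy"


def link_count(path: str) -> int:
    """Number of directory entries that point at this file's inode."""
    return os.stat(path).st_nlink
```

Note that plain `rsync -a` does not preserve hard links (that needs `-H`), so an rsynced copy of a cache directory starts with every entry at link count 1 on the receiving side.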

Emilio Jesús Gallego Arias (Nov 03 2021 at 18:06):

for people like me who work with multiple branches of ML projects, the cache saves a huge amount of time

Emilio Jesús Gallego Arias (Nov 03 2021 at 18:06):

Note that one difficulty you may have in sharing the cache is differing dev environments

Emilio Jesús Gallego Arias (Nov 03 2021 at 18:06):

for example, rules involving ocaml or coq won't be shared if the coq / ocaml versions don't match

Emilio Jesús Gallego Arias (Nov 03 2021 at 18:06):

etc...

Emilio Jesús Gallego Arias (Nov 03 2021 at 18:06):

dune tends to be super precise with tracking
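
As a rough illustration of why mismatched toolchains defeat sharing, here is a toy content-addressed rule key in Python. This is only a sketch under stated assumptions: the real digest scheme in dune is different, and `rule_key`, its parameters, and the version strings are all made up for the example. What it shows is that any change in the toolchain identity yields a different key, hence a cache miss.

```python
import hashlib
from typing import Mapping, Sequence


def rule_key(command: Sequence[str],
             dep_digests: Mapping[str, str],
             toolchain: str) -> str:
    """Toy cache key: hash of the command, the dependency digests, and the
    toolchain identity. Hypothetical; not dune's actual hashing scheme."""
    h = hashlib.sha256()
    for part in command:
        h.update(part.encode() + b"\0")
    # Sort so the key does not depend on the order dependencies were listed.
    for path in sorted(dep_digests):
        h.update(path.encode() + b"\0" + dep_digests[path].encode() + b"\0")
    h.update(toolchain.encode())
    return h.hexdigest()
```

With this scheme, two machines only hit each other's entries when command, inputs, and compiler identity all agree byte for byte.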

Rudi Grinberg (Nov 03 2021 at 20:17):

Yes, you can just rsync the cache if you're all on the same platform. You can also disable hardlinks if those complicate things for you

Emilio Jesús Gallego Arias (Nov 03 2021 at 20:36):

@Paolo Giarrusso if the cache doesn't work properly, the main branch of Dune has nice facilities to debug why some target is being rebuilt

Rudi Grinberg (Nov 03 2021 at 21:08):

In fact, the cache in main had so many changes that I'd recommend experimenting with it first.

Paolo Giarrusso (Nov 03 2021 at 21:30):

so all the above applies to “cache in main branch” not to 2.9, right?

Rudi Grinberg (Nov 03 2021 at 22:01):

No, not all. But it should be noted that the daemon has been removed to simplify things

Rudi Grinberg (Nov 03 2021 at 22:01):

Since the distributed part will be handled externally

Emilio Jesús Gallego Arias (Nov 03 2021 at 22:12):

Main is also much faster for zero builds, so you may already save a bit of time, depending on your use case

Emilio Jesús Gallego Arias (Nov 03 2021 at 22:13):

A good property of dune is that it is self-contained, just point to the right binary et voilà

Emilio Jesús Gallego Arias (Nov 03 2021 at 22:13):

you can use a dune built in one switch in any other switch

Paolo Giarrusso (Nov 04 2021 at 06:58):

= you can build dune in switch A and use it in switch B, for all A and B, right?

Emilio Jesús Gallego Arias (Nov 04 2021 at 10:20):

Yes

Emilio Jesús Gallego Arias (Nov 04 2021 at 10:21):

Unless something was broken recently

Paolo Giarrusso (Mar 02 2022 at 12:27):

News: it seems all dune uses of POSIX file locking are dead code. Which is great, because POSIX file locking is broken beyond all hope even without NFS: https://github.com/ocaml/dune/pull/5501.

Paolo Giarrusso (Mar 02 2022 at 12:39):

AFAICT, write_atomically just relies on rename atomicity, which at least should be safe outside NFS.
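
For readers unfamiliar with the pattern, "rely on rename atomicity" usually means: write to a temporary file in the same directory, then rename it over the destination. The Python sketch below shows that general technique; it is not a transcription of dune's `write_atomically`, and the name `write_atomically_sketch` is made up to keep the two apart.

```python
import os
import tempfile


def write_atomically_sketch(path: str, data: bytes) -> None:
    """Write `data` to `path` so readers see either the old or the new
    content, never a partial file. Illustrative only, not dune's code."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the destination,
    # otherwise the final rename cannot be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # os.replace maps to rename(2) on POSIX and overwrites atomically.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

On a local POSIX filesystem the rename is atomic with respect to other processes on the same machine; what other NFS clients observe is a separate question, which is the caveat raised above.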

Paolo Giarrusso (Jul 09 2022 at 05:35):

FWIW: rsync seems to be working pretty well. We’ll need to figure out how to expire the cache entries; I guess rsync, dune cache trim, rsync --delete should work, maybe modulo some locking
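
The pull / trim / push-with-delete sequence could be scripted roughly as follows. This is a sketch under several assumptions, none of them from the thread: the function names and paths are hypothetical, `local` is assumed to be the directory dune actually uses as its cache, and the `--size` option to `dune cache trim` should be checked against `dune cache trim --help` for your dune version.

```python
import subprocess
from typing import List


def _as_dir(path: str) -> str:
    """Trailing slash so rsync syncs directory contents, not the directory."""
    return path.rstrip("/") + "/"


def expiry_plan(shared: str, local: str, trim_size: str) -> List[List[str]]:
    """Commands for one expiry round (hypothetical helper, nothing is run)."""
    return [
        # 1. Pull the shared cache into the local one.
        ["rsync", "-a", _as_dir(shared), _as_dir(local)],
        # 2. Let dune trim the local cache (option name is an assumption).
        ["dune", "cache", "trim", "--size", trim_size],
        # 3. Push back, deleting shared entries that were trimmed locally.
        ["rsync", "-a", "--delete", _as_dir(local), _as_dir(shared)],
    ]


def run_plan(plan: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in plan:
        subprocess.run(cmd, check=True)
```

The "modulo some locking" caveat shows up in step 3: anything another machine pushed to the shared copy between steps 1 and 3 is deleted by `--delete`, so only one machine should run this round at a time.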


Last updated: Mar 29 2024 at 09:02 UTC