Stream: Coq devs & plugin devs

Topic: Garbage collection time


view this post on Zulip Karl Palmskog (Oct 25 2022 at 06:23):

Is it known how much time Coq spends doing garbage collection when checking typical projects, e.g., those in CI?

Context: I talked to a co-author of a recent OOPSLA paper, where it is shown that with better tuning of heap size, one can reduce garbage collection time by up to 30% in applications like browsers. He has used Coq in the past, so they might be interested in experimenting with applying their technique to Coq.

view this post on Zulip Janno (Oct 25 2022 at 07:18):

I don't have an answer but I have a handful of data points. The x axis is the value of o in OCAMLRUNPARAM. This is with 4.14 and a=2: image.png

view this post on Zulip Janno (Oct 25 2022 at 07:20):

This profile was taken from some random proof in our development. I doubt our proofs are representative of developments in CI but maybe the graph is still useful.

view this post on Zulip Karl Palmskog (Oct 25 2022 at 07:25):

@Janno so you get a modest decrease in time with increased GC speed? What is RSS here?

view this post on Zulip Janno (Oct 25 2022 at 07:25):

resident set size. Basically actual memory used, not just virtual memory reserved.

view this post on Zulip Janno (Oct 25 2022 at 07:26):

I actually forgot what o tunes exactly. The tradeoff to me is that we can increase memory consumption drastically for a modest but noticeable speed bump.

view this post on Zulip Guillaume Melquiond (Oct 25 2022 at 07:26):

Did you also pass s=32M to OCAMLRUNPARAM? Otherwise, you will be in a completely different setting than vanilla Coq.

view this post on Zulip Janno (Oct 25 2022 at 07:26):

(This was necessary because Coq on 4.14 is just way slower on average than Coq on 4.07)

view this post on Zulip Janno (Oct 25 2022 at 07:27):

Guillaume Melquiond said:

Did you also pass s=32M to OCAMLRUNPARAM? Otherwise, you will be in a completely different setting than vanilla Coq.

I haven't heard of that before. I suppose I should run some more experiments. Thanks for the pointer!

view this post on Zulip Guillaume Melquiond (Oct 25 2022 at 07:31):

By default, Coq runs under the equivalent of OCAMLRUNPARAM=a=2,o=200,s=32M for OCaml >= 4.10.

view this post on Zulip Janno (Oct 25 2022 at 07:53):

Wow. My computer is a bit too noisy to report anything with certainty but it seems like s=32M speeds up a random proof I used for the experiment by 16% at o=200.

view this post on Zulip Janno (Oct 25 2022 at 07:53):

Do you happen to know when this setting was selected for OCaml >= 4.10?

view this post on Zulip Pierre Roux (Oct 25 2022 at 08:10):

You can look at the history of https://github.com/coq/coq/blob/master/sysinit/coqinit.ml

view this post on Zulip Guillaume Melquiond (Oct 25 2022 at 08:20):

Only the part a=2,o=200 was selected for OCaml >= 4.10. The part s=32Mhas been there for ages.

view this post on Zulip Janno (Oct 25 2022 at 08:27):

Interesting. I suppose when I saw the regression in 4.14 I immediately started playing around with OCAMLRUNPARAM without s in it and never realized that I introduced an additional regression that way.

view this post on Zulip Guillaume Melquiond (Oct 25 2022 at 08:28):

After checking, it has been there since 2012.

view this post on Zulip Janno (Oct 26 2022 at 14:44):

So far it seems that not having s=32M has been a blessing in disguise, at least on our CI servers.

view this post on Zulip Janno (Oct 26 2022 at 14:45):

I am still testing different values but so far any value above s=256k (the default when it is not overwritten by Coq) has resulted in unacceptable to catastrophic wall clock time increases.

view this post on Zulip Enrico Tassi (Oct 26 2022 at 14:53):

This is very weird, increasing the minor heap is not a coq specific trick. The default is really too small. Is it the only parameter you are tweaking?

view this post on Zulip Janno (Oct 26 2022 at 14:55):

Yes

view this post on Zulip Janno (Oct 26 2022 at 14:55):

The result locally on my laptop is very, very different. Even at s=256M I still see a decrease in wall clock time. Although the amount of cache misses lead to diminishing returns very quickly after s=16M or so.

view this post on Zulip Paolo Giarrusso (Oct 26 2022 at 19:21):

yeah we were looking at the data and it's... complicated, but

view this post on Zulip Paolo Giarrusso (Oct 26 2022 at 19:27):

I'm no expert here but memory bandwidth benchmarks exhibit performance cliffs each time you overflow a layer in the memory hierarchy, so "overflowing L3 hits a cliff on this architecture" wouldn't be shocking.

view this post on Zulip Enrico Tassi (Oct 26 2022 at 20:04):

Hum, I'm not very familiar with modern server hardware, but a while back more cores was paired with more cache. What you say seems to suggest you have many cores but little cache... :puzzle:

view this post on Zulip Paolo Giarrusso (Oct 26 2022 at 20:23):

the above was somewhat off, but this processor has 64MB cache and 12 cores. I guess we mustn't just fit the minor heap?

view this post on Zulip Paolo Giarrusso (Oct 26 2022 at 20:30):

BTW, is it true that data moves from minor to major heap after _1_ minor GC cycle? That's what https://v2.ocaml.org/api/Gc.html claims, but I'm used to semi-space collectors that move data between two semi-spaces

view this post on Zulip Enrico Tassi (Oct 26 2022 at 20:36):

In OCaml 4.x that is the case AFAIK


Last updated: Feb 06 2023 at 18:03 UTC