Stream: Coq devs & plugin devs

Topic: M1 performance


view this post on Zulip Karl Palmskog (Nov 30 2021 at 16:23):

@Paolo Giarrusso are there any benchmarks of M1 vs. top x64 chipsets? Last time I tried I couldn't find much relevant, like just the regular CPU bound tasks...

view this post on Zulip Karl Palmskog (Nov 30 2021 at 16:24):

with the platform, one might be able to see if Coq is better or worse at least

view this post on Zulip Paolo Giarrusso (Nov 30 2021 at 16:30):

I'd be happy to run CoqMark, just not to write it. And I don't have "top x64 chipset" unless you count a 6-core 16'' MBP (seems an i7-9750H?), and arguably you shouldn't (general benchmark: https://browser.geekbench.com/user/415396)

view this post on Zulip Paolo Giarrusso (Nov 30 2021 at 16:32):

in my unscientific Coq benchmarks, the M1 is miles ahead

view this post on Zulip Karl Palmskog (Nov 30 2021 at 16:32):

from what I understand, top x64 laptops are currently AMD? or has it changed already?

view this post on Zulip Karl Palmskog (Nov 30 2021 at 16:35):

I guess something like this but for Coq would be nice: https://nanoreview.net/en/cpu-compare/apple-m1-max-vs-amd-ryzen-9-5900x

view this post on Zulip Paolo Giarrusso (Nov 30 2021 at 16:38):

_if you ignore power consumption_ Alder Core 12th gen often beats AMD Ryzen 5900x/5950x (also not a laptop chip) and possibly even the M1 Max.

view this post on Zulip Paolo Giarrusso (Nov 30 2021 at 16:40):

_if you account for power consumption_, I refuse to consider i7-9750H a laptop chip (especially in those macs, and dunno if that's Intel's or Apple's fault), so I usually disabled Turbo Boost to make it less bad...

view this post on Zulip Karl Palmskog (Nov 30 2021 at 16:40):

would be nice with special laptop models for formal proofs. Probably 128 GB+ RAM would be a nice start

view this post on Zulip Paolo Giarrusso (Nov 30 2021 at 16:41):

... I got 64GB and haven't been able to stress them out yet :-| ...

view this post on Zulip Karl Palmskog (Nov 30 2021 at 16:41):

then you're not bootstrapping CakeML enough!

view this post on Zulip Paolo Giarrusso (Nov 30 2021 at 16:41):

quoting myself in private chats > FWIW, my favorite quick benchmark (<ELIDED>) gives: ~30s on my old MBP 2020, 22s on that one with Turbo enabled (default mode), and 12s on my new M1 Max

view this post on Zulip Karl Palmskog (Nov 30 2021 at 16:43):

there are interesting things one can do with enough RAM, like launching hundreds of Z3/Vampire/Eprover processes for proof search. Probably also some other automated tactics could be run with huge search spaces.

view this post on Zulip Paolo Giarrusso (Nov 30 2021 at 17:34):

If you want, https://github.com/coq/platform/issues/168 could serve as some interesting kind of benchmark?

view this post on Zulip Paolo Giarrusso (Nov 30 2021 at 17:35):

Or one could pick Coq's CI benchmarks and make them easy to run locally (and maybe scale them down?).

view this post on Zulip Paolo Giarrusso (Nov 30 2021 at 17:35):

In any case, I don't know of any such thing existing :-|

view this post on Zulip Karl Palmskog (Nov 30 2021 at 17:36):

probably Jason Gross's benchmarks are good, since they don't rely on I/O and so on and are parameterized

view this post on Zulip Paolo Giarrusso (Nov 30 2021 at 17:42):

So https://github.com/coq-community/coq-performance-tests I guess?

view this post on Zulip Karl Palmskog (Nov 30 2021 at 17:44):

https://github.com/coq/opam-coq-archive/blob/master/extra-dev/packages/coq-performance-tests-lite/coq-performance-tests-lite.dev/opam

view this post on Zulip Paolo Giarrusso (Dec 01 2021 at 10:41):

On ARM64 it crashes, it works through Rosetta, and time opam install coq-performance-tests-lite gives (with tasks in background)

real    5m39.498s
user    21m49.934s
sys 1m14.033s

but I assume that's not the right way to use this — I now have raw data installed, but processing scripts nowhere. I only installed it on the off-chance the package might do something cleverer.

view this post on Zulip Paolo Giarrusso (Dec 01 2021 at 10:41):

I can probably take that script + https://github.com/coq-community/coq-performance-tests/blob/master/.github/workflows/coq.yml as instructions, but I wonder if there's a way to get a summary score rather than a graph.

view this post on Zulip Paolo Giarrusso (Dec 01 2021 at 10:44):

and of course, even the timing is probably not too meaningful, from a benchmark you want the average (and standard error?) of a load, not the total runtime.

view this post on Zulip Karl Palmskog (Dec 01 2021 at 11:27):

we probably want to ping Jason about this and move it to the Coq dev stream

view this post on Zulip Notification Bot (Dec 01 2021 at 11:27):

This topic was moved here from #Coq Platform devs & users > M1 performance by Karl Palmskog

view this post on Zulip Karl Palmskog (Dec 01 2021 at 11:30):

@Jason Gross in your opinion, is it a good idea to use your performance benchmarks to compare Coq performance of different CPUs/laptops/etc.? Is there a simple way to get a summary score rather than a graph for, say, coq-performance-tests-lite?

This may be relevant for organizations that purchase computers explicitly for use by Coq engineers/researchers.

view this post on Zulip Enrico Tassi (Dec 01 2021 at 11:51):

I think you should bench the opam packages you use. Here you are measuring synthetic tests, biased toward what Jason was doing. In your recurrent projects you may not end up stressing the same coq codepaths.

view this post on Zulip Enrico Tassi (Dec 01 2021 at 11:54):

In general, it seems everybody is happy about this CPU

view this post on Zulip Karl Palmskog (Dec 01 2021 at 11:56):

if everyone tests what they use, there will be no summary scores to compare...

view this post on Zulip Enrico Tassi (Dec 01 2021 at 11:58):

I mean, to know if it is good for you...

view this post on Zulip Michael Soegtrop (Dec 01 2021 at 12:12):

Building Coq Platform is also a reasonable benchmark. Here (https://github.com/coq/platform/issues/168#issuecomment-967183359) are reports
that Coq Platform full builds with 4 threads on M1 with 16GB RAM in about 30 minutes. On an Intel MacBook Pro with i9-9980HK with 32GB RAM it is about 1 hour with 16 threads - it depends on temperature since it reduced clock to not overheat. I should try it with 4 threads to be comparable - maybe it is even faster ...

view this post on Zulip Karl Palmskog (Dec 01 2021 at 12:13):

the drawback of the Coq platform is that it is likely disk and memory bound, so not a good Coq CPU test

view this post on Zulip Michael Soegtrop (Dec 01 2021 at 12:14):

Well yes, but it is a reasonable broad real life benchmark of Coq.

view this post on Zulip Jason Gross (Dec 01 2021 at 16:06):

@Karl Palmskog Yeah, I think it's reasonable to use coq-performance-tests (but as Enrico says, it's biased towards microbenchmarks of what I use (I'm always happy to accept PRs to that repo though)). Whether you want the -lite version of a longer-running version depends on how much noise you're willing to tolerate. If you run make TIMED=1, I think you should get out total times for each file (which is a bit better than just timing opam install, which is what the Coq bench does), though sometimes the time will end up being dominated by something other than what the file claims to test. There's also a target perf-gnuplots which will emit *.txt-fits.dat file which will give best-fit curves for most of the tests. Then you can read off the summary statistics off the best-fit-curve parameters. If you have a suggestion for how to display this as a summary, I'm happy to write a makefile target that aggregates the best-fit-parameters into some sort of summary score.

view this post on Zulip Paolo Giarrusso (Dec 01 2021 at 20:17):

@Michael Soegtrop is that link using 4 threads? Because it’s on a 14’’ 10 core M1 Pro

view this post on Zulip Kartik Singhal (Dec 01 2021 at 23:00):

Hi @Paolo Giarrusso I performed those tests. I picked 4 make jobs in the options that Coq platform script presented me, but note that it refers to 4 parallel jobs, not threads, each job, in turn, uses multiple threads as well. I do not know how many threads, but all cores seemed occupied during the build and I could see 4 opam packages being built in parallel.

view this post on Zulip Paolo Giarrusso (Dec 01 2021 at 23:13):

Right, classic opam!

view this post on Zulip Michael Soegtrop (Dec 02 2021 at 09:27):

Sorry, I was not precise. Yes this is 4 parallel opam jobs x 4 parallel make jobs (for most packages), so at most 16 threads. Some day we need to have a clean thread pool management throughout opam, make, dune, remake, ...

view this post on Zulip Paolo Giarrusso (Dec 02 2021 at 10:56):

(Beyond make's jobserver, there's also make -l, which just checks the load average of the system).

view this post on Zulip Karl Palmskog (Dec 02 2021 at 10:59):

For pure Coq projects and plugins and OCaml projects, if we all switch to Dune (which is sort of the plan), I think this would be solved, right?

view this post on Zulip Guillaume Melquiond (Dec 02 2021 at 12:18):

No, you still need Opam and Dune to share the same thread pool. (Or did you mean to suggest dropping Opam too?) And as long as they use a standard protocol for sharing it (e.g., Make's jobserver), others will also be able to make use of it.

view this post on Zulip Karl Palmskog (Dec 02 2021 at 12:47):

I speculated that since 40%+ of opam packages use Dune, they could probably just do a solution for job-sharing that's specific to Dune in opam, and this would be less complicated than supporting all of Make, OCamlbuild, etc.

view this post on Zulip Paolo Giarrusso (Dec 02 2021 at 17:47):

If a dune-specific solution were easier, sure — but while make's jobserver is not trivial (read a character to down a semaphore?), it seems hard to do better

view this post on Zulip Emilio Jesús Gallego Arias (Dec 02 2021 at 18:03):

As far as I know the lack of good integration between opam and dune / make is just a lack of manpower to implement it so if any of you would like to help I'm sure help will be super appreciated

view this post on Zulip Emilio Jesús Gallego Arias (Dec 02 2021 at 18:04):

Related https://github.com/ocaml/dune/pull/4331

view this post on Zulip Paolo Giarrusso (Dec 02 2021 at 18:52):

@Kartik Singhal could you please create an MR with your changes? I see instructions on https://github.com/coq/platform/issues/168 that I could follow, but such an MR would be more generally useful, even if it were incomplete for some reason

view this post on Zulip Gaëtan Gilbert (Dec 02 2021 at 19:02):

Emilio Jesús Gallego Arias said:

Related https://github.com/ocaml/dune/pull/4331

and now it's closed :D

view this post on Zulip Kartik Singhal (Dec 02 2021 at 20:09):

Hi, @Paolo Giarrusso, I believe you mean a PR? I actually don't have any changes. Most of the problems that I ran into are already being fixed in the main branch of coq-platform, so it you'd like to test, just start from there.

view this post on Zulip Paolo Giarrusso (Dec 02 2021 at 20:17):

right, PR — I got confused because Gitlab says MR ("Merge Request")

view this post on Zulip Paolo Giarrusso (Dec 02 2021 at 20:17):

and thanks, I had been using the tag

view this post on Zulip Emilio Jesús Gallego Arias (Dec 02 2021 at 21:37):

Gaëtan Gilbert said:

Emilio Jesús Gallego Arias said:

Related https://github.com/ocaml/dune/pull/4331

and now it's closed :D

O_O seems there were some critical problems? Didn't have the time to read the discussion, pity

view this post on Zulip Paolo Giarrusso (Dec 02 2021 at 21:42):

It seems another case of dune devs being extremely strict, despite the Linux-vs-make recent issues (but that's probably a separate thread)

view this post on Zulip Paolo Giarrusso (Dec 03 2021 at 00:13):

Anyway, with "16 threads" (and High Power Mode, 16''-inch only) the build took ~25 minutes (while using the machine normally; iStat Menus was the only program seeming to struggle, and a couple Safari tabs did crash and self-reload, and it spun up audible fans). I'm utterly amazed the machine kept chugging despite 60 coq processes.

view this post on Zulip Paolo Giarrusso (Dec 03 2021 at 00:15):

But in earlier smaller tests, high power mode didn't seem to give a huge speedup; 16 threads might have helped more, since the build wasn't fully parallel (it didn't saturate all cores throughout; a rough estimate suggests 6 cores on average)

view this post on Zulip Emilio Jesús Gallego Arias (Dec 03 2021 at 00:41):

jscoq + addons does provide a nice case for a fully dunerized build

view this post on Zulip Emilio Jesús Gallego Arias (Dec 03 2021 at 00:41):

I hope I can finish the composition patch for Dune this holidays tho

view this post on Zulip Emilio Jesús Gallego Arias (Dec 03 2021 at 00:41):

that's a super way to bench


Last updated: Feb 06 2023 at 00:03 UTC