@Paolo Giarrusso are there any benchmarks of M1 vs. top x64 chipsets? Last time I tried, I couldn't find much that was relevant, just the regular CPU-bound tasks...
with the Coq Platform, one might at least be able to see whether Coq does better or worse
I'd be happy to run CoqMark, just not to write it. And I don't have "top x64 chipset" unless you count a 6-core 16'' MBP (seems an i7-9750H?), and arguably you shouldn't (general benchmark: https://browser.geekbench.com/user/415396)
in my unscientific Coq benchmarks, the M1 is miles ahead
from what I understand, top x64 laptops are currently AMD? or has it changed already?
I guess something like this but for Coq would be nice: https://nanoreview.net/en/cpu-compare/apple-m1-max-vs-amd-ryzen-9-5900x
_if you ignore power consumption_, Intel's 12th-gen Alder Lake often beats the AMD Ryzen 5900X/5950X (also not laptop chips) and possibly even the M1 Max.
_if you account for power consumption_, I refuse to consider the i7-9750H a laptop chip (especially in those Macs, and dunno if that's Intel's or Apple's fault), so I usually disable Turbo Boost to make it less bad...
it would be nice to have special laptop models for formal proofs. Probably 128 GB+ of RAM would be a nice start
... I got 64GB and haven't been able to stress them out yet :-| ...
then you're not bootstrapping CakeML enough!
quoting myself from private chats:
> FWIW, my favorite quick benchmark (<ELIDED>) gives: ~30s on my old MBP 2020, 22s on that one with Turbo enabled (the default mode), and 12s on my new M1 Max
there are interesting things one can do with enough RAM, like launching hundreds of Z3/Vampire/Eprover processes for proof search. Probably also some other automated tactics could be run with huge search spaces.
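a rough sketch of what such a prover portfolio could look like (Python just for illustration; the solver commands and goal files here are placeholders, not a real Coq integration):

```python
import concurrent.futures
import subprocess

def run_solver(cmd, goal_file, timeout=60):
    """Run one external prover on a goal file; return (goal_file, returncode)."""
    try:
        proc = subprocess.run(cmd + [goal_file], capture_output=True,
                              text=True, timeout=timeout)
        return goal_file, proc.returncode
    except subprocess.TimeoutExpired:
        return goal_file, None  # solver did not finish within the timeout

def portfolio(goals, solver_cmds, max_workers=8):
    """Fan a list of goals out across several solvers in parallel.

    With lots of RAM you can crank max_workers way up; solver_cmds would
    be things like [["z3"], ["vampire"], ["eprover"]] (hypothetical here).
    """
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_solver, cmd, g)
                   for g in goals for cmd in solver_cmds]
        for fut in concurrent.futures.as_completed(futures):
            goal, rc = fut.result()
            results.setdefault(goal, []).append(rc)
    return results
```

each goal gets tried by every solver, and threads are fine here since the work happens in the external processes.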
If you want, https://github.com/coq/platform/issues/168 could serve as some interesting kind of benchmark?
Or one could pick Coq's CI benchmarks and make them easy to run locally (and maybe scale them down?).
In any case, I don't know of any such thing existing :-|
probably Jason Gross's benchmarks are good, since they don't rely on I/O and so on and are parameterized
So https://github.com/coq-community/coq-performance-tests I guess?
On ARM64 it crashes, but it works through Rosetta, and `time opam install coq-performance-tests-lite`
gives (with tasks in background)
```
real    5m39.498s
user    21m49.934s
sys     1m14.033s
```
but I assume that's not the right way to use this: I now have the raw data installed, but no processing scripts anywhere. I only installed it on the off-chance the package might do something cleverer.
I can probably take that script + https://github.com/coq-community/coq-performance-tests/blob/master/.github/workflows/coq.yml as instructions, but I wonder if there's a way to get a summary score rather than a graph.
and of course, even that timing is probably not too meaningful: from a benchmark you want the average (and standard error?) over repeated runs of a workload, not a single total runtime.
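something like this minimal sketch would report a mean plus standard error instead of one raw total (the command and run count are arbitrary):

```python
import statistics
import subprocess
import time

def bench(cmd, runs=5):
    """Time a command several times; return (mean, standard error) in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        samples.append(time.perf_counter() - start)
    mean = statistics.mean(samples)
    # standard error of the mean = sample stdev / sqrt(n)
    stderr = statistics.stdev(samples) / len(samples) ** 0.5
    return mean, stderr
```

e.g. `bench(["opam", "install", "-y", "coq-performance-tests-lite"])`, though for anything that slow you'd want fewer, longer runs.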
we probably want to ping Jason about this and move it to the Coq dev stream
This topic was moved here from #Coq Platform devs & users > M1 performance by Karl Palmskog
@Jason Gross in your opinion, is it a good idea to use your performance benchmarks to compare Coq performance across different CPUs/laptops/etc.? Is there a simple way to get a summary score rather than a graph for, say, `coq-performance-tests-lite`?
This may be relevant for organizations that purchase computers explicitly for use by Coq engineers/researchers.
I think you should bench the opam packages you actually use. Here you are measuring synthetic tests, biased toward what Jason was doing. Your recurring projects may not end up stressing the same Coq code paths.
In general, it seems everybody is happy about this CPU
if everyone tests what they use, there will be no summary scores to compare...
I mean, to know if it is good for you...
Building the Coq Platform is also a reasonable benchmark. Here (https://github.com/coq/platform/issues/168#issuecomment-967183359) are reports
that a full Coq Platform build with 4 threads on an M1 with 16GB RAM takes about 30 minutes. On an Intel MacBook Pro with an i9-9980HK and 32GB RAM it is about 1 hour with 16 threads; it also depends on temperature, since the chip reduces its clock to avoid overheating. I should try it with 4 threads to be comparable - maybe it is even faster ...
the drawback of the Coq Platform is that it is likely disk- and memory-bound, so not a good Coq CPU test
Well yes, but it is a reasonable broad real life benchmark of Coq.
@Karl Palmskog Yeah, I think it's reasonable to use coq-performance-tests (but, as Enrico says, it's biased towards microbenchmarks of what I use (I'm always happy to accept PRs to that repo, though)). Whether you want the `-lite` version or a longer-running version depends on how much noise you're willing to tolerate. If you run `make TIMED=1`, I think you should get total times for each file (which is a bit better than just timing `opam install`, which is what the Coq bench does), though sometimes the time will end up being dominated by something other than what the file claims to test. There's also a target `perf-gnuplots` which will emit `*.txt-fits.dat` files giving best-fit curves for most of the tests. Then you can read the summary statistics off the best-fit-curve parameters. If you have a suggestion for how to display this as a summary, I'm happy to write a makefile target that aggregates the best-fit parameters into some sort of summary score.
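one simple option (just a sketch; it assumes you've already extracted one leading best-fit coefficient per test, since I'm not relying on the exact `*.txt-fits.dat` layout here) would be a geometric mean, so no single test dominates the score:

```python
import math

def summary_score(fits):
    """Aggregate per-test leading-coefficient estimates into one number.

    `fits` maps a test name to its (positive) leading best-fit coefficient.
    The geometric mean keeps one pathological test from swamping the rest;
    smaller means faster.
    """
    logs = [math.log(v) for v in fits.values() if v > 0]
    return math.exp(sum(logs) / len(logs))
```

e.g. `summary_score({"test_a": 2.0, "test_b": 8.0})` gives 4.0 (up to float rounding), and scores from two machines can be compared as a single ratio.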
@Michael Soegtrop is that link using 4 threads? Because it’s on a 14’’ 10-core M1 Pro
Hi @Paolo Giarrusso, I performed those tests. I picked 4 make jobs in the options the Coq Platform script presented me, but note that this refers to 4 parallel jobs, not threads; each job, in turn, uses multiple threads as well. I do not know how many threads, but all cores seemed occupied during the build, and I could see 4 opam packages being built in parallel.
Right, classic opam!
Sorry, I was not precise. Yes, this is 4 parallel opam jobs x 4 parallel make jobs (for most packages), so at most 16 threads. Some day we need clean thread-pool management throughout opam, make, dune, remake, ...
(Beyond make's jobserver, there's also `make -l`, which just checks the system's load average.)
For pure Coq projects and plugins and OCaml projects, if we all switch to Dune (which is sort of the plan), I think this would be solved, right?
No, you still need Opam and Dune to share the same thread pool. (Or did you mean to suggest dropping Opam too?) And as long as they use a standard protocol for sharing it (e.g., Make's jobserver), others will also be able to make use of it.
I speculated that since 40%+ of opam packages use Dune, they could probably just do a solution for job-sharing that's specific to Dune in opam, and this would be less complicated than supporting all of Make, OCamlbuild, etc.
If a dune-specific solution were easier, sure — but while make's jobserver is not trivial (read a character to down a semaphore?), it seems hard to do better
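for the curious, the protocol really is just bytes in a pipe acting as a counting semaphore; a minimal Python sketch of the idea (not make's actual implementation):

```python
import os

def make_jobserver(slots):
    """Create a GNU-make-style jobserver: a pipe preloaded with one
    token byte per job slot."""
    r, w = os.pipe()
    os.write(w, b"+" * slots)
    return r, w

def acquire(r):
    """Down the semaphore: block until a token byte can be read."""
    return os.read(r, 1)

def release(w, token):
    """Up the semaphore: put the token byte back into the pipe."""
    os.write(w, token)
```

a build tool holding the read/write fds can run one extra job per token it manages to read, and must write each token back when the job finishes; that is roughly why sharing one such pipe across opam, make, and dune would coordinate them.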
As far as I know, the lack of good integration between opam and dune/make is just a lack of manpower to implement it, so if any of you would like to help, I'm sure it would be super appreciated
Related https://github.com/ocaml/dune/pull/4331
@Kartik Singhal could you please create an MR with your changes? I see instructions on https://github.com/coq/platform/issues/168 that I could follow, but such an MR would be more generally useful, even if it were incomplete for some reason
Emilio Jesús Gallego Arias said:
> Related https://github.com/ocaml/dune/pull/4331

and now it's closed :D
Hi, @Paolo Giarrusso, I believe you mean a PR? I actually don't have any changes. Most of the problems that I ran into are already being fixed in the main branch of coq-platform, so if you'd like to test, just start from there.
right, PR; I got confused because GitLab calls it an MR ("Merge Request")
and thanks, I had been using the tag
Gaëtan Gilbert said:
> Emilio Jesús Gallego Arias said:
> and now it's closed :D
O_O seems there were some critical problems? Didn't have the time to read the discussion, pity
It seems like another case of the dune devs being extremely strict, despite the recent Linux-vs-make issues (but that's probably for a separate thread)
Anyway, with "16 threads" (and High Power Mode, available on the 16'' models only) the build took ~25 minutes (while using the machine normally; iStat Menus was the only program that seemed to struggle, a couple of Safari tabs did crash and self-reload, and it spun up audible fans). I'm utterly amazed the machine kept chugging despite 60 coq processes.
But in earlier smaller tests, high power mode didn't seem to give a huge speedup; 16 threads might have helped more, since the build wasn't fully parallel (it didn't saturate all cores throughout; a rough estimate suggests 6 cores on average)
jscoq + addons does provide a nice case for a fully dunerized build
I hope I can finish the composition patch for Dune this holidays tho
that's a super way to bench
Last updated: May 31 2023 at 16:01 UTC