Hello @Enrico Tassi, PR coq/coq#13039 had a failure of the async job with:
Makefile.build:866: recipe for target 'theories/ZArith/Znumtheory.vo' failed make: *** [theories/ZArith/Znumtheory.vo] Segmentation fault (core dumped)
Are you interested in being notified about those failures?
I can't find the value of NJOBS in general. I see bedrocks gets a 1, but what is the default?
My problem is that I cannot reproduce it locally, and I start to think that these core dumps are actually about an out of memory. shared runners have 4G ram and async-proofs on potentially starts 2 coq processes per file with the same memory footprint.
How many processe are we running?
If you go to the top of the log, you get a lot of info.
cpu cores : 1
FWIW, the out-of-memory killer shouldn’t use SIGSEGV — but some code segfaults when allocation fails is very plausible (such bugs are really easy in systems code)
(I remember a few in the GHC runtime system)
but the end result is probably the same — if this were a Coq bug, I don’t expect coq should fix it (if it were in OCaml, OTOH)
As usual re-running the job on pyrolise that has 32G memory made it pass; having 4 coq processes exhaust 4G is plausible.
I'd go for NJOBS=1 (which will also make things more reproducible) but then I guess I should also increase the timeout a little.
And add a call to dmesg on failure if possible (is there a kwd for that?). Are you OK with this plan @Théo Zimmermann ?
Maybe I should just add dmesg to validate the thesis.
Yes, that sounds good.
I believe that
after_script is run in all cases, failure or not.
Let me check.
I've opened a PR
My current backport batch keeps failing in the test-suite:base+anync job with:
==========> TESTING bugs/bug_16803.v <========== Error: The command has not failed! 0m0.000000s 0m0.000000s 0m2.670000s 0m1.190000s ==========> FAILURE <========== bugs/bug_16803.v...Error! (bug seems to be opened, please check) FAILURES bugs/bug_16803.v...Error! (bug seems to be opened, please check)
Any idea what could be going on? The diff is here: https://github.com/coq/coq/compare/v8.17...Zimmi48:coq:staging-v8.17
I don't think that anything in it could be responsible for this issue.
yeah the test was added in https://github.com/coq/coq/pull/16810 so 16952 needs backporting
Last updated: Dec 05 2023 at 05:01 UTC