Stream: Coq devs & plugin devs

Topic: Async failure


view this post on Zulip Théo Zimmermann (Sep 17 2020 at 08:14):

Hello @Enrico Tassi, PR coq/coq#13039 had a failure of the async job with:

Makefile.build:866: recipe for target 'theories/ZArith/Znumtheory.vo' failed
make[1]: *** [theories/ZArith/Znumtheory.vo] Segmentation fault (core dumped)

(see https://gitlab.com/coq/coq/-/jobs/741853901)

Are you interested in being notified about those failures?

view this post on Zulip Enrico Tassi (Sep 17 2020 at 08:16):

Yes thanks

view this post on Zulip Enrico Tassi (Sep 17 2020 at 08:39):

I can't find the value of NJOBS in general. I see bedrocks gets a 1, but what is the default?
My problem is that I cannot reproduce it locally, and I start to think that these core dumps are actually about an out of memory. shared runners have 4G ram and async-proofs on potentially starts 2 coq processes per file with the same memory footprint.
How many processe are we running?

view this post on Zulip Théo Zimmermann (Sep 17 2020 at 09:14):

If you go to the top of the log, you get a lot of info.

view this post on Zulip Théo Zimmermann (Sep 17 2020 at 09:14):

In particular, NJOBS=2

view this post on Zulip Théo Zimmermann (Sep 17 2020 at 09:15):

cpu cores : 1

view this post on Zulip Paolo Giarrusso (Sep 17 2020 at 09:30):

FWIW, the out-of-memory killer shouldn’t use SIGSEGV — but some code segfaults when allocation fails is very plausible (such bugs are really easy in systems code)

view this post on Zulip Paolo Giarrusso (Sep 17 2020 at 09:30):

(I remember a few in the GHC runtime system)

view this post on Zulip Paolo Giarrusso (Sep 17 2020 at 09:31):

but the end result is probably the same — if this were a Coq bug, I don’t expect coq should fix it (if it were in OCaml, OTOH)

view this post on Zulip Enrico Tassi (Sep 17 2020 at 11:02):

As usual re-running the job on pyrolise that has 32G memory made it pass; having 4 coq processes exhaust 4G is plausible.

I'd go for NJOBS=1 (which will also make things more reproducible) but then I guess I should also increase the timeout a little.
And add a call to dmesg on failure if possible (is there a kwd for that?). Are you OK with this plan @Théo Zimmermann ?

view this post on Zulip Enrico Tassi (Sep 17 2020 at 11:03):

Maybe I should just add dmesg to validate the thesis.

view this post on Zulip Théo Zimmermann (Sep 17 2020 at 11:04):

Yes, that sounds good.

view this post on Zulip Théo Zimmermann (Sep 17 2020 at 11:04):

I believe that after_script is run in all cases, failure or not.

view this post on Zulip Théo Zimmermann (Sep 17 2020 at 11:04):

Let me check.

view this post on Zulip Théo Zimmermann (Sep 17 2020 at 11:04):

Yes: https://docs.gitlab.com/ee/ci/yaml/#before_script-and-after_script

view this post on Zulip Enrico Tassi (Sep 17 2020 at 11:13):

I've opened a PR


Last updated: Oct 16 2021 at 09:07 UTC