https://gitlab.com/coq/coq/-/jobs/1842886515 and several other jobs
cc @Jason Gross
The error message is
COQNATIVE /builds/coq/coq/_build_ci/fiat_crypto/rupicola/bedrock2/bedrock2/src/bedrock2Examples/lightbulb.vo
ocamlopt.opt got signal and exited
Error: Native compiler exited with status 2 (in case of stack overflow,
increasing stack size (typicaly with "ulimit -s") often helps)
I guess the fix is to increase the stack size? (Native compilation works fine on our CI, I believe.) Once https://github.com/mit-plv/bedrock2/issues/205 is resolved, we'll stop depending on this file, and then this won't be an issue (but I'm not sure how long that'll take)
It seems that fiat-crypto is broken on master, see e.g. https://gitlab.com/coq/coq/-/jobs/2356260838
It dies with
File "src/ExtractionOCaml/saturated_solinas.ml", line 1:
Error: The file /root/.opamcache/4.13.0+flambda/lib/ocaml/unix.cmxa is not a compilation unit description
Command exited with non-zero status 2
any idea about what's going on? (cc @Jason Gross )
We invoke ocamlfind ocamlopt -package unix -linkpkg -w -20 -g -o
because we need Unix.gettimeofday
. This sounds like the opam distribution is broken?
This error message can only occur if the OCaml magic number is incorrect. So, three possibilities:
We have
library:ci-fiat_crypto:
extends: .ci-template-flambda
needs:
- build:edge+flambda
- library:ci-coqprime
- plugin:ci-bignums
- plugin:ci-rewriter
# We cannot use flambda due to
# https://github.com/ocaml/ocaml/issues/7842, see
# https://github.com/coq/coq/pull/11916#issuecomment-609977375
library:ci-fiat_crypto_ocaml:
extends: .ci-template
needs:
- build:edge+flambda
- library:ci-coqprime
- plugin:ci-bignums
- plugin:ci-rewriter
- library:ci-fiat_crypto
So it does seem that somehow we're crossing flambda vs not?
(The ci-fiat_crypto_ocaml is just invoking ocaml on some ml files that were produced by the fiat crypto step, so what should happen is that we do all the Coq stuff on edge+flambda, upload the artifacts, and then in library:ci-fiat_crypto_ocaml, set up the system to have the fiat-crypto generated files around, and also to have the non-flambda edge ocaml being the default/installed one.)
Indeed, the fact that one extends ci-template-flambda
while the other extends ci-template
seems a bit dubious. (But I am not sure it suffices to explain the issue.)
So, yes, it suffices to explain the issue. The job is run inside the 4.05.0
switch, but it uses the libraries from the 4.13.0+flambda
switch.
More precisely, Fiat Crypto asks Coq for OCAMLFIND
. So, Coq answers with the one from the 4.13.0+flambda
switch, which is obviously incompatible with the compiler from the current switch, i.e., 4.05.0
.
Ah, i guess that's a change I made recently. I'll go back to using the one in path.
(deleted)
https://github.com/coq/coq/pull/15940
@Jason Gross lightbulb is back, eg https://gitlab.com/coq/coq/-/jobs/3117535414
is running native for fiat-crypto really useful? I propose we disable it and save some runner time
The setup mixing the two compiler switches is actually not supported, what we have now is pretty brittle IMHO
fiat-crypto explicitly invokes native_compute in some places where it is much faster (2x? 10x? I don't quite recall) than vm_compute. I've opened https://github.com/mit-plv/bedrock2/issues/282, I think the solution we want is to build bedrock2_ex with native compute on-demand. Can you remind me what has to be done to suppress coqnative from our side, only on bedrock2?
@Emilio Jesús Gallego Arias if you're referring to depending on different compilers for fiat_crypto and fiat_crypto_ocaml, that's not related to this issue.
Alternatively, making all of fiat-crypto be native-compute on demand seems fine
Interestingly it does work in Coq Platform, which does not enable native compute ...
doesn't native_compute just fall back on vm_compute or similar if native_compute is not enabled?
yes
recent example https://gitlab.com/coq/coq/-/jobs/3214365920
COQC src/Curves/Weierstrass/AffineProofs.v
Finished transaction in 0.812 secs (0.804u,0.007s) (successful)
Command terminated by signal 9
src/Curves/Weierstrass/AffineProofs.vo (real: 70.57, user: 2.35, sys: 0.33, mem: 1336288 ko)
Makefile.coq:792: recipe for target 'src/Curves/Weierstrass/AffineProofs.vo' failed
Why is it being mem-killed at 1336288 ko?
Maybe it's a problem with par:
, try setting COQEXTRAFLAGS="-async-proofs-j 1"
in dev/ci/ci-fiat_crypto.sh
?
Jason Gross said:
Why is it being mem-killed at 1336288 ko?
One of the heuristics of the OOM killer is that each child process contributes half its footprint toward the footprint of its parent process. So, if the main Coq process uses 1GB and there are three children of 2GB each, then the main process actually weights 4GB. So, it will be the first to die, even if it isn't the worst offender. (But, in a sense, it is the worst offender. It should never have created such large children in the first place. The kernel can be quite vindictive when memory becomes sparse.)
FTR I've recently seen a different type of failure in fiat-crypto:
File "./src/Curves/Weierstrass/AffineProofs.v", line 49, characters 4-13:
Error: (in proof commutative_group): Attempt to save an incomplete proof
Command exited with non-zero status 1
src/Curves/Weierstrass/AffineProofs.vo (real: 74.58, user: 128.66, sys: 1.81, mem: 1424652 ko)
(https://github.com/coq/coq/runs/9102466669)
@Théo Zimmermann That is another avatar of the same issue (but is possibly worth separately reporting as a bug in Coq?). When a par:
worker process is OOM-killed (sometimes? always?), without the main Coq process being OOM-killed, the main Coq process believes the worker completed successfully but did not solve any goals. Hence the Qed
fails with "Attempt to save an incomplete proof".
I'll report this
Reported at https://github.com/coq/coq/issues/16732
Last updated: Oct 13 2024 at 01:02 UTC