I'm seeing errors like this in json-data jobs for the opam archive:
File "nix/store/2w9yyr6c3py70kwsv8zvvx5afsfx3zx3-dune-2.8.5/bin/dune", line 1, characters 0-0:
Error: Invalid dune file
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 1
These look worrisome, e.g., https://gitlab.com/coq/opam-coq-archive/-/jobs/1886975493
Anyone know what's going on? I think I've seen the "Invalid dune file" error before, but it was a while ago.
It happens when Dune mistakes its own executable for a Dune file. I don't remember what the cause of this Dune bug was.
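A minimal sketch of how a binary can end up being parsed: dune reads files named `dune` in the directories below the workspace root, so if the root is detected as `/`, the dune executable in the Nix store (`.../bin/dune`) is one of them. The relative path in the error, `nix/store/.../bin/dune`, suggests that is what happened here, though that is an inference. The helper `list_dune_files` and the `example-dune` directory are hypothetical, and the sketch ignores dune's real exclusion rules (for example, directories starting with `.` or `_`).

```shell
# Hypothetical helper: list every regular file named "dune" under a root,
# roughly the set of files dune would try to parse as build files.
# (Real dune also skips some directories, e.g. ones starting with "." or "_".)
list_dune_files() {
  find "$1" -type f -name dune
}

# Demo in a scratch directory that mimics a Nix store layout.
# "example-dune" is a made-up store entry name.
demo=$(mktemp -d)
mkdir -p "$demo/nix/store/example-dune/bin" "$demo/project"
printf '\177ELF' > "$demo/nix/store/example-dune/bin/dune"  # stands in for the binary
printf '(executable (name main))\n' > "$demo/project/dune"   # a real dune file
list_dune_files "$demo"
```

Both files are listed, and nothing in the name distinguishes the executable from the build file.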
CC @Pierre-Yves Strub
We never found the cause; it only happens on Pyrolyse.
Oh no, now it also happens on shared runners...
yes, now it consistently happens in every pull request for the json-data job
Pyrolyse was updated to Ubuntu 20.04 and then never turned into a GitLab runner again, AFAIK, because I couldn't figure out how to do it.
yeah it would be very nice to have a consistent set of runners, all with time limits of 10h or something
In September I asked Jean-Claude Soret to give @Guillaume Melquiond access to Pyrolyse so that he could look into this, but I don't think he did (at least he didn't answer).
the way CI testing is done in the archive is very time-consuming, which is fine by me, but we need higher time limits
Another solution would be to subscribe to some cloud provider to set up multiple servers for the opam archive.
that sounds good to me as well...
I have no idea how much it would cost, but it should be possible to allocate hundreds to a few thousand euros to this kind of thing.
machines are all well and good, but it would be nice to have some admin resources as well, e.g., so I don't have to persuade someone to take time away from research to fix an archive CI issue
Do these jobs run in Docker images? If yes, we could use a CaaS solution I guess.
right now they don't run in Docker to my knowledge...
this is the GitLab definition of the job that now consistently fails with the dune error:
json-data:
  image: nixos/nix
  cache: {}
  before_script: []
  script:
    - nix-shell --run "dune exec --profile=release -- archive2web released extra-dev > coq-packages.json"
  artifacts:
    name: "$CI_JOB_NAME"
    paths:
      - coq-packages.json
    expire_in: 1 year
could it be due to (some change in) Nix?
Note that, by virtue of image: nixos/nix, they do run in Docker containers.
OK, unless anyone has ideas, I'm going to remove the json-data job tomorrow (Thursday), since it slows down the reviewing and merging process a lot (I have to check what failed every time)
If you remove the json-data job, we have to consider removing the webpage https://coq.inria.fr/opam/www/ , because the latter dynamically loads the data generated by the job.
but right now, the job fails and coq-packages.json still gets generated fine
I thought the job just generates an artifact for the PR, and the real generation happens when one merges to master?
Only the artifact of master is used. But if you remove the job, no artifact will be generated any longer.
It is imperative that we fix the job, presumably by removing any use of Dune.
looks like the failure is in "post installation", what the heck?
post-installation fixup
moving /nix/store/20nf1b6ymxyx9ndmzdq89rjf0b4j75g8-ocaml4.12.0-merlin-4.1-412/doc to /nix/store/20nf1b6ymxyx9ndmzdq89rjf0b4j75g8-ocaml4.12.0-merlin-4.1-412/share/doc
shrinking RPATHs of ELF executables and libraries in /nix/store/20nf1b6ymxyx9ndmzdq89rjf0b4j75g8-ocaml4.12.0-merlin-4.1-412
shrinking /nix/store/20nf1b6ymxyx9ndmzdq89rjf0b4j75g8-ocaml4.12.0-merlin-4.1-412/bin/ocamlmerlin
shrinking /nix/store/20nf1b6ymxyx9ndmzdq89rjf0b4j75g8-ocaml4.12.0-merlin-4.1-412/bin/ocamlmerlin-server
strip is /nix/store/5ddb4j8z84p6sjphr0kh6cbq5jd12ncs-binutils-2.35.1/bin/strip
stripping (with command strip and flags -S) in /nix/store/20nf1b6ymxyx9ndmzdq89rjf0b4j75g8-ocaml4.12.0-merlin-4.1-412/lib /nix/store/20nf1b6ymxyx9ndmzdq89rjf0b4j75g8-ocaml4.12.0-merlin-4.1-412/bin
patching script interpreter paths in /nix/store/20nf1b6ymxyx9ndmzdq89rjf0b4j75g8-ocaml4.12.0-merlin-4.1-412
checking for references to /tmp/nix-build-ocaml4.12.0-merlin-4.1-412.drv-0/ in /nix/store/20nf1b6ymxyx9ndmzdq89rjf0b4j75g8-ocaml4.12.0-merlin-4.1-412...
Info: Creating file dune-project with this contents:
| (lang dune 2.8)
File "nix/store/2w9yyr6c3py70kwsv8zvvx5afsfx3zx3-dune-2.8.5/bin/dune", line 1, characters 0-0:
Error: Invalid dune file
Removing the merlin line from default.nix might be sufficient then.
(And also ocp-indent, because why not.)
Yeah, this error is due to dune being called from the wrong directory
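One way to make the CI script fail loudly instead of letting dune guess, sketched under two assumptions: the archive's `dune-project` file sits at the repository root, and the job starts in that directory. The helper name `require_dune_project` is hypothetical; `--root` is an existing dune option that skips root detection. This has not been tried on the actual job.

```shell
# Hypothetical guard: refuse to run dune unless the current directory
# contains a dune-project file, instead of letting dune search for a root.
require_dune_project() {
  if [ ! -f ./dune-project ]; then
    echo "error: no dune-project in $(pwd); not running dune here" >&2
    return 1
  fi
}

# Possible use in the job's script (untested), with the root pinned to ".":
#   require_dune_project &&
#     dune exec --root . --profile=release -- archive2web released extra-dev > coq-packages.json
```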
Actually I think dune 2.9.x has some more safeguards to prevent this problem
That's a tricky problem related to composition: when you are at an arbitrary point of a composed tree, how do you locate the root?
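Root location can be sketched as follows, assuming roughly the rule in the dune manual: walk up from the starting directory, take the outermost ancestor containing a `dune-workspace` file, else the outermost containing a `dune-project` file, else the starting directory itself. `find_dune_root` is a hypothetical name and this is not dune's code. The last case is the relevant one here: with no marker file anywhere above, the "root" is simply wherever dune was started.

```shell
# Hypothetical sketch of dune-style root detection (not dune's actual code).
# Prints the detected root for a starting directory (default: current dir).
find_dune_root() {
  local start dir workspace_root project_root
  start=$(cd "${1:-.}" && pwd -P) || return 1
  dir=$start
  workspace_root=
  project_root=
  while :; do
    # Matches higher up overwrite lower ones, so each variable ends up
    # holding the outermost ancestor that contains that marker file.
    [ -f "$dir/dune-workspace" ] && workspace_root=$dir
    [ -f "$dir/dune-project" ] && project_root=$dir
    [ "$dir" = / ] && break
    dir=$(dirname "$dir")
  done
  if [ -n "$workspace_root" ]; then
    printf '%s\n' "$workspace_root"
  elif [ -n "$project_root" ]; then
    printf '%s\n' "$project_root"
  else
    printf '%s\n' "$start"  # no marker found: fall back to the start dir
  fi
}
```

For example, `find_dune_root ws/proj/sub` prints `ws/proj` (as an absolute path) when only `ws/proj/dune-project` exists, and `ws` once `ws/dune-workspace` is added.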
(explicitly detecting binaries would give a better error message)
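A minimal sketch of that safeguard, assuming Linux ELF executables: before parsing a file named `dune`, look at its first four bytes and reject it if they are the ELF magic number (`0x7f`, `E`, `L`, `F`). `is_elf` and `check_dune_file` are hypothetical helpers, not something dune provides, and a real check would also need to cover other binary formats (for example Mach-O on macOS).

```shell
# Hypothetical check: succeed if the file starts with the ELF magic number
# 7f 45 4c 46 ("\177ELF"), i.e. it is a compiled executable or library.
is_elf() {
  [ "$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')" = 7f454c46 ]
}

# Report a clearer error than "Invalid dune file" when a binary is found.
check_dune_file() {
  if is_elf "$1"; then
    echo "error: $1 is an executable, not a dune file (wrong root?)" >&2
    return 1
  fi
}
```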
also making composition an option...
Rules have changed in 3.0, so this error doesn't happen; instead, you get a "cannot detect workspace" error, so the problem is still there.
If this is running in nix, isn't this a release build? In that case, dune doesn't try to guess the root and defaults to --root . (set by -p). Or am I misunderstanding the problem?
I think you are misunderstanding. Dune is called explicitly inside a nix-shell. See the discussion on the related issue: https://github.com/coq/opam-coq-archive/issues/1995
Last updated: Dec 07 2023 at 17:01 UTC