Stream: Coq devs & plugin devs

Topic: opam archive CI dune error


view this post on Zulip Karl Palmskog (Dec 15 2021 at 09:37):

I'm seeing errors like this in json-data jobs for the opam archive:

File "nix/store/2w9yyr6c3py70kwsv8zvvx5afsfx3zx3-dune-2.8.5/bin/dune", line 1, characters 0-0:
Error: Invalid dune file
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 1

These look worrisome, e.g., https://gitlab.com/coq/opam-coq-archive/-/jobs/1886975493

Anyone know what's going on? I think I've seen the "Invalid dune file" before, but it was a while ago.

view this post on Zulip Guillaume Melquiond (Dec 15 2021 at 10:22):

It happens when Dune mistakes its own executable for a Dune file. I don't remember what the cause of this Dune bug was.

view this post on Zulip Enrico Tassi (Dec 15 2021 at 11:45):

CC @Pierre-Yves Strub

view this post on Zulip Enrico Tassi (Dec 15 2021 at 11:45):

We never found the cause, it only happens on Pyrolyse

view this post on Zulip Enrico Tassi (Dec 15 2021 at 11:46):

Oh no, now it also happens in shared runners...

view this post on Zulip Karl Palmskog (Dec 15 2021 at 11:52):

yes, now it consistently happens in every pull request for the json-data job

view this post on Zulip Théo Zimmermann (Dec 15 2021 at 11:58):

Pyrolyse was updated to Ubuntu 20.04 and then never turned into a GitLab runner again AFAIK, because I couldn't figure out how to do it.

view this post on Zulip Karl Palmskog (Dec 15 2021 at 11:59):

yeah it would be very nice to have a consistent set of runners, all with time limits of 10h or something

view this post on Zulip Théo Zimmermann (Dec 15 2021 at 11:59):

I asked Jean-Claude Soret in September to give @Guillaume Melquiond access to Pyrolyse so that he could look into this, but I don't think he did (at least he didn't answer).

view this post on Zulip Karl Palmskog (Dec 15 2021 at 12:00):

the way CI testing is done in the archive is very time-consuming, which is fine by me but we need higher time limits

view this post on Zulip Théo Zimmermann (Dec 15 2021 at 12:12):

Another solution would be to subscribe to some cloud provider to set up multiple servers for the opam archive.

view this post on Zulip Karl Palmskog (Dec 15 2021 at 12:13):

that sounds good to me as well...

view this post on Zulip Théo Zimmermann (Dec 15 2021 at 12:13):

I have no idea how much it would cost, but it should be possible to allocate a few hundred to a few thousand euros to this kind of thing.

view this post on Zulip Karl Palmskog (Dec 15 2021 at 12:15):

machines are all well and good, but it would be nice to have some admin resources as well, e.g., so I don't have to persuade someone to take time away from research to fix an archive CI issue

view this post on Zulip Maxime Dénès (Dec 15 2021 at 12:26):

Do these jobs run in Docker images? If yes, we could use a CaaS solution I guess.

view this post on Zulip Karl Palmskog (Dec 15 2021 at 12:27):

right now they don't run in Docker to my knowledge...

view this post on Zulip Karl Palmskog (Dec 15 2021 at 12:28):

this is the GitLab definition of the job that now consistently fails with the dune error:

json-data:
  image: nixos/nix
  cache: {}
  before_script: []
  script:
    - nix-shell --run "dune exec --profile=release -- archive2web released extra-dev > coq-packages.json"
  artifacts:
    name: "$CI_JOB_NAME"
    paths:
      - coq-packages.json
    expire_in: 1 year

could it be due to (some change in) Nix?

view this post on Zulip Guillaume Melquiond (Dec 15 2021 at 13:08):

Note that, by virtue of image: nixos/nix, they do run in Docker containers.

view this post on Zulip Karl Palmskog (Dec 15 2021 at 16:56):

OK, unless anyone has ideas I'm going to remove the json-data job tomorrow (Thursday), since it slows down the reviewing and merging process a lot (have to check what failed every time)

view this post on Zulip Guillaume Melquiond (Dec 15 2021 at 17:04):

If you remove the json-data job, we have to consider removing the webpage https://coq.inria.fr/opam/www/ , because the latter dynamically loads the data generated by the job.

view this post on Zulip Karl Palmskog (Dec 15 2021 at 17:07):

but right now, the job fails and we still get coq-packages.json generated fine

view this post on Zulip Karl Palmskog (Dec 15 2021 at 17:08):

I thought the job is just to generate an artifact for the PR, and the real generation occurs when one merges to master?

view this post on Zulip Guillaume Melquiond (Dec 15 2021 at 17:10):

Only the artifact of master is used. But if you remove the job, no artifact will be generated any longer.

view this post on Zulip Guillaume Melquiond (Dec 15 2021 at 17:15):

It is imperative that we fix the job, presumably by removing any use of Dune.

view this post on Zulip Karl Palmskog (Dec 15 2021 at 17:17):

looks like the failure is in "post installation", what the heck?

post-installation fixup
moving /nix/store/20nf1b6ymxyx9ndmzdq89rjf0b4j75g8-ocaml4.12.0-merlin-4.1-412/doc to /nix/store/20nf1b6ymxyx9ndmzdq89rjf0b4j75g8-ocaml4.12.0-merlin-4.1-412/share/doc
shrinking RPATHs of ELF executables and libraries in /nix/store/20nf1b6ymxyx9ndmzdq89rjf0b4j75g8-ocaml4.12.0-merlin-4.1-412
shrinking /nix/store/20nf1b6ymxyx9ndmzdq89rjf0b4j75g8-ocaml4.12.0-merlin-4.1-412/bin/ocamlmerlin
shrinking /nix/store/20nf1b6ymxyx9ndmzdq89rjf0b4j75g8-ocaml4.12.0-merlin-4.1-412/bin/ocamlmerlin-server
strip is /nix/store/5ddb4j8z84p6sjphr0kh6cbq5jd12ncs-binutils-2.35.1/bin/strip
stripping (with command strip and flags -S) in /nix/store/20nf1b6ymxyx9ndmzdq89rjf0b4j75g8-ocaml4.12.0-merlin-4.1-412/lib  /nix/store/20nf1b6ymxyx9ndmzdq89rjf0b4j75g8-ocaml4.12.0-merlin-4.1-412/bin
patching script interpreter paths in /nix/store/20nf1b6ymxyx9ndmzdq89rjf0b4j75g8-ocaml4.12.0-merlin-4.1-412
checking for references to /tmp/nix-build-ocaml4.12.0-merlin-4.1-412.drv-0/ in /nix/store/20nf1b6ymxyx9ndmzdq89rjf0b4j75g8-ocaml4.12.0-merlin-4.1-412...
Info: Creating file dune-project with this contents:
| (lang dune 2.8)
File "nix/store/2w9yyr6c3py70kwsv8zvvx5afsfx3zx3-dune-2.8.5/bin/dune", line 1, characters 0-0:
Error: Invalid dune file

view this post on Zulip Guillaume Melquiond (Dec 15 2021 at 17:29):

Removing the merlin line from default.nix might be sufficient then.

view this post on Zulip Guillaume Melquiond (Dec 15 2021 at 17:30):

(And also ocp-indent, because why not.)

view this post on Zulip Emilio Jesús Gallego Arias (Dec 15 2021 at 23:32):

Yeah, this error is due to dune not being called in the right directory

view this post on Zulip Emilio Jesús Gallego Arias (Dec 15 2021 at 23:32):

Actually I think dune 2.9.x has some more safeguards to prevent this problem

view this post on Zulip Emilio Jesús Gallego Arias (Dec 15 2021 at 23:32):

That's a tricky problem related to composition: when you are at an arbitrary point of a composed tree, how do you locate the root?
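
Note: the root-location question can be illustrated with a small shell sketch. It lists every ancestor directory that could plausibly serve as a root because it contains a dune-project or dune-workspace file. The helper name candidate_roots is hypothetical, and the walk is a simplification for illustration, not Dune's actual root-detection rules.

```shell
#!/bin/sh
# Hypothetical helper (not part of Dune): print every ancestor of the
# given directory, nearest first, that contains a dune-project or
# dune-workspace file. Each printed directory is a candidate root.
candidate_roots() {
    # Resolve to an absolute, symlink-free path; fail if it does not exist.
    dir=$(cd "$1" && pwd -P) || return 1
    while :; do
        if [ -e "$dir/dune-project" ] || [ -e "$dir/dune-workspace" ]; then
            printf '%s\n' "$dir"
        fi
        # Stop once the filesystem root has been checked.
        [ "$dir" = "/" ] && break
        dir=$(dirname "$dir")
    done
}
```

In a nested checkout with a dune-project in both outer/ and outer/inner/, calling candidate_roots outer/inner/src prints both directories, so a tool started there has to choose between them. If nothing is printed, there is no marker file to anchor on at all.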

view this post on Zulip Paolo Giarrusso (Dec 16 2021 at 02:08):

(explicitly detecting binaries would give a better error message)

view this post on Zulip Enrico Tassi (Dec 16 2021 at 06:49):

also making composition an option...

view this post on Zulip Emilio Jesús Gallego Arias (Dec 16 2021 at 13:51):

Rules have changed in 3.0 so this error doesn't happen, but instead, you will get an error "cannot detect workspace" so the problem is still there.

view this post on Zulip Rudi Grinberg (Dec 16 2021 at 18:49):

If this is running in nix, isn't this a release build? In which case, dune doesn't try to guess the root and defaults to --root . (set by -p). Or am I misunderstanding the problem?
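
Note: a minimal sketch of pinning the root explicitly instead of letting dune guess it, using the --root flag mentioned above. The helper name pinned_root_args is hypothetical, and whether this would avoid the json-data failure is an untested assumption.

```shell
#!/bin/sh
# Hypothetical helper: print dune arguments that pin the workspace root
# to the given directory, and refuse if that directory has no
# dune-project file (so dune is never left to guess a root).
pinned_root_args() {
    if [ ! -f "$1/dune-project" ]; then
        echo "no dune-project in $1; refusing to let dune guess a root" >&2
        return 1
    fi
    printf '%s %s\n' --root "$1"
}

# Possible use inside the nix-shell (commented out: needs dune and the
# archive checkout; assumes the root path contains no spaces):
# dune exec $(pinned_root_args .) --profile=release -- archive2web released extra-dev > coq-packages.json
```

Failing early when dune-project is missing gives a clearer message than "Invalid dune file" if the command is ever run from an unexpected directory.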

view this post on Zulip Théo Zimmermann (Dec 16 2021 at 20:51):

I think you are misunderstanding. Dune is called explicitly inside a nix-shell. See the discussion on the related issue: https://github.com/coq/opam-coq-archive/issues/1995


Last updated: Feb 02 2023 at 15:04 UTC