Stream: Coq devs & plugin devs

Topic: opam archive CI cache errors


Karl Palmskog (Dec 17 2021 at 09:33):

after we switched the opam archive PR to non-Nix CI for producing JSON files, I'm getting a ton of these seemingly cache-based errors:

$ chmod +x /usr/local/bin/opam
$ set -o pipefail
$ . scripts/opam-coq-setup-root # collapsed multi-line command
.Initializing opam root

gzip: stdin: decompression OK, trailing garbage ignored
tar: Child returned status 2
tar: Error is not recoverable: exiting now
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 1

For example: https://gitlab.com/coq/opam-coq-archive/-/jobs/1896046372

@Enrico Tassi do you know what's going on? Can we somehow flush all caches and see if it helps?

Gaëtan Gilbert (Dec 17 2021 at 09:54):

try putting set -x in the multiline command so you can see what it's doing
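
As a rough illustration of the suggestion above, the sketch below turns on `set -x` inside a small function so that every command is echoed to stderr before it runs. The function `setup_root` and its contents are hypothetical stand-ins, not the actual contents of `scripts/opam-coq-setup-root`.

```shell
#!/bin/sh
# Hypothetical stand-in for the collapsed multi-line CI command.
setup_root() {
    set -x                      # from here on, each command is traced to stderr
    workdir=$(mktemp -d)
    echo "Initializing opam root" > "$workdir/log"
    set +x                      # stop tracing
    rm -rf "$workdir"
}

# Capture the trace (stderr) so it can be inspected or saved with the job log.
trace=$(setup_root 2>&1)
printf '%s\n' "$trace"
```

In a CI job the trace would simply appear in the job output, which makes it possible to see which command ran just before the failing `tar` call.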

Karl Palmskog (Dec 17 2021 at 09:59):

OK, let's be clear. There is in my view a forced choice between debugging the archive CI and reviewing PRs with packages. I don't have cycles for both. Maybe I'll try the set -x thing in a week or so, if nobody else does.

Guillaume Melquiond (Dec 17 2021 at 10:52):

Clearly, the cache is corrupted. Presumably by the line opam update default >> $1 which should have been opam update default >> $LOG.
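
The failure mode described above can be reproduced locally with a minimal sketch. It assumes the cache is a gzip-compressed tarball; the names `cache.tar.gz` and `setup.log` are made up for the example, and the `echo` stands in for the output of `opam update default`.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small stand-in for the cached opam root archive.
mkdir root
echo "switch data" > root/config
tar -czf pristine.tar.gz root
rm -rf root

CACHE=cache.tar.gz   # hypothetical name of the cache archive
LOG=setup.log        # hypothetical name of the log file

# Buggy pattern: command output is appended to the archive itself.
cp pristine.tar.gz "$CACHE"
echo "opam update output" >> "$CACHE"
if gzip -t "$CACHE" 2>gzip.err; then
    status_buggy=ok
else
    status_buggy=corrupt    # gzip reports bytes after the compressed stream
fi

# Fixed pattern: command output goes to the log, the archive stays untouched.
cp pristine.tar.gz "$CACHE"
echo "opam update output" >> "$LOG"
if gzip -t "$CACHE" 2>/dev/null; then
    status_fixed=ok
else
    status_fixed=corrupt
fi

echo "appending to cache: $status_buggy"
echo "appending to log:   $status_fixed"
cat gzip.err
```

With GNU gzip the captured message should resemble the "trailing garbage ignored" warning from the job log. The warning makes gzip exit with a non-zero status, which is what `tar` then reports as a failed child process.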

Enrico Tassi (Dec 17 2021 at 10:57):

Sorry, I saw this bug yesterday

Enrico Tassi (Dec 17 2021 at 10:58):

the problem is that 2 jobs use the same cache file

Enrico Tassi (Dec 17 2021 at 10:58):

it is easy to fix, I'll do that after lunch

Enrico Tassi (Dec 17 2021 at 11:00):

maybe @Gaëtan Gilbert can help, I'd like a variable containing the job name to be used here:
https://github.com/coq/opam-coq-archive/blob/e95b4f39b372ef59cfed71ac8410965b782fb1fc/.gitlab-ci.yml#L18

Enrico Tassi (Dec 17 2021 at 11:01):

that should fix the problem

Enrico Tassi (Dec 17 2021 at 11:01):

I'm sorry but I saw the problem late yesterday night and I forgot

Guillaume Melquiond (Dec 17 2021 at 11:03):

The problem is not that two jobs are using the cache. The problem is that your script is corrupting it.

Enrico Tassi (Dec 17 2021 at 11:04):

It is corrupting the gzip file, since 2 jobs may regenerate the same cache file

Enrico Tassi (Dec 17 2021 at 11:04):

the new job runs on the same compiler as another job

Enrico Tassi (Dec 17 2021 at 11:04):

and the cache file is named only after the compiler, not the job name itself :-/

Enrico Tassi (Dec 17 2021 at 11:05):

I did not see it coming because I was only re-running the new job.
But now it runs in parallel with the other jobs

Guillaume Melquiond (Dec 17 2021 at 11:05):

No! Your script is doing echo crap >> $CACHE!

Guillaume Melquiond (Dec 17 2021 at 11:05):

Having a single cache for several jobs is perfectly fine.

Enrico Tassi (Dec 17 2021 at 11:06):

oops, where?
I thought it was the upload from the worker to the cache (of the same file at the same time) resulting in a corruption

Enrico Tassi (Dec 17 2021 at 11:07):

oh shit, I see

Guillaume Melquiond (Dec 17 2021 at 11:07):

opam update default >> $1

Enrico Tassi (Dec 17 2021 at 11:07):

then the fix is easy, let me do it now

Guillaume Melquiond (Dec 17 2021 at 11:07):

By the way, why did you rename the job? Presumably the artifact will not be found anymore; will it?

Enrico Tassi (Dec 17 2021 at 11:08):

no, I think I named the artifact correctly

Guillaume Melquiond (Dec 17 2021 at 11:09):

Sure, but the artifact is found through the job name, isn't it?

Enrico Tassi (Dec 17 2021 at 11:10):

eh, I don't know how the deployment works. I'll trust you blindly and fix it

Enrico Tassi (Dec 17 2021 at 11:11):

There you are: https://github.com/coq/opam-coq-archive/pull/1997 now I really need to go eat

Guillaume Melquiond (Dec 17 2021 at 11:13):

I don't know either. But if I had done it, I would have used https://gitlab.com/coq/opam-coq-archive/-/jobs/artifacts/master/download?job=json-data which clearly depends on the job name.

Enrico Tassi (Dec 17 2021 at 11:14):

The artifact itself has a name... anyway, I did revert that change

Guillaume Melquiond (Dec 17 2021 at 11:18):

Sure, but again, artifacts are per job. You need to provide a job name to access an artifact, whatever its name.
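
To make the dependency on the job name concrete, here is a small sketch that assembles the download URL quoted earlier in the thread from its parts. The helper `artifact_url` is a hypothetical name introduced for the example. The URL pattern is copied from the link above, and the sketch assumes the job name needs no URL encoding.

```shell
#!/bin/sh
# Build the download URL for the latest artifacts of a job on a given ref.
artifact_url() {
    project=$1   # project path, e.g. coq/opam-coq-archive
    ref=$2       # branch or tag
    job=$3       # job name as written in .gitlab-ci.yml
    printf 'https://gitlab.com/%s/-/jobs/artifacts/%s/download?job=%s\n' \
        "$project" "$ref" "$job"
}

url=$(artifact_url coq/opam-coq-archive master json-data)
echo "$url"

# To actually fetch the archive (requires network access):
# curl --fail --location --output artifacts.zip "$url"
```

Renaming the job changes the `job=` parameter, so any consumer that hardcodes the old name would stop finding the artifact even if the artifact's own name is unchanged.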


Last updated: Sep 15 2024 at 13:02 UTC