After we switched the opam archive PR to non-Nix CI for producing JSON files, I'm getting a ton of these seemingly cache-related errors:
$ chmod +x /usr/local/bin/opam
$ set -o pipefail
$ . scripts/opam-coq-setup-root # collapsed multi-line command
.Initializing opam root
gzip: stdin: decompression OK, trailing garbage ignored
tar: Child returned status 2
tar: Error is not recoverable: exiting now
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 1
For example: https://gitlab.com/coq/opam-coq-archive/-/jobs/1896046372
@Enrico Tassi do you know what's going on? Can we somehow flush all caches and see if it helps?
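Besides flushing the caches by hand, a job could check its cached archive before using it and discard it if it is damaged, so that it gets rebuilt. The sketch below assumes the cache is a single .tar.gz file; drop_corrupt_cache is a hypothetical helper, not part of scripts/opam-coq-setup-root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: discard a cached tarball that fails gzip's integrity
# check, so the job can rebuild it instead of failing later inside tar.
# Returns 0 if the cache is absent or valid, 1 if it was removed.
drop_corrupt_cache() {
  local cache="$1"
  if [ -f "$cache" ] && ! gzip -t "$cache" 2>/dev/null; then
    echo "cache $cache is corrupted, removing it" >&2
    rm -f -- "$cache"
    return 1
  fi
  return 0
}
```

Note that gzip -t also exits non-zero on a warning such as trailing garbage, which is exactly the symptom in the log above.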
try putting set -x in the multiline command so you can see what it's doing
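For reference, a minimal illustration of what set -x does in bash: each command is printed to stderr, prefixed with +, after variable expansion and just before it runs. The function name, variable, and path are placeholders.

```shell
#!/usr/bin/env bash
# Run two commands with xtrace enabled inside a subshell, so tracing does not
# leak into the caller, and send the trace (stderr) to stdout for inspection.
traced_steps() {
  (
    set -x
    local_dir=/tmp/example    # placeholder variable, only to show expansion
    echo "initializing in $local_dir" > /dev/null
  ) 2>&1
}

traced_steps
```

Since scripts/opam-coq-setup-root is sourced with ., a set -x placed before that line in the CI command (placement assumed) would also trace the commands inside the script.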
OK, let's be clear. In my view there is a forced choice between debugging the archive CI and reviewing PRs with packages. I don't have cycles for both. Maybe I'll try the set -x thing in a week or so, if nobody else does.
Clearly, the cache is corrupted, presumably by the line opam update default >> $1, which should have been opam update default >> $LOG.
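The messages in the log are consistent with this explanation. The sketch below, assuming GNU tar and GNU gzip as on typical Linux runners, appends one line of text to a valid .tar.gz and then tries to extract it: gzip decompresses the stream, warns about trailing garbage, and exits with status 2, which tar treats as a fatal child error. The appended text is a placeholder for whatever opam update prints.

```shell
#!/usr/bin/env bash
# Reproduce the failure mode: text appended (">>") to a .tar.gz cache leaves a
# valid gzip stream followed by garbage. gzip warns and exits with status 2,
# and GNU tar then aborts with "Child returned status 2".
reproduce_cache_corruption() {
  (
    work=$(mktemp -d) || exit 1
    cd "$work" || exit 1
    mkdir root && echo data > root/file
    tar czf cache.tar.gz root
    # The bug: a log line goes to the cache file instead of the log file.
    echo "placeholder opam update output" >> cache.tar.gz
    rm -rf root
    tar xzf cache.tar.gz
    status=$?
    rm -rf "$work"
    exit "$status"
  )
}

if reproduce_cache_corruption; then
  echo "extraction succeeded"
else
  echo "extraction failed with status $?"
fi
```

On such a system the run should end with a non-zero status (2 with GNU tar), preceded by the same gzip and tar messages as in the CI job.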
Sorry, I saw this bug yesterday
the problem is that 2 jobs use the same cache file
it is easy to fix, I'll do that after lunch
maybe @Gaëtan Gilbert can help, I'd like a variable containing the job name to be used here:
https://github.com/coq/opam-coq-archive/blob/e95b4f39b372ef59cfed71ac8410965b782fb1fc/.gitlab-ci.yml#L18
that shall fix the problem
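A sketch of the naming scheme being asked for here: the GitLab predefined variable CI_JOB_NAME is folded into the cache file name, so two jobs that use the same compiler get different files. The actual line 18 of .gitlab-ci.yml is not reproduced; the opam-root- prefix, the compiler argument, and the function name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical: build a cache file name from the compiler version and the job
# name. The job name defaults to $CI_JOB_NAME (set by GitLab CI), or "local"
# when running outside CI.
cache_file_name() {
  local compiler="$1" job="${2:-${CI_JOB_NAME:-local}}"
  # Replace characters that are awkward in file names (spaces, slashes, ...).
  job=${job//[^A-Za-z0-9._-]/_}
  printf 'opam-root-%s-%s.tar.gz\n' "$compiler" "$job"
}
```

For example, cache_file_name 4.14.1 json-data prints opam-root-4.14.1-json-data.tar.gz.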
I'm sorry, but I saw the problem late last night and I forgot
The problem is not that two jobs are using the cache. The problem is that your script is corrupting it.
It is corrupting the gzip file, since 2 jobs may regenerate the same cache file
the new job runs on the same compiler as another job
and the cache file is named only after the compiler, not after the job itself :-/
I did not see it coming because I was only re-running the new job.
But now it runs in parallel with the other jobs
No! Your script is doing echo crap >> $CACHE!
Having a single cache for several jobs is perfectly fine.
oops, where?
I thought it was the upload from the worker to the cache (of the same file at the same time) that caused the corruption
oh shit, I see
opam update default >> $1
then the fix is easy, let me do it now
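A sketch of the fix, assuming the script receives the cache path as $1 and a log path in $LOG: all command output is appended to $LOG, so the cache archive is only ever written by tar. update_and_log and its argument order are hypothetical; only the >> $1 versus >> $LOG distinction comes from the discussion.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command and append its output to $LOG, never to
# the cache path passed as the first argument.
update_and_log() {
  local cache="$1"; shift
  : "${LOG:?LOG must point to a log file}"
  echo "refreshing opam root cached in $cache" >> "$LOG"
  "$@" >> "$LOG" 2>&1        # previously: >> $1, which corrupted the cache
}
```

For example, LOG=/tmp/setup.log update_and_log "$CACHE" opam update default keeps the opam output out of the archive.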
By the way, why did you rename the job? Presumably the artifact will not be found anymore; will it?
no, I think I named the artifact correctly
Sure, but the artifact is found through the job name, isn't it?
Well, I don't know how the deployment works. I'll trust you blindly and fix it
There you are: https://github.com/coq/opam-coq-archive/pull/1997 now I really need to go eat
I don't know either. But if I had done it, I would have used https://gitlab.com/coq/opam-coq-archive/-/jobs/artifacts/master/download?job=json-data which clearly depends on the job name.
The artifact itself has a name... anyway, I did revert that change
Sure, but again, artifacts are per job. You need to provide a job name to access an artifact, whatever its name.
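To illustrate the point about job names: the latest-artifacts URL quoted above is built from the project path, the ref, and the job name only, so renaming the job changes the URL even when the artifact keeps its own name. artifact_url is a hypothetical helper; job names containing characters such as spaces would also need URL-encoding, which it does not do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the GitLab "latest artifacts for a ref" download
# URL. The job name is the only thing that selects the artifact.
artifact_url() {
  local project="$1" ref="$2" job="$3"
  printf 'https://gitlab.com/%s/-/jobs/artifacts/%s/download?job=%s\n' \
    "$project" "$ref" "$job"
}

artifact_url coq/opam-coq-archive master json-data
# A consumer would then fetch the (zip) archive, e.g.:
#   curl -fL -o artifacts.zip "$(artifact_url coq/opam-coq-archive master json-data)"
```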
Last updated: Sep 15 2024 at 13:02 UTC