Stream: Coq devs & plugin devs

Topic: CI failure in build:base


Guillaume Melquiond (Jun 18 2021 at 15:36):

What is going on with the build:base job? It keeps getting restarted; there are already 10 instances of it in https://gitlab.com/coq/coq/-/jobs/1359488020

Théo Zimmermann (Jun 18 2021 at 15:41):

Looks like it is timing out. I've cancelled the last running instance to avoid it being restarted again.

Théo Zimmermann (Jun 18 2021 at 15:42):

Argh I misread.

Théo Zimmermann (Jun 18 2021 at 15:42):

That was not the issue.

Théo Zimmermann (Jun 18 2021 at 15:42):

The job is restarted because the last line of the log is ERROR: Job failed: exit code 137

Théo Zimmermann (Jun 18 2021 at 15:42):

which usually denotes an issue with the runner.
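
For reference, an exit status above 128 conventionally means the process was terminated by a signal (status minus 128), so 137 corresponds to signal 9, SIGKILL. On CI runners this is often, though not always, a sign that the container was killed from outside, for example for exceeding a memory limit. A minimal Python sketch of that decoding follows; the helper name `describe_exit_code` is made up for illustration and is not part of any CI tooling.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper)."""
    if code == 0:
        return "success"
    if code > 128:
        # Shell convention: 128 + N means "terminated by signal N".
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number unknown on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(137))
```

On a POSIX system this reports SIGKILL for 137, which is consistent with the job being killed by the runner rather than failing on its own.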

Théo Zimmermann (Jun 18 2021 at 15:43):

Anyway, manually cancelling is the thing to do in cases of looping restarts.

Guillaume Melquiond (Jun 18 2021 at 16:27):

Sure, but then what? Should I wait a few days and then restart the job?

Théo Zimmermann (Jun 18 2021 at 16:37):

There is something fishy: several pipelines are experiencing the same recurring failure in the same build:base job. Could it be that we've recently introduced a change that breaks this job when it runs on a subclass of runners?

Théo Zimmermann (Jun 18 2021 at 16:44):

5 pipelines have been affected so far (323379491, 323309344, 323449544, 323400481, 323379222), all of them within the last couple of hours. Unless this keeps happening tomorrow, I guess we can consider it a transient issue with GitLab shared runners.

Guillaume Melquiond (Jun 18 2021 at 16:48):

Given the last run command, could it be that we are hitting the command line limit?

Guillaume Melquiond (Jun 18 2021 at 16:56):

(At a quick glance, the size of this command line is a few bytes over 32KB.)

Jason Gross (Jun 20 2021 at 05:15):

I think this is also responsible for coqbot posting about minimization jobs every 15 minutes...

Guillaume Melquiond (Jun 21 2021 at 09:27):

This is not working any better after waiting a few days. I am still unable to get past the build:base job. And my theory about hitting the command line limit is wrong, as the runners have a 2MB limit.
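
A limit like this can be checked on a POSIX machine by comparing the size of a command line against `SC_ARG_MAX`. The following is a rough Python sketch under stated assumptions: the function name `arg_budget` and the `headroom` value are illustrative, and the byte accounting (each string plus a terminating NUL, environment entries as `KEY=VALUE`) is an approximation of what the kernel counts, not an exact model.

```python
import os


def arg_budget(argv, env=None, headroom=2048):
    """Estimate whether argv plus env fits under SC_ARG_MAX (hypothetical helper).

    Returns (fits, bytes_used, arg_max).
    """
    env = dict(os.environ) if env is None else env
    # System limit on the combined size of arguments and environment.
    arg_max = os.sysconf("SC_ARG_MAX")
    # Each argument is stored with a terminating NUL byte.
    used = sum(len(os.fsencode(a)) + 1 for a in argv)
    # Each environment entry is "KEY=VALUE" plus a terminating NUL byte.
    used += sum(len(os.fsencode(k)) + len(os.fsencode(v)) + 2
                for k, v in env.items())
    return used + headroom <= arg_max, used, arg_max


if __name__ == "__main__":
    # A command line slightly over 32KB, as in the discussion above.
    fits, used, arg_max = arg_budget(["echo", "x" * 33000])
    print(f"uses {used} of {arg_max} bytes; fits: {fits}")
```

With a limit in the 2MB range, a command line of roughly 32KB stays well within the budget, which matches the conclusion that the limit is not the cause.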

Jason Gross (Jun 21 2021 at 11:16):

I've cancelled the build:base jobs currently running, since they seem to be starving out other queued jobs, at least a little bit

Gaëtan Gilbert (Jun 21 2021 at 11:40):

maybe we should disable the shared runners? has the error happened with non shared runners?

Gaëtan Gilbert (Jun 21 2021 at 11:58):

maybe https://github.com/coq/coq/pull/14533 will help


Last updated: Oct 21 2021 at 20:02 UTC