Stream: Coq devs & plugin devs

Topic: runners


Gaëtan Gilbert (Apr 05 2022 at 11:02):

I'm going to remove the coq-windows runners from gitlab as

Gaëtan Gilbert (Apr 05 2022 at 11:03):

done

Gaëtan Gilbert (Jun 13 2023 at 13:07):

I paused coq-docker-machine-driver as we're getting tons of "runner system failure" on it
eg https://gitlab.com/coq/coq/-/jobs/4463423533

Maxime Dénès (Jun 13 2023 at 13:27):

Argh

Maxime Dénès (Jun 13 2023 at 15:58):

It could actually be a gitlab-runner regression: they made a bug-fix release a few days ago that seems related.

Théo Zimmermann (Jun 20 2023 at 10:52):

I'm getting a lot of disk-space-related issues on runners. This makes preparing backports for Coq 8.17.1 difficult.

Théo Zimmermann (Jun 21 2023 at 09:31):

It looks like it happens specifically with ci-coq-01-runner-03. Should we disable this runner?

Maxime Dénès (Jun 22 2023 at 08:49):

You can disable it. If needed, I think I can still add more ephemeral runners.

Théo Zimmermann (Jun 22 2023 at 09:07):

I wanted to do so and then discovered that it exists twice:

Is it equivalent if I disable it in one place or the other?

Maxime Dénès (Jun 22 2023 at 09:39):

are you sure it is the same?

Théo Zimmermann (Jun 22 2023 at 09:39):

No, but it has the same name, and I cannot know which one it was that keeps failing.

Maxime Dénès (Jun 22 2023 at 09:39):

Are you sure it has the same name?

Maxime Dénès (Jun 22 2023 at 09:40):

I have limited internet access, but I see only ci-coq-03 on the repo (vs ci-coq-01-runner-03 in the group).

Théo Zimmermann (Jun 22 2023 at 09:40):

Oh indeed, you are correct.

Théo Zimmermann (Jun 22 2023 at 09:40):

So it's the latter that I should disable (the one in the group).

Maxime Dénès (Jun 22 2023 at 09:40):

if you want the background behind these namings: we have 4 physical machines ci-coq-01, ..., ci-coq-04

Maxime Dénès (Jun 22 2023 at 09:41):

They are meant to be used for the bench, but since the lack of runners issue, we turned ci-coq-01 and ci-coq-02 into regular runners, with VMs on top

Théo Zimmermann (Jul 01 2023 at 18:06):

I have disabled the coq-docker-machine-driver runner after seeing many systematic runner failures with this runner today.

Théo Zimmermann (Jul 01 2023 at 18:06):

The errors are always the same:

ERROR: Preparation failed: exit status 1
Will be retried in 3s ...
ERROR: Preparation failed: exit status 1
Will be retried in 3s ...
ERROR: Preparation failed: exit status 1
Will be retried in 3s ...
ERROR: Job failed (system failure): exit status 1
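
For anyone triaging such jobs by hand, here is a minimal sketch of how a log like the one above could be told apart from an ordinary build failure before retrying. The function names are hypothetical, and the patterns come only from this excerpt, not from any GitLab specification; it also assumes the log is plain text without terminal color codes.

```python
import re

# Patterns copied from the log excerpt above (assumption: plain-text log,
# one message per line, no ANSI escape sequences).
SYSTEM_FAILURE = re.compile(r"^ERROR: Job failed \(system failure\)", re.MULTILINE)
PREPARATION_FAILED = re.compile(r"^ERROR: Preparation failed", re.MULTILINE)


def is_runner_system_failure(log_text: str) -> bool:
    """Return True if the job log ends in a runner-side system failure."""
    return SYSTEM_FAILURE.search(log_text) is not None


def count_preparation_retries(log_text: str) -> int:
    """Count how many times the runner reported a failed preparation step."""
    return len(PREPARATION_FAILED.findall(log_text))
```

A job whose log satisfies `is_runner_system_failure` is a candidate for a plain retry, since the failure comes from the runner rather than from the code under test.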

Théo Zimmermann (Jul 01 2023 at 18:12):

I have also retried many pipelines in which numerous jobs had failed for this reason.

Théo Zimmermann (Jul 01 2023 at 18:26):

I also had to turn off the test-docker-machine-driver for the same reason.

Théo Zimmermann (Jul 03 2023 at 10:19):

And FTR this means that jobs modifying the Docker image will be stuck and will require temporarily activating shared runners for them to be unstuck.

Théo Zimmermann (Jul 03 2023 at 13:14):

I have also disabled coq-01-runner-04, which kept failing with no space available on device.

Gaëtan Gilbert (Jul 03 2023 at 13:39):

We have only 2 active non-bench runners.

Gaëtan Gilbert (Jul 03 2023 at 14:08):

@Maxime Dénès are you on holidays?

Maxime Dénès (Jul 03 2023 at 14:35):

I'm not, I'll have a look

Maxime Dénès (Jul 03 2023 at 15:39):

Ok, I restored the Cloud-based runners. The root cause seems to be some CloudStack API timeouts on Friday. I'm trying to understand better.

Théo Zimmermann (Jul 03 2023 at 17:48):

Could be related to the GitLab maintenance?

Maxime Dénès (Jul 04 2023 at 08:45):

I don't think so, but I requested more info from the team in charge (in particular their logs around the event)


Last updated: Dec 07 2023 at 06:38 UTC