I've seen multiple failures of jobs running on pyrolyse because of git errors. Could it be due to multiple jobs running at the same time and a lack of proper compartmentalization? How do we work around this issue?
https://gitlab.com/coq/coq/-/jobs?scope=finished
I have seen that too, no idea.
I guess the runner backend is not the best one?
On the most recent jobs, it seems that the majority of the jobs sent to pyrolyse are failing for this reason.
Roquableu doesn't seem to hit this issue. Maybe I should update the GitLab runner on pyrolyse?
It might not even be worth keeping pyrolyse active until we fix that.
Do they differ in their choice of runner backend?
If the problem is the runner backend, it should be easy to solve
[famous last words]
By runner backend do you mean the GitLab runner software or do you mean something else?
They were not installed at the same time (pyrolyse's setup is older), so they could differ, in particular in the version used.
I mean an option on the runner config
let me grab the exact name
Ok, I mean the "executor"
https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-executors
Log for a failed pyrolyse job says "Using Docker executor"
And same on roquableu
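For anyone who wants to check the executor from a runner's config rather than from a job log, here is a minimal sketch. It assumes the usual `config.toml` layout with a `[[runners]]` table and an `executor` key; the sample content below (name, `concurrent` value, image) is hypothetical and not copied from pyrolyse or roquableu.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport, assumed installed on older Pythons

# Hypothetical config.toml content; real files carry more keys (url, token, ...).
SAMPLE_CONFIG = """
concurrent = 4

[[runners]]
  name = "pyrolyse"
  executor = "docker"
  [runners.docker]
    image = "ubuntu:20.04"
"""


def executors(config_text: str) -> dict[str, str]:
    """Map each runner name to its configured executor."""
    config = tomllib.loads(config_text)
    return {r["name"]: r["executor"] for r in config.get("runners", [])}


print(executors(SAMPLE_CONFIG))
```

Running the same function on the real file from each machine would show right away whether two runners differ on this setting.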
pyrolyse also has some other funky setup which makes dune pick up its own binary as a dune file and fail because it's binary and not text; this happened many times in the opam coq archive.
(so using proper system images could help there too)
Enrico Tassi said:
(so using proper system images could help there too)
What do you mean by this? And do you understand what's going on? I'm willing to intervene on pyrolyse but I need to understand what the issues are and how to solve them.
Oh, if that is not the executor then I need to read more, thanks for checking @Théo Zimmermann
So FWIW, pyrolyse is running gitlab-runner 11.2.0, roquableu is running gitlab-runner 11.9.2 and the shared runners are gitlab-runner 13.11.0-rc1. We can imagine that updating the gitlab runner on pyrolyse to a more recent version should help.
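To make the version gap concrete, here is a small sketch that compares the versions quoted above and lists the runners that lag behind the newest one. The version strings come from this thread; the helper names `parse_version` and `outdated` are made up for illustration, and the `-rc1` suffix is simply ignored when comparing.

```python
import re

# Versions as reported in this thread.
RUNNER_VERSIONS = {
    "pyrolyse": "11.2.0",
    "roquableu": "11.9.2",
    "shared": "13.11.0-rc1",
}


def parse_version(text: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch), ignoring suffixes such as '-rc1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def outdated(versions: dict[str, str]) -> list[str]:
    """Return the runners older than the newest version seen, oldest first."""
    newest = max(parse_version(v) for v in versions.values())
    behind = [name for name, v in versions.items() if parse_version(v) < newest]
    return sorted(behind, key=lambda name: parse_version(versions[name]))


print(outdated(RUNNER_VERSIONS))
```

Parsing into integer tuples matters here: comparing the raw strings would rank "11.9.2" above "11.10.0".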
Oh actually, this is the first time I've seen this, but it also failed on roquableu with the same error: https://gitlab.com/coq/coq/-/jobs/1219206323
Pyrolyse will soon be upgraded to a newer system (from Ubuntu 16.04 currently) and then I will reinstall the latest version of the gitlab-runner. Expect some downtime.
Btw do we have some IaC to manage our runners?
what's iac?
infrastructure as code
for this kind of upgrade, typically, you'd perform the OS upgrade and relaunch an Ansible playbook or something like that
AFAIK we already have 6+ runners, so it would probably already be worthwhile to automate the configuration
it would also help to have a uniform gitlab runner version, etc
No we don't, and yes, that would be great to have if someone wants to invest time in learning to use some tooling for this kind of thing.
I could give it a try at some point, sounds reasonably easy
FTR, pyrolyse was upgraded but for some reason I don't have access to it yet. According to some Puppet config, @Maxime Dénès and @Pierre-Marie Pédrot do (in case you want to take care of reinstalling the GitLab runner). Otherwise, I will take care of that next week when my access to the machine is fixed.
Even on Ubuntu 20.04, the apt package for gitlab-runner is stuck at 11.2.0. I'll try another installation method.
I've installed the latest version. But I can't seem to make it connect to GitLab. If someone with access wants to look into it, you are welcome.
Last updated: Sep 15 2024 at 13:02 UTC