Stream: Coq devs & plugin devs

Topic: Migration to Inria GitLab for CI (instead of GitLab.com)


view this post on Zulip Théo Zimmermann (Sep 01 2023 at 11:03):

Hi all,
Since Wednesday, we (@Maxime Dénès and I) have switched the GitLab CI of the Coq repository to use Inria GitLab, which allows us to have better managed and more reliable custom runners. There may be some rough edges in the beginning, so please report them here if that's the case.

view this post on Zulip Théo Zimmermann (Sep 01 2023 at 11:06):

@Erik Martin-Dorel I guess we should duplicate the webhook for Docker-Coq that was present on the Coq GitLab.com repo, correct?

view this post on Zulip Guillaume Melquiond (Sep 01 2023 at 11:18):

What about the currently pending pull requests? Will they 1. fail, 2. use the old CI, or 3. use the new CI?

view this post on Zulip Guillaume Melquiond (Sep 01 2023 at 11:20):

Actually, one of those instantly restarted when I took a look at it. Not sure if it is a pure coincidence, of if just looking at them is enough.

view this post on Zulip Pierre-Marie Pédrot (Sep 01 2023 at 11:21):

It's not magic, I've just restarted two runs to double-check...

view this post on Zulip Théo Zimmermann (Sep 01 2023 at 11:25):

Guillaume Melquiond said:

What about the currently pending pull requests? Will they 1. fail, 2. use the old CI, or 3. use the new CI?

Any pipeline that was started before the switch will run on the old CI and results will still be reported. Any pipeline started after the switch (including using coqbot commands) should normally run on the new CI, including on old PRs.

view this post on Zulip Guillaume Melquiond (Sep 01 2023 at 11:26):

Actually, it seems to be running on the new CI, unless Pierre-Marie did something special when restarting it.

view this post on Zulip Théo Zimmermann (Sep 01 2023 at 11:26):

What did you do to restart the pipelines @Pierre-Marie Pédrot?

view this post on Zulip Guillaume Melquiond (Sep 01 2023 at 11:26):

(deleted)

view this post on Zulip Pierre-Marie Pédrot (Sep 01 2023 at 11:29):

I just clicked on "retry failed jobs" in the interface

view this post on Zulip Théo Zimmermann (Sep 01 2023 at 11:30):

Hum, OK. For this feature, I would have expected it to re-start the jobs on the old CI, but maybe I'm unclear on how this particular feature works...

view this post on Zulip Pierre-Marie Pédrot (Sep 01 2023 at 11:30):

these pipelines had already run on the new CI, they failed because of some weird transient error

view this post on Zulip Théo Zimmermann (Sep 01 2023 at 11:30):

Ah OK, that explains it.

view this post on Zulip Pierre Roux (Sep 01 2023 at 11:51):

BTW, who should we ask to get rights to manually restart jobs?

view this post on Zulip Maxime Dénès (Sep 01 2023 at 11:55):

What do you need for that? A gitlab account + some membership level in the Coq organization?

view this post on Zulip Théo Zimmermann (Sep 01 2023 at 12:05):

Yes, but note that you don't need that to restart failed jobs, as this is already doable from the GitHub interface.

view this post on Zulip Maxime Dénès (Sep 01 2023 at 12:06):

@Pierre Roux Do you have a specific need?

view this post on Zulip Pierre Roux (Sep 01 2023 at 12:08):

Indeed, that's probably enough for most cases. No specific need but sometimes you just want to restart a specific job (for instance after having updated an overlay).

view this post on Zulip Théo Zimmermann (Sep 01 2023 at 12:26):

Indeed. We should probably extend the coqbot command language to give every regular Coq contributor the opportunity to perform these operations without needing an Inria GitLab account.

view this post on Zulip Michael Soegtrop (Sep 01 2023 at 12:41):

@Théo Zimmermann : would it help Coq Platform CI to do a similar move?

view this post on Zulip Théo Zimmermann (Sep 01 2023 at 12:42):

For sure, we could use it for the Coq Platform CI as well, but this means rewriting the GitHub workflows into GitLab CI pipelines.

view this post on Zulip Théo Zimmermann (Sep 01 2023 at 12:42):

Note that we can also do a partial move (e.g., start by only moving Linux CI, especially as I don't know whether having Windows / Mac runners would be easy on Inria's GitLab).

view this post on Zulip Erik Martin-Dorel (Sep 01 2023 at 14:26):

Théo Zimmermann said:

Erik Martin-Dorel I guess we should duplicate the webhook for Docker-Coq that was present on the Coq GitLab.com repo, correct?

I don't think so, the webhook you may be speaking about was useful to trigger a manual CI pipeline in an external repo (i.e., so that each push in coq.master triggers a new pipeline in docker-coq, which is not a mirror of coq, but a sibling repo)

So, this is not needed.

view this post on Zulip Erik Martin-Dorel (Sep 01 2023 at 14:30):

Pierre-Marie Pédrot said:

these pipelines had already run on the new CI, they failed because of some weird transient error

@Théo Zimmermann This makes me think that after this migration, this feature wish may be important to be addressed:

https://github.com/coq/bot/issues/289

just to summarize the context: I noticed in gitlab.inria.fr/math-comp/docker-mathcomp that when there are pipeline with many jobs (here, 49) and each job requires a docker pull from Docker Hub, the probability of a spurious network failure is high, and having coqbot retry only 3 times automatically and that's all, can be a major usability issue.

view this post on Zulip Théo Zimmermann (Sep 01 2023 at 14:33):

Erik Martin-Dorel said:

Théo Zimmermann said:

Erik Martin-Dorel I guess we should duplicate the webhook for Docker-Coq that was present on the Coq GitLab.com repo, correct?

I don't think so, the webhook you may be speaking about was useful to trigger a manual CI pipeline in an external repo (i.e., so that each push in coq.master triggers a new pipeline in docker-coq, which is not a mirror of coq, but a sibling repo)

So, this is not needed.

Yes, but we have stopped the mirroring of Coq to GitLab.com, so the Docker-Coq repository will not receive any new webhooks when the Coq repo is updated.

view this post on Zulip Erik Martin-Dorel (Sep 01 2023 at 14:34):

Ah yes! You're right then, the migration of the webhook is needed as well.

Let me know if I have to check the pipelines logs tonight, after you setup this one.

view this post on Zulip Guillaume Melquiond (Sep 01 2023 at 15:10):

Erik Martin-Dorel said:

https://github.com/coq/bot/issues/289

just to summarize the context: I noticed in gitlab.inria.fr/math-comp/docker-mathcomp that when there are pipeline with many jobs (here, 49) and each job requires a docker pull from Docker Hub, the probability of a spurious network failure is high, and having coqbot retry only 3 times automatically and that's all, can be a major usability issue.

By curiosity, I went and looked at the issue. It is great that you have exponential backoff, but you are missing an important piece. You need to make the sleep length random. For example, if your current backoff is 25 seconds, you should not sleep for 25 seconds; you should sleep between 25 and 25*2=50 seconds. Otherwise, your 49 failing jobs will all wake up at the exact same time, and thus they will fail again.

view this post on Zulip Théo Zimmermann (Sep 01 2023 at 15:17):

Thanks for the comment. I think it deserves to be copied over there ;-)

view this post on Zulip Michael Soegtrop (Sep 01 2023 at 16:50):

@Théo Zimmermann : thanks - I will think about it, but I think I prefer a solution which supports all platforms.

view this post on Zulip Théo Zimmermann (Sep 01 2023 at 16:59):

Another solution is of course to plug in custom GitHub Actions runners to overcome the 6h limit.

view this post on Zulip Michael Soegtrop (Sep 01 2023 at 17:03):

Yes. I had a discussion with Romain Tetely (he works with Enrico) yesterday and he want to look into this (using INRA HW).

view this post on Zulip Emilio Jesús Gallego Arias (Sep 21 2023 at 12:41):

Hi folks, could you maybe update the rest of the CI team on the status and plan for this migration?

In particular I have a few comments:

view this post on Zulip Théo Zimmermann (Sep 21 2023 at 12:52):

in the previous attempt to migrate to inria.gitlab.fr we found problems with the limit in artifact size, was this solved?

I'm not sure what was the limit previously, but it doesn't seem to have hit us this time.

should we take advantage of the extra tags availabe in Inria CI ? I was able to use those to great benefit actually

That's irrelevant to us AFAICT because we are not using the shared runners but another infrastructure that allows us to get pre-configured custom runners. I hope that @Maxime Dénès will be able to do a write-up with links to relevant Inria doc, so that it is easy for others to know what's going on and how to maintain it. AFAIU, the shared runners were too few for the large usage we would have made of it.

what other steps you had to do to adapt our current setup?

Apart from the preliminary tests of Maxime and his interaction with the Inria SED team behind the custom runners infrastructure, not much, because coqbot had already been adapted previously (with Erik) to support pushing to alternative GitLab instances.

is documentation updated? Does the new setup require extra documentation? I see that we our CI docs still mention gitlab.com everywhere

It is not. It has been on my to-do to update it for a few weeks already.

view this post on Zulip Emilio Jesús Gallego Arias (Sep 21 2023 at 12:54):

I see, I was not aware of the custom runner setup, thanks for the update!

Let me know if you need help with the docs update, we could maybe take this chance to rework a bit the CI docs together like for an hour if you want, and do the rests of TODOs async

view this post on Zulip Emilio Jesús Gallego Arias (Sep 21 2023 at 12:56):

What are the specs of the custom runners? The large shared runners were pretty nice, so maybe we could selectively use those, but we need to document what Inria SED said about it I guess.

view this post on Zulip Théo Zimmermann (Sep 21 2023 at 12:58):

From what I've understood, Inria SED said to not use the shared runners for Coq, because the load would be too big. I don't know the spec of the custom runners. I hope they are documented somewhere or will be soon. Maybe we can tag @Thierry Martinez, who probably knows everything about this.

view this post on Zulip Thierry Martinez (Sep 21 2023 at 13:18):

The documentation for the custom runner is here: https://gitlab.inria.fr/inria-ci/custom-runner/ . It's still in beta: you are welcome to fill new issues (using the same link) if there are unexpected failures. The Coq CI already helped a lot to improve the implementation of the runners.

view this post on Zulip Emilio Jesús Gallego Arias (Sep 21 2023 at 13:20):

Thanks @Thierry Martinez , oh so the custom runners we are using now, are really managed too, right?

view this post on Zulip Emilio Jesús Gallego Arias (Sep 21 2023 at 13:21):

The setup looks pretty nice!

view this post on Zulip Théo Zimmermann (Sep 21 2023 at 13:24):

Thanks to this doc, I realize that there is macOS and Windows support too, so this means that we could migrate our GitHub workflows to use this if we want to. And it could help for the Coq Platform as well.

view this post on Zulip Emilio Jesús Gallego Arias (Sep 21 2023 at 13:24):

Yes and we get the large runners too, that can really reduce the critical path

view this post on Zulip Emilio Jesús Gallego Arias (Sep 21 2023 at 13:25):

for example if we implement https://github.com/coq/coq/issues/18050 , a large runner can make the compilation of all plugins pretty quick (one min or so I'd predict)

view this post on Zulip Michael Soegtrop (Sep 21 2023 at 14:00):

I wonder what I should do with Coq Platform CI. Should I migrate it to gitlab, split it into many small projects ...?

view this post on Zulip Théo Zimmermann (Sep 21 2023 at 14:04):

Unless you use some advanced GitHub Actions that would be difficult to replace in a GitLab workflow, migrating to GitLab (Inria) could indeed be a reasonable option. Something to discuss with Romain, I suppose!

view this post on Zulip Michael Soegtrop (Sep 21 2023 at 14:05):

I have a meeting with @Romain Tetley tomorrow and will address this. We need to come back to build full/extended installers in CI.

view this post on Zulip Gaëtan Gilbert (Sep 21 2023 at 14:33):

BTW looks like we're already using about as much artifact space on inria as on gitlab.com
https://gitlab.inria.fr/coq/coq/-/usage_quotas#storage-quota-tab
https://gitlab.com/groups/coq/-/usage_quotas#storage-quota-tab

view this post on Zulip Théo Zimmermann (Sep 21 2023 at 14:40):

That's huge and worrying!

view this post on Zulip Emilio Jesús Gallego Arias (Sep 21 2023 at 15:25):

I guess artifact expiry didn't quick in yet? I wonder what the "stable" size is. 1 TiB of data is indeed a lot, on the other hand seems about right for our load, if that's properly cleaned up.

If that's a problem we could optimize quite a few things I think.

view this post on Zulip Gaëtan Gilbert (Sep 21 2023 at 15:35):

we had https://docs.gitlab.com/ee/administration/settings/continuous_integration.html#keep-the-latest-artifacts-for-all-jobs-in-the-latest-successful-pipelines checked (I'm unchecking now, it was already unchecked on .com)

view this post on Zulip Emilio Jesús Gallego Arias (Sep 21 2023 at 15:41):

Nice @Gaëtan Gilbert , maybe we should start collecting these kind of tweaks in a document for the doc update?

view this post on Zulip Emilio Jesús Gallego Arias (Sep 21 2023 at 15:42):

I tend to forget all these details very quickly

view this post on Zulip Emilio Jesús Gallego Arias (Sep 21 2023 at 15:43):

Tho that setting should not have a large impact, right? The doc is a bit ambigous.

view this post on Zulip Gaëtan Gilbert (Sep 21 2023 at 15:51):

with it checked we keep artifacts for all the old open PRs
impact unclear tbh

view this post on Zulip Emilio Jesús Gallego Arias (Sep 21 2023 at 16:07):

Oh yes, that is because we push the branches to gitlab

view this post on Zulip Emilio Jesús Gallego Arias (Sep 21 2023 at 16:07):

I was thinking on the case of the main repos where we only have the official branches

view this post on Zulip Emilio Jesús Gallego Arias (Sep 21 2023 at 16:07):

that's for sure quite a bit of impact, yes

view this post on Zulip Emilio Jesús Gallego Arias (Sep 21 2023 at 16:12):

Added some notes to the wiki https://github.com/coq/coq/wiki/State-of-the-continuous-integration-infrastructure#migration-to-gitlab-ci

So we can get back to them when updating the docs

view this post on Zulip Hugo Herbelin (Sep 26 2023 at 18:55):

I don't know if it is already known, gitlab.inria.fr will be unavailable for maintenance from Monday 16 Oct 1pm to, at latest, Tuesday 17 oct 9am (Paris time).

view this post on Zulip Théo Zimmermann (Sep 27 2023 at 07:58):

Do we want to switch back temporarily to GitLab.com during these planned maintenance downtimes?

view this post on Zulip Pierre Roux (Sep 27 2023 at 08:52):

If it's only half a day we can probably live with it (I mean, with fiat-crypto, that's just one or two CI cycles).

view this post on Zulip Théo Zimmermann (Sep 27 2023 at 13:07):

Note that if GitLab Inria is down, any PR updated during the meantime won't have any GitLab CI running at all. It will not just be delayed like with email.

view this post on Zulip Emilio Jesús Gallego Arias (Sep 27 2023 at 13:13):

I guess the downtime is small enough so we can retrigger the workflows manually.

view this post on Zulip Théo Zimmermann (Sep 27 2023 at 14:00):

Yep. Using @coqbot run full / light ci is enough to retrigger missing pipelines

view this post on Zulip Matthieu Sozeau (Sep 28 2023 at 08:42):

IIUC, we have now moved away from GitLab.com's CI so I don't need to renew the OSS program application, right? @Théo Zimmermann, @Maxime Dénès ?

view this post on Zulip Guillaume Melquiond (Sep 28 2023 at 08:58):

We are still using it for the Opam archive.

view this post on Zulip Matthieu Sozeau (Sep 28 2023 at 09:07):

Ah, ok

view this post on Zulip Théo Zimmermann (Sep 28 2023 at 09:10):

And I think that since the migration is very recent, it's safer to keep the option to go back to GitLab.com, at least for the coming year.

view this post on Zulip Théo Zimmermann (Sep 28 2023 at 09:11):

In one year, we should have gained enough experience, and possibly we will have migrated the opam archive to GitLab Inria as well. We'll see.

view this post on Zulip Karl Palmskog (Sep 28 2023 at 10:10):

hmm, one drawback of Inria GitLab is that non-Inria people will all need to maintain sponsored accounts, right? adds overhead to co-maintainerships

view this post on Zulip Théo Zimmermann (Sep 28 2023 at 11:04):

Yes if they need to interact with GitLab directly, but I think we can minimize the number of interactions needed. Currently, it is already possible to restart failed jobs from the GitHub interface. We should add support to allow running other common operations from GitHub.

view this post on Zulip Thierry Martinez (Sep 30 2023 at 11:02):

I unregistered the runner by mistake from GitLab: I registered it again, but as far as I know, I can't preserve it enable on the projects it was previously enabled, sorry for this. Can anyone with admins right on gitlab.inria.fr/coq/coq enable inria-ci/custom-runner again in https://gitlab.inria.fr/coq/coq/-/settings/ci_cd? (If it is not already the case, I have to add the person to gitlab.inria.fr/inria-ci/custom-runner first.) Thanks, and sorry again. Hopefully, such inconveniences will not happen when the custom runner will be a shared runner in GitLab sense (which should happen soon now).

view this post on Zulip Gaëtan Gilbert (Sep 30 2023 at 11:05):

I can do it if you add me

view this post on Zulip Gaëtan Gilbert (Oct 02 2023 at 13:24):

ping @Thierry Martinez
maybe @Maxime Dénès can act too?

CI is dead until we add the runners back

view this post on Zulip Maxime Dénès (Oct 02 2023 at 13:37):

I can't add somebody as I'm not owner in that org. I'll ask someone who is.

view this post on Zulip Maxime Dénès (Oct 02 2023 at 13:40):

Oh, in fact I can. @Gaëtan Gilbert should be ok now.

view this post on Zulip Gaëtan Gilbert (Oct 02 2023 at 13:41):

I have pushed the button

view this post on Zulip Karl Palmskog (Oct 02 2023 at 15:29):

Inria GitLab (https://gitlab.inria.fr/) seems to be down in general, is this planned?

view this post on Zulip Hugo Herbelin (Oct 02 2023 at 15:33):

Iirc, this was planned, but it was announced to be for just one hour.

view this post on Zulip Guillaume Melquiond (Oct 02 2023 at 15:59):

Actually, it was two hours: "interruption de service entre 17h et 19h".

view this post on Zulip Gaëtan Gilbert (Oct 03 2023 at 09:35):

getting more of these python errors https://gitlab.inria.fr/coq/coq/-/jobs/3466937

view this post on Zulip Hugo Herbelin (Oct 03 2023 at 10:10):

getting more of these python errors

Can it be related to an empty list of vm?


Last updated: Nov 29 2023 at 22:01 UTC