Stream: Coq devs & plugin devs

Topic: Too many pipelines


view this post on Zulip Gaëtan Gilbert (Aug 24 2020 at 09:21):

What should we do about the recurring too many pipeline jobs error?
The limit is documented here https://docs.gitlab.com/ee/user/gitlab_com/index.html#gitlab-cicd
"Max jobs in active pipelines gitlab.com:500 for Free tier, unlimited otherwise self-hosted:unlimited"

view this post on Zulip Emilio Jesús Gallego Arias (Aug 24 2020 at 09:25):

I understand we could apply for some open source program and get that limit lifted too?

view this post on Zulip Gaëtan Gilbert (Aug 24 2020 at 09:32):

If you mean https://about.gitlab.com/solutions/open-source/program/ then "All public projects on GitLab.com automatically receive Gold functionality at the project level. Apply to this program if you need Gold functionality at the group level."
job limits are project level

view this post on Zulip Emilio Jesús Gallego Arias (Aug 25 2020 at 09:35):

Ok, just an idea, maybe we could just write to them? Documentation on that was always a bit imprecise, I see they have reworded it a bit tho.

view this post on Zulip Gaëtan Gilbert (Aug 25 2020 at 17:20):

I made https://gitlab.com/gitlab-org/gitlab/-/issues/240895
it sounds like we should apply for the program

view this post on Zulip Emilio Jesús Gallego Arias (Aug 25 2020 at 17:24):

That was my understanding last time I looked.

view this post on Zulip Emilio Jesús Gallego Arias (Aug 25 2020 at 17:24):

That we needed to apply

view this post on Zulip Jason Gross (Aug 25 2020 at 18:19):

I suspect we probably don't want GitHub actions, as this will result in the CI taking 25x more real time to run (since we only get 20 concurrent jobs)

view this post on Zulip Gaëtan Gilbert (Aug 25 2020 at 18:40):

do we really get 500 jobs on gitlab? O_O

view this post on Zulip Théo Zimmermann (Aug 25 2020 at 19:01):

20 concurrent jobs is already pretty good. On GitLab, I feel like we frequently get that as well, but sometimes we also get a single free runner and most of our jobs are run by our own runners. If I'm not mistaken, GitHub also supports setting up your own runners.

view this post on Zulip Gaëtan Gilbert (Aug 27 2020 at 12:57):

https://forum.gitlab.com/t/not-treated-as-open-source-for-pipeline-job-limit/41938

view this post on Zulip Gaëtan Gilbert (Sep 01 2020 at 09:42):

Emilio Jesús Gallego Arias said:

That we needed to apply

ok shall we do it?
is there any blocker?

view this post on Zulip Emilio Jesús Gallego Arias (Sep 01 2020 at 11:22):

Not that I'm aware

view this post on Zulip Matthieu Sozeau (Sep 02 2020 at 15:44):

I applied, we'll see

view this post on Zulip Jason Gross (Sep 07 2020 at 02:22):

Note that they replied at https://gitlab.com/gitlab-org/gitlab/-/issues/240895#note_402263876:
"""
Ah apologies I didn't realise the project was public.
I'm going to ask around for you to best advise on the next steps.
As mentioned previously this isn't the best process to get the right answers for you. It certainly appears your project is on the free tier going by the behavior but I'm not able to say why this isn't the case sadly.
To ensure you're getting to speak to the right people to help with this further I would recommend either creating a thread about this on our support forum or to contact our Open Source team's email at opensource@gitlab.com. Hope this helps you get it sorted!
"""
So perhaps someone should email?

view this post on Zulip Gaëtan Gilbert (Sep 23 2020 at 13:11):

@Matthieu Sozeau any response to the application? we're still seeing the issue eg https://gitlab.com/coq/coq/-/pipelines/193464370

view this post on Zulip Matthieu Sozeau (Sep 23 2020 at 13:14):

Nope, not yet. They said "allow 10 days for a response" and I sent it last wednesday

view this post on Zulip Matthieu Sozeau (Sep 30 2020 at 08:00):

I just signed the papers to the Gold plan, so we should benefit from it anytime now

view this post on Zulip Théo Zimmermann (Oct 01 2020 at 07:15):

Does not seem to be the case yet: screenshot

view this post on Zulip Théo Zimmermann (Oct 05 2020 at 15:44):

Still the case today :frown:

view this post on Zulip Théo Zimmermann (Oct 05 2020 at 15:49):

And we still get the "job activity exceeded" failure: https://gitlab.com/coq/coq/-/pipelines/197875878

view this post on Zulip Théo Zimmermann (Oct 15 2020 at 14:33):

We're still on the free plan with recurrent "job activity exceeded" failures.

view this post on Zulip Théo Zimmermann (Oct 19 2020 at 12:57):

This is still an issue.

view this post on Zulip Gaëtan Gilbert (May 21 2021 at 16:50):

@Matthieu Sozeau can you ping gitlab about this?

view this post on Zulip Matthieu Sozeau (May 21 2021 at 16:51):

Yes, sure

view this post on Zulip Matthieu Sozeau (May 26 2021 at 09:52):

Done

view this post on Zulip Matthieu Sozeau (May 26 2021 at 09:52):

(I pinged them again)

view this post on Zulip Jason Gross (May 26 2021 at 21:05):

As a kludge, can coqbot restart pipelines that fail with this error?

view this post on Zulip Théo Zimmermann (May 27 2021 at 07:09):

The reason these pipelines fail is that we go over the limits for the Free plan (which should have been lifted if GitLab had really granted us the Gold OSS plan). We believe that the fact that restarting manually allows the pipeline to run despite this limitation is a GitLab bug and we are fortunate that this bug still exists. Therefore, we should avoid abusing it by systematically restarting the failed pipelines, which would put us well beyond the limits for the Free plan.

view this post on Zulip Théo Zimmermann (May 27 2021 at 07:11):

The plan to alleviate this problem was rather to reduce the number of jobs that are systematically run for each pipeline, with a dual mode, where, by default, we run a more lightweight pipeline, then upon request and before merging a PR, we run the full pipeline. This is not yet implemented because I had other priorities that interfered.

view this post on Zulip Emilio Jesús Gallego Arias (May 27 2021 at 09:23):

Indeed, @Théo Zimmermann gave a good summary, the current state of the job limit is a bit strange as even the direction / scope of the bug is not clear. Moreover, all CI systems are under attack due to crypto currency mining, so things may get quite worse for us, but fortunately our choice of using gitlab means we could provide a backup if needed.

view this post on Zulip Matthieu Sozeau (May 27 2021 at 10:36):

Don't we have new workers available for this actually?

view this post on Zulip Gaëtan Gilbert (May 27 2021 at 10:38):

this is not related to workers

view this post on Zulip Théo Zimmermann (May 27 2021 at 10:40):

Even if we had an infinite number of workers that we own, we would still need to switch to our own GitLab (e.g. Inria's GitLab) or get the Gold plan to avoid this limit on the number of concurrent jobs.

view this post on Zulip Matthieu Sozeau (May 27 2021 at 10:42):

I see

view this post on Zulip Matthieu Sozeau (Jun 02 2021 at 15:05):

Sadly, they seem to be overwhelmed... no response still after a week

view this post on Zulip Matthieu Sozeau (Jun 07 2021 at 18:15):

image.png Finally !

view this post on Zulip Théo Zimmermann (Jun 07 2021 at 18:31):

Wonderful! But given the actual start date, they could have adapted the end date :stuck_out_tongue:

view this post on Zulip Matthieu Sozeau (Jun 07 2021 at 18:50):

I guess they'll renew easily, as we're linked to the Inria account

view this post on Zulip Matthieu Sozeau (Jun 07 2021 at 18:54):

I can't seem to find a way to check that the pipeline limit has been unlifted, but we'll know soon enough :)

view this post on Zulip Théo Zimmermann (Jun 09 2021 at 10:47):

Indeed, the last known pipeline activity limit errors were on June 3rd.

view this post on Zulip Matthieu Sozeau (Jun 10 2021 at 14:05):

Hurray!

view this post on Zulip Gaëtan Gilbert (Sep 07 2021 at 10:35):

I'm getting a prompt to renew on https://gitlab.com/coq/coq/-/pipelines
Screenshot_20210907_123410.png
it needs customer credentials, @Matthieu Sozeau can you handle it?

view this post on Zulip Théo Zimmermann (Sep 07 2021 at 12:21):

In another thread, Matthieu answered that the renewal was supposed to be automatic (but that he would check with GitLab support).

view this post on Zulip Matthieu Sozeau (Sep 07 2021 at 13:22):

Done, I sent a renewal request, apparently they're in the process of automating this but it does not work yet for Gitlab for OSS projects

view this post on Zulip Matthieu Sozeau (Sep 18 2021 at 12:45):

The renewal was signed today, so we should get no interuptions

view this post on Zulip Théo Zimmermann (Sep 15 2022 at 11:14):

@Matthieu Sozeau You said you handled the renewal this year again, right? GitLab still shows "Your Opensource subscription for coq will expire on 2022-09-29. After it expires, you can't use merge approvals, code quality, or many other features."

view this post on Zulip Matthieu Sozeau (Sep 15 2022 at 11:15):

Yes, I signed the quote already. I’ll check with them

view this post on Zulip Théo Zimmermann (Sep 23 2022 at 11:11):

Have they replied to your inquiry? The message is still showing up and the expiration date is 2022-09-29, so in less than a week.

view this post on Zulip Pierre-Marie Pédrot (Oct 03 2022 at 14:46):

It's becoming urgent, I just got (and probably so you did) a gitlab email scaringly stating

coq namespace has 30% or less Shared Runner Pipeline minutes remaining. Once it runs out, no new jobs or pipelines in its projects will run.

view this post on Zulip Gaëtan Gilbert (Oct 03 2022 at 14:50):

I think they changed how they count CI usage
counted usage:
usage.png
actual time spent:
time.png
it used to be time = usage * 25 but this month it's time = usage * 2
so now we are running out

view this post on Zulip Gaëtan Gilbert (Oct 03 2022 at 14:51):

see also https://docs.gitlab.com/ee/ci/pipelines/cicd_minutes.html#cost-factor

0.5 for public projects in the GitLab for Open Source program.
1 for other public projects, after October 1, 2022 (previously 0.04).

view this post on Zulip Gaëtan Gilbert (Oct 03 2022 at 15:04):

we could try to have core devs open forks and run their pipelines there (forks get 125 factor * 2k free minutes instead of 2 factor * 50k free minutes)
the permission juggling could get complex though
we could buy more (we seem to be using about $50 / day) I don't know where the money would come from though
we could try running to gitlab.inria
we could try getting our own machines or vms somewhere

anyway at the current rate CI will break in a couple days

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 15:14):

cc @Maxime Dénès (you probably need to be aware, especially if the solution ends up being buying runners somewhere)

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 15:15):

I feel stabbed in the back by GitLab because they have not announced this change of factors for public projects widely enough that we noticed it (they could have send an email before it was almost too late!).

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 15:16):

Maybe it's time to consider implementing https://github.com/coq/bot/issues/159 urgently.

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 15:18):

I propose to urgently disable most CI jobs by default (depending on the value of a variable).

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 15:18):

Then we can add the necessary support to coqbot to handle this variable automatically.

view this post on Zulip Maxime Dénès (Oct 03 2022 at 15:18):

Is migrating some of the jobs to GitHub Actions a potential solution for the short term?

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 15:24):

Not if we don't get some GitHub sponsorship IMO, otherwise our CI will be quickly throttled by the 4 parallel job limit.

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 15:31):

According to the new cost factor, we are allowed 100,000 shared runner minutes. Assuming that we do not change the way opam-coq-archive works and based on the September stats (26,000 minutes for opam-coq-archive, 782,000 minutes for Coq), we should divide our CI consumption by more than a factor 10 on the Coq repository to stay in the new limits.

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 15:32):

Unfortunately, even if we could achieve this instantly, it would have to be even more (like a factor 30) for the month of October, since we've already spent close to 70% of the available minutes in the first three days.

view this post on Zulip Pierre-Marie Pédrot (Oct 03 2022 at 15:37):

:bomb:

view this post on Zulip Pierre-Marie Pédrot (Oct 03 2022 at 15:37):

So this is no-CI October then?

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 15:38):

I actually think it's a realistic target to achieve if we cut on the long-running jobs most of the time.

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 15:38):

That's why I was saying:

I propose to urgently disable most CI jobs by default (depending on the value of a variable).

view this post on Zulip Pierre-Marie Pédrot (Oct 03 2022 at 15:39):

Urgently, as in now?

view this post on Zulip Pierre-Marie Pédrot (Oct 03 2022 at 15:40):

If we consumed 70% in 3 days, that's about 1% per hour.

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 15:40):

Yes, as in today.

view this post on Zulip Pierre-Marie Pédrot (Oct 03 2022 at 15:41):

Who's doing that? I'm personally not CI-aware enough to write a patch, and this is a red alert :rotating_light:

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 15:41):

I can

view this post on Zulip Pierre-Marie Pédrot (Oct 03 2022 at 15:42):

please!

view this post on Zulip Pierre-Marie Pédrot (Oct 03 2022 at 15:49):

I propose to merge your PR as soon as its CI finishes.

view this post on Zulip Pierre-Marie Pédrot (Oct 03 2022 at 15:50):

(Unironically, that is.)

view this post on Zulip Maxime Dénès (Oct 03 2022 at 15:52):

Do we still use Inria VMs for our CI?

view this post on Zulip Maxime Dénès (Oct 03 2022 at 15:53):

If not I can look into adding some runners there this week

view this post on Zulip Gaëtan Gilbert (Oct 03 2022 at 15:54):

we don't

view this post on Zulip Maxime Dénès (Oct 03 2022 at 15:54):

the only remaining VM we have is coq-osx1, is it safe to delete?

view this post on Zulip Maxime Dénès (Oct 03 2022 at 15:55):

it eats up a part of our memory quota

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 15:56):

Let's discuss this in a call, but I think we can:

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 15:56):

@Théo Zimmermann , instead of disabling jobs, why don't we tag them so they are only picked by our runners?

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 15:57):

Maxime Dénès said:

the only remaining VM we have is coq-osx1, is it safe to delete?

It is not. AFAICT, it is still used to store the macOS signing key (Michael downloads it on its computer to sign packages, then deletes it).

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 15:58):

There are some other choices to further optimize this as discussed in the https://github.com/coq/coq/issues/16201

view this post on Zulip Maxime Dénès (Oct 03 2022 at 15:58):

Ok, noted. We still have enough quota to create some additional runners, though.

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 15:58):

Emilio Jesús Gallego Arias said:

Théo Zimmermann , instead of disabling jobs, why don't we tag them so they are only picked by our runners?

If we don't disable most CI jobs now, we will have lost all of our remaining CI minutes. But of course, I'm in favor of deploying our own runners to pick up some of the load as soon as we can realistically do that.

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:00):

Maxime Dénès said:

If not I can look into adding some runners there this week

It is not worth it to do so with the VM infra, it is better to use directly the runners implemented in gitlab.inria.fr

view this post on Zulip Maxime Dénès (Oct 03 2022 at 16:00):

Emilio Jesús Gallego Arias said:

Let's discuss this in a call, but I think we can:

do you have a pointer to the branch?

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:01):

Théo Zimmermann said:

If we don't disable most CI jobs now, we will have lost all of our remaining CI minutes. But of course, I'm in favor of deploying our own runners to pick up some of the load as soon as we can realistically do that.

Well, either by disabling most of CI or by running out of minutes, we still lose CI so we are in shit.

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:01):

Maxime Dénès said:

Emilio Jesús Gallego Arias said:

Let's discuss this in a call, but I think we can:

do you have a pointer to the branch?

Yup one se

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:01):

One sec

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:01):

I think that the fastest is to migrate to gitlab.inria.fr

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:01):

provided Inria allows us for enough quote

view this post on Zulip Guillaume Melquiond (Oct 03 2022 at 16:02):

Emilio Jesús Gallego Arias said:

It is not worth it to do so with the VM infra, it is better to use directly the runners implemented in gitlab.inria.fr

No, that is not possible. The shared runners available in the Inria Gitlab are just a handful. We will bring the whole infrastructure to its knees if we dump the whole Coq CI on them.

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:02):

https://gitlab.inria.fr/egallego/coq/-/pipelines/545950

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:02):

That's the lastest run I think

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:03):

Guillaume Melquiond said:

Emilio Jesús Gallego Arias said:

It is not worth it to do so with the VM infra, it is better to use directly the runners implemented in gitlab.inria.fr

No, that is not possible. The shared runners available in the Inria Gitlab are just a handful. We will bring the whole infrastructure to its knees if we dump the whole Coq CI on them.

@Guillaume Melquiond do you have a pointer to the resource assignation? I have a good idea on the computer power we need, it is actually not so much.

view this post on Zulip Guillaume Melquiond (Oct 03 2022 at 16:03):

The current state is of the Inria runners is: "Pour l'instant il y a 8 slots pour le runner small, 4 pour le medium et 11 pour le large".

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:04):

But are these real or virtual slots?

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:04):

I mean per-project

view this post on Zulip Gaëtan Gilbert (Oct 03 2022 at 16:04):

Emilio Jesús Gallego Arias said:

https://gitlab.inria.fr/egallego/coq/-/pipelines/545950

404

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:04):

Permission error...

view this post on Zulip Guillaume Melquiond (Oct 03 2022 at 16:05):

No, that is for the whole instance. There are no more than 11 jobs in parallel that can be run globally.

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:05):

I store private branches there so I can't make ejgallego/coq public, let me see

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 16:06):

I think that the GitLab CI fallout is anyway a reminder that we have been burning lots of energy without paying enough attention and I honestly believe that the only solution is to be more careful about how we use CI.

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:07):

https://github.com/ejgallego/coq/tree/inria-ci that's the last version I tried

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 16:07):

Note that what I proposed is not to lose our CI but to not run the full version by default at every push on any PR.

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:07):

Yes

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:08):

@Gaëtan Gilbert you can push that branch to your own fork on inria CI and it should produce a CI

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:08):

I can't open my own repos right now

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:09):

Théo Zimmermann said:

I think that the GitLab CI fallout is anyway a reminder that we have been burning lots of energy without paying enough attention and I honestly believe that the only solution is to be more careful about how we use CI.

Yes, there is a lot we can optimize actually

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:10):

We can't really blame gitlab for restricting this

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:11):

Free CI can't be unlimited, the economics just can't work

view this post on Zulip Pierre-Marie Pédrot (Oct 03 2022 at 16:11):

Note that even with the tweak to the gitlab files, all PRs currently open and not rebased on master will trigger a full build.

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 16:11):

We can't really blame gitlab for restricting this

No, we can't. The only thing I blame them for is not communicating more clearly a bit ahead of time.

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 16:12):

Pierre-Marie Pédrot said:

Note that even with the tweak to the gitlab files, all PRs currently open and not rebased on master will trigger a full build.

Hum, no. coqbot does a merge with master and pushes that commit to the GitLab mirror.

view this post on Zulip Pierre-Marie Pédrot (Oct 03 2022 at 16:12):

Ah, that's a good news in an ocean of trouble then.

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:13):

I'm surprised to hear that our load (which is not so big IMHO) would bring the Inria CI infra to its knees tho

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:13):

I revised my estimations but a single reasonably modern server should handle our load just fine

view this post on Zulip Pierre-Marie Pédrot (Oct 03 2022 at 16:14):

Maybe it's time to whine to the inventeurs du monde numérique?

view this post on Zulip Ali Caglayan (Oct 03 2022 at 16:15):

Do we not have the budget to buy more minutes?

view this post on Zulip Ali Caglayan (Oct 03 2022 at 16:17):

Also moving to GitHub actions could be possible since there is an "enterprise plan" allowing 180 concurrent workers.
https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration

view this post on Zulip Ali Caglayan (Oct 03 2022 at 16:18):

However we would lose the ability to inspect artifacts

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:18):

Pierre-Marie Pédrot said:

Maybe it's time to whine to the inventeurs du monde numérique?

For sure if not for what Guillaume said, that would be a very good solution, as gitlab.inria.fr would be a drop-in replacement for us

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:18):

so we would just need to push my branch to Coq, and update the bot to link to gitlab.inria.fr

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:18):

but indeed we need to whine^W discuss with the people in charge

view this post on Zulip Karl Palmskog (Oct 03 2022 at 16:19):

I understand this is way down the priority ladder, but this affects the Coq opam archive CI too, right?

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:19):

@Gaëtan Gilbert do you mind pushing the inria-ci branch to your own gitlab.inria.fr fork so we can get an Inria run

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:19):

?

view this post on Zulip Gaëtan Gilbert (Oct 03 2022 at 16:20):

what's the point?
I think everyone believes that it works

view this post on Zulip Pierre-Marie Pédrot (Oct 03 2022 at 16:20):

Not @Guillaume Melquiond it seems. I'd be curious to see if the prophecy is true. 2022/10/03, aka "The day the Inria Gitlab stood still".

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 16:21):

Karl Palmskog said:

I understand this is way down the priority ladder, but this affects the Coq opam archive CI too, right?

Yes, but the opam-coq-archive CI is not so big. If we were to remove or sufficiently scale down the Coq CI, then opam-coq-archive could continue its business-as-usual.

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 16:21):

Gaëtan Gilbert said:

what's the point?
I think everyone believes that it works

I think everybody believes that it works at a small scale. The question is just whether it will handle the full scale.

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 16:22):

But it is easy enough to test actually.

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 16:22):

We can just create an Inria GitLab mirror for Coq now and double-push every PR to both the current repo and the new one (and set FULL_CI=true over there).

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 16:22):

Then we can see if we bring Inria's infrastructure down to its knees or not.

view this post on Zulip Pierre-Marie Pédrot (Oct 03 2022 at 16:23):

The Inria admins will surely appreciate.

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:23):

Gaëtan Gilbert said:

what's the point?
I think everyone believes that it works

No, it doesn't fully work, we hit some limits on artifact size

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:24):

A single push will work modulo the artifact size problem, but of course we should talk to the responsible person before pushing dozens of jobs

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:25):

The question is what is the least amount of work? Is talking with the Inria admins for the CI the quickest path?

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:25):

I don't know

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 16:26):

I think exploring multiple routes in parallel is probably the safest path.

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:26):

also more costly tho

view this post on Zulip Pierre-Marie Pédrot (Oct 03 2022 at 16:28):

@Théo Zimmermann Don't you also want to disable the test-suite+async job? This one never terminates on time and it takes uncompressible hours for no good reason.

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:30):

Have we done some math to understand if this disable CI work is gonna work?

view this post on Zulip Pierre-Marie Pédrot (Oct 03 2022 at 16:30):

No time for maths, this is the STOP blinking red button.

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:30):

I am afraid that in the end it is just wasted work, if even after the saving it doesn't work out for us and we are left without CI

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 16:30):

Unfortunately, I haven't precise data enough to do this math, but from some plots by @Jason Gross that I have seen, I think that this should work.

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:31):

This is the STOP blinking button, but imagine we are not pressing the right STOP button but the AUTO-DESTRUCT button

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 16:31):

Emilio Jesús Gallego Arias said:

I am afraid that in the end it is just wasted work, if even after the saving it doesn't work out for us and we are left without CI

This was some work that I wanted to do anyway. I just didn't want to do it now.

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:31):

so we rather read the labels on the buttons carefully before pushing

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 16:31):

Recall that this issue https://github.com/coq/bot/issues/159 was opened in 2021 after a discussion that happened in 2020.

view this post on Zulip Pierre-Marie Pédrot (Oct 03 2022 at 16:31):

Any button is good to press when you're bound to fall into the void in a split second.

view this post on Zulip Pierre-Marie Pédrot (Oct 03 2022 at 16:32):

Merge first, talk later.

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:32):

Théo Zimmermann said:

Emilio Jesús Gallego Arias said:

I am afraid that in the end it is just wasted work, if even after the saving it doesn't work out for us and we are left without CI

This was some work that I wanted to do anyway. I just didn't want to do it now.

Yes, tying to figure out what to build is a very interesting topic, we have been discussing about it in Coq-Universe too.

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:33):

Pierre-Marie Pédrot said:

Any button is good to press when you're bound to fall into the void in a split second.

Yes, but are we bound to fall into the void in a split second? I don't see such urgency here.

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:33):

Do we all agree that we don't want to drop CI for merges to main, right?

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:34):

If so, let's keep calm and try to assess what our best path is.

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:35):

Using FULL_CI + the bot could be a good stragegy, or maybe it doesn't work because we will still run out of minutes

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:35):

gitlab.inria.fr seems way more promising but we need to understand if Inria can provide us the right capacity

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:36):

Deploying our runner + tagging works could be a good solution, but I'm not sure if that's possible / what is the status of servers

view this post on Zulip Emilio Jesús Gallego Arias (Oct 03 2022 at 16:36):

Acting in panic rarely leads to a good outcome

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 16:38):

Emilio Jesús Gallego Arias said:

Using FULL_CI + the bot could be a good stragegy, or maybe it doesn't work because we will still run out of minutes

I don't know if it will be a good long-term strategy or not but it buys us some time to consider which strategy is best.

view this post on Zulip Pierre-Marie Pédrot (Oct 03 2022 at 16:40):

Honestly I don't see what could go wrong with deactivating everything. Thanks to the magic of VCS, worst case we can revert or tweak the result.

view this post on Zulip Pierre-Marie Pédrot (Oct 03 2022 at 16:41):

But really, the thing we don't have right now is time. If 1% goes off every hour we're really screwed.

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 16:44):

It's not really 1% every hour BTW: if someone opens a PR now, it probably will be more than 1% spent.

view this post on Zulip Karl Palmskog (Oct 03 2022 at 16:46):

in that case, maybe consider disallowing PRs via GitHub for a while rather than have people open them by "mistake"

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 16:47):

BTW, I just computed how much it would cost paying for the minutes we need without doing anything. That would be about €3500/month. Technically, we could pay for one month until we figure this out. That's still a lot of money to throw away I think.

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 16:49):

Karl Palmskog said:

in that case, maybe consider disallowing PRs via GitHub for a while rather than have people open them by "mistake"

No need now: https://github.com/coq/coq/pull/16600 was merged.

view this post on Zulip Ali Caglayan (Oct 03 2022 at 16:50):

I found this announcement from last year: https://about.gitlab.com/blog/2021/11/11/public-project-minute-limits/

view this post on Zulip Ali Caglayan (Oct 03 2022 at 16:50):

Did something change on top of this?

view this post on Zulip Ali Caglayan (Oct 03 2022 at 16:50):

I thought Coq would be exempt from the minute counting.

view this post on Zulip Guillaume Melquiond (Oct 03 2022 at 16:55):

Pierre-Marie Pédrot said:

Not Guillaume Melquiond it seems. I'd be curious to see if the prophecy is true. 2022/10/03, aka "The day the Inria Gitlab stood still".

Let us be clear. There is currently a single 64-core server that is used for the whole Inria CI. It won't be long before we are forbidden to use it, if we are constantly hammering it with jobs from Coq's CI.

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 16:57):

@Ali Caglayan See https://coq.zulipchat.com/#narrow/stream/237656-Coq-devs-.26-plugin-devs/topic/Too.20many.20pipelines/near/302079565

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 16:57):

This clearly changed on Oct. 1st

view this post on Zulip Théo Zimmermann (Oct 03 2022 at 16:57):

Although we were not aware of this change having been announced.

view this post on Zulip Ali Caglayan (Oct 03 2022 at 16:58):

Can we contact GitLab since this is somewhat absurd?

view this post on Zulip Ali Caglayan (Oct 03 2022 at 17:02):

Could even be an engineer accidentally omitted the 5.


Last updated: Feb 06 2023 at 18:03 UTC