Nightly CI breakages · Coq Platform devs & users

Stream: Coq Platform devs & users

Topic: Nightly CI breakages

Karl Palmskog (Nov 09 2022 at 09:38):

@Michael Soegtrop due to intermittent issues with GitHub, opam, opam packages, etc., I think we can expect a nightly CI for the Platform to break quite often (I'd guess at least a few times per month, possibly more). Generally, these breakages are just minor issues that we fix as we go along (maybe in a few days, but sometimes it can take longer).

Do you agree with this view? That is, that Platform breakages are going to occur, and quite few will be considered "high severity" unless there is an impending release.

If we really want some iron-clad guarantees about opam packages in releases, then I think this should be based on some tiering, e.g., we divide packages into "critical" and "less critical", etc. At least we wouldn't do all hands on deck to fix a rarely used utility library.

Karl Palmskog (Nov 09 2022 at 09:40):

to give a relevant example: if all MathComp packages in some released Platform version depend on HB, and HB breaks due to some dependency in regular opam, I'd consider that an "all hands on deck" event.

Karl Palmskog (Nov 09 2022 at 09:43):

but for some not-super-popular standalone utility libraries, even if they are in full level, if manual installation fails in the scripts for a while due to URL problem, my view is at least that this is fine, we'll get around to it within a week or so

Michael Soegtrop (Nov 09 2022 at 10:31):

@Karl Palmskog : I see as critical issues which have the effect that release picks of Coq Platform "full" level cannot be built from sources any more - I am a bit more relaxed about the "extended" level. In case I find fixes too slow I still have the option to rewind upstream opam changes temporarily in the local opam repo.

From experience I would say that breakages due to OS issues are frequency wise about en par with issues caused by opam. And all together it is not that frequent - on average I would say one breakage every two months - just recently it was a bit above average (the cygwin curl issue, an issue with the Mac installers caused by a GitHub runner update to macOS 11.7.1 and the HoTT issue). But then I had nothing for quite a few months before.

IMHO the daily CI of Coq Platform is sufficient - usually things are fixed within a few days.

Something which currently causes me a bit more headaches is the breakage of the MacOS installer generation, but this does not affect building Coq Platform from sources and there is no "pass before merge" CI we could have had to prevent this.

Michael Soegtrop (Nov 09 2022 at 10:33):

What is more of an issue is that I likely also have to reduce the build (not build all of full) in Ubuntu CI, because it also regularly timeouts (it is usually 4..5h).

Michael Soegtrop (Nov 09 2022 at 10:34):

I guess I have to split it up into "most of full + extended" and "really long stuff in full".

Karl Palmskog (Nov 09 2022 at 10:34):

then, isn't it a good idea to have some tiering even inside "full"?

Karl Palmskog (Nov 09 2022 at 10:37):

an issue I foresee can happen more often going forward is unexpected consequences of changing bounds in complex opam dependency chains, like with infotheo and analysis. This can go across multiple packages and take time to track down.

Michael Soegtrop (Nov 09 2022 at 10:40):

Eventually I want to have much more fine grain CI jobs, but I wanted to connect this to downloadable Coq Platform packages, which mostly hangs on the OS dependency of the .vo format.

Karl Palmskog (Nov 09 2022 at 10:40):

if nothing else, I think we saw some evidence of what packages are really used most often in the survey:

https://thzimmer.gitlabpages.inria.fr/coq-survey-2022-assets/gen-libraries-barplot-stacked.png
https://thzimmer.gitlabpages.inria.fr/coq-survey-2022-assets/domain-libraries-barplot-stacked.png
https://thzimmer.gitlabpages.inria.fr/coq-survey-2022-assets/mathcomp-libraries-barplot-stacked.png

Michael Soegtrop (Nov 09 2022 at 10:42):

Unimath is an issue here because it takes really long to build - I excluded it from Mac and WIndows CI and only kept it in Ubuntu CI.

Michael Soegtrop (Nov 09 2022 at 10:45):

But all in all this is well covered, and as I said the CI serves its purpose as is. I usually find and fix issues before users run into them. The only issue I see is the run time issue.

Théo Zimmermann (Nov 09 2022 at 13:58):

Regarding timeouts, I think Maxime has made progress on getting a MacOS Mini and should also be able to set up custom runners for Ubuntu if needed. To check with him (but he is busy with other things today).

Last updated: Apr 19 2024 at 02:02 UTC