@Michael Soegtrop due to intermittent issues with GitHub, opam, opam packages, etc., I think we can expect a nightly CI for the Platform to break quite often (I'd guess at least a few times per month, possibly more). Generally, these breakages are just minor issues that we fix as we go along (maybe in a few days, but sometimes it can take longer).
Do you agree with this view? That is, that Platform breakages are going to occur, and quite few will be considered "high severity" unless there is an impending release.
If we really want some iron-clad guarantees about opam packages in releases, then I think this should be based on some tiering, e.g., we divide packages into "critical" and "less critical", etc. At least we wouldn't do all hands on deck to fix a rarely used utility library.
to give a relevant example: if all MathComp packages in some released Platform version depend on HB, and HB breaks due to some dependency in regular opam, I'd consider that an "all hands on deck" event.
but for some not-super-popular standalone utility libraries, even if they are in full level, if manual installation fails in the scripts for a while due to URL problem, my view is at least that this is fine, we'll get around to it within a week or so
@Karl Palmskog : I see as critical issues which have the effect that release picks of Coq Platform "full" level cannot be built from sources any more - I am a bit more relaxed about the "extended" level. In case I find fixes too slow I still have the option to rewind upstream opam changes temporarily in the local opam repo.
From experience I would say that breakages due to OS issues are frequency wise about en par with issues caused by opam. And all together it is not that frequent - on average I would say one breakage every two months - just recently it was a bit above average (the cygwin curl issue, an issue with the Mac installers caused by a GitHub runner update to macOS 11.7.1 and the HoTT issue). But then I had nothing for quite a few months before.
IMHO the daily CI of Coq Platform is sufficient - usually things are fixed within a few days.
Something which currently causes me a bit more headaches is the breakage of the MacOS installer generation, but this does not affect building Coq Platform from sources and there is no "pass before merge" CI we could have had to prevent this.
What is more of an issue is that I likely also have to reduce the build (not build all of full) in Ubuntu CI, because it also regularly timeouts (it is usually 4..5h).
I guess I have to split it up into "most of full + extended" and "really long stuff in full".
then, isn't it a good idea to have some tiering even inside "full"?
an issue I foresee can happen more often going forward is unexpected consequences of changing bounds in complex opam dependency chains, like with infotheo and analysis. This can go across multiple packages and take time to track down.
Eventually I want to have much more fine grain CI jobs, but I wanted to connect this to downloadable Coq Platform packages, which mostly hangs on the OS dependency of the .vo format.
if nothing else, I think we saw some evidence of what packages are really used most often in the survey:
Unimath is an issue here because it takes really long to build - I excluded it from Mac and WIndows CI and only kept it in Ubuntu CI.
But all in all this is well covered, and as I said the CI serves its purpose as is. I usually find and fix issues before users run into them. The only issue I see is the run time issue.
Regarding timeouts, I think Maxime has made progress on getting a MacOS Mini and should also be able to set up custom runners for Ubuntu if needed. To check with him (but he is busy with other things today).
Last updated: Sep 25 2023 at 12:01 UTC