CI for base+async timing out · Coq devs & plugin devs

Stream: Coq devs & plugin devs

Topic: CI for base+async timing out

Pierre-Marie Pédrot (May 24 2022 at 14:21):

It seems that the CI for build:base+async has recently started to regularly time out. See e.g. https://gitlab.com/coq/coq/-/jobs/2498911484. Anybody knows what could have caused that?

Pierre-Marie Pédrot (May 24 2022 at 14:22):

Judging from the place where it dies (at the end of Reals) it's probably short of a few minutes, if not seconds.

Pierre-Marie Pédrot (May 24 2022 at 14:23):

Should we just bump the time limit?

Enrico Tassi (May 24 2022 at 14:53):

I saw inconclusive measurements as well, like "time run: 1h" but still timeout.

Emilio Jesús Gallego Arias (May 24 2022 at 16:03):

We can just remove the async stuff IMO

Enrico Tassi (May 24 2022 at 19:52):

soon, but not yet

Emilio Jesús Gallego Arias (May 24 2022 at 19:54):

If we are planning to remove it I'm not sure why we can just cleanup now

Emilio Jesús Gallego Arias (May 24 2022 at 19:54):

What is the utility, is much slower than regular compilation

Emilio Jesús Gallego Arias (May 24 2022 at 19:54):

why would be the use case?

Emilio Jesús Gallego Arias (May 24 2022 at 19:54):

moreover the job is not reliable

Emilio Jesús Gallego Arias (May 24 2022 at 19:54):

so we are just producing CO2 I think

Gaëtan Gilbert (May 24 2022 at 19:56):

didn't it catch some issues? in some close_proof refactoring IIRC

Emilio Jesús Gallego Arias (May 24 2022 at 19:57):

@Gaëtan Gilbert likely it did, it's been a while. But IIRC the issues were just problems in the Stm layer. So still not sure the right fix is just not to remove all that code.

Emilio Jesús Gallego Arias (May 24 2022 at 19:58):

We are supporting vio to vo for example, not sure why it makes sense to support a format that is O(n²) SPACE

Emilio Jesús Gallego Arias (May 24 2022 at 19:58):

on the document

Enrico Tassi (May 24 2022 at 19:58):

I don't get why, after years me explaining it, you still think this job is there to test thr speed. It is there to test that coqide's jump to end does not exploses all the times.

Emilio Jesús Gallego Arias (May 24 2022 at 19:58):

I don't think this job is there to test speed

Emilio Jesús Gallego Arias (May 24 2022 at 19:59):

I do think what is testing is not very useful

Emilio Jesús Gallego Arias (May 24 2022 at 19:59):

and moreover taking a huge amount of time

Emilio Jesús Gallego Arias (May 24 2022 at 19:59):

We could surely test that coqide behavior a bit differently?

Enrico Tassi (May 24 2022 at 19:59):

Good night

Emilio Jesús Gallego Arias (May 24 2022 at 20:00):

Then there is the issue that often, coqide "jump to end" is slower than a sequential compilation, so why we need that?

Emilio Jesús Gallego Arias (May 24 2022 at 20:00):

Good night

Paolo Giarrusso (May 24 2022 at 21:44):

@Emilio Jesús Gallego Arias latency and throughput aren’t proportional. “Jump to the end” and “jump to point” are both lower-latency ways to (a) get to a given point (b) get a first-stage checking (like vos).

Paolo Giarrusso (May 24 2022 at 21:45):

I don’t use coqide but benefit from this test via async proofs in vscoq

Emilio Jesús Gallego Arias (May 25 2022 at 09:12):

@Paolo Giarrusso we could maybe just then test your use case, which should be much faster?

Emilio Jesús Gallego Arias (May 25 2022 at 09:13):

I mean, don't get me wrong, but the async tests are taking huge CI time, so that testing methodology seems fishy to me, when async-based jobs are like 20% of the CI testing?

Enrico Tassi (May 25 2022 at 09:23):

Really? Can you show us your calculation?

Emilio Jesús Gallego Arias (May 25 2022 at 09:38):

Core CI: build + test suite + validate:

base: 52 mins, async: 179 mins (plus it is allow_fail)

Total CI time without the heavy ones: 665
Total CI time with (unimath, vst, and fiat-crypto): 1155

So 15%

Things have gone really bad with unimath btw, that seems like a waste and we should optimize that too. The numbers don't include worker setup which is significant so I guess the first number is more like 600. Anyways would be cool to have better tools to understand that.

Emilio Jesús Gallego Arias (May 25 2022 at 09:39):

That total number is missing doc and 32bit build

Paolo Giarrusso (May 25 2022 at 09:40):

@Emilio Jesús Gallego Arias would testing “jump to end and wait for all proofs” be faster? learning at some point that the proofs failed at some point is still useful. Otherwise I'd just use proof general's option to throw away proofs.

Mostl am just voting against throwing away the code until there is a better replacement. How to test that code I don’t know, but I imagine it’s not enough to test it on few files unless they are designed to maximize test coverage?

Paolo Giarrusso (May 25 2022 at 09:41):

IIRC I have seen plugin-specific issues with async builds in past releases, and Next Obligation. seems to have special treatment (or rather, to disable async)

Emilio Jesús Gallego Arias (May 25 2022 at 09:43):

@Paolo Giarrusso indeed the problem with the code we have today is that it is making the development of the replacement very difficult.

Emilio Jesús Gallego Arias (May 25 2022 at 09:44):

Supporting the current code takes a huge amount of available cycles, and moreover, blocks some important refactoring, API redesigns and the old code can't just work with a more dynamic model (which is IMHO what you want)

Enrico Tassi (May 25 2022 at 09:46):

I think you should stop pushing it. It won't go away before we have a replacement.

Emilio Jesús Gallego Arias (May 25 2022 at 09:49):

I can see why you think that way, but as far as I can tell what we decide to do is not only your call

Emilio Jesús Gallego Arias (May 25 2022 at 09:49):

I don't see why having such discussion is a problem

Emilio Jesús Gallego Arias (May 25 2022 at 09:49):

whatever we decide

Emilio Jesús Gallego Arias (May 25 2022 at 09:50):

I have pretty strong technical and maintainership arguments in favor of the removal; so it is not like I'm coming to this point arbitrarily.

Emilio Jesús Gallego Arias (May 25 2022 at 09:51):

Moreover, such kind of process are common, for example, Isabelle did something similar and turned out pretty well for them

Emilio Jesús Gallego Arias (May 25 2022 at 09:51):

An indisputible fact is that Coq UI is in the same state than 10 year ago

Emilio Jesús Gallego Arias (May 25 2022 at 09:51):

and considerable resources have been spent without a result

Emilio Jesús Gallego Arias (May 25 2022 at 09:52):

I think we can agree on that (modulo jsCoq and unusable experiments such as coq-lsp)

Emilio Jesús Gallego Arias (May 25 2022 at 09:53):

Moreover, the current code seems like not maintained, and creates often problem for people working on other areas (which leads to strongly-worded complains every often in a while)

Anyways, when we discuss this I will present my arguments, then we can see.

Paolo Giarrusso (May 25 2022 at 09:56):

that suggests that a separate branch for removing this code and developing the replacement is an option, even tho that has other downsides

Paolo Giarrusso (May 25 2022 at 09:58):

but if the branch is master and the replacement is not ready in 6 months, some nontrivial annoyance will result.

Enrico Tassi (May 25 2022 at 09:59):

I made that code optional, you can link a coq toplevel without the STM. This is whtat we are doing in the vscoqtop wip.
Here we are talking about removing the tests for the STM, which we should keep until some alternative is ready.
People are using CoqIDE and VSCoq, they use that code and we should keep testing it, even if it sucks.

Last updated: Oct 13 2024 at 01:02 UTC