Stream: Coq devs & plugin devs

Topic: CI


view this post on Zulip Jason Gross (May 11 2020 at 05:25):

The test-suite is failing frequently (about 10%) of the time on Mac OS. The failure I'm seeing in at least one of these cases is

File "./modules/PO.v", line 51, characters 0-12:
Error:
/Users/runner/runners/2.166.4/work/1/s/test-suite/modules/Nat.vo: premature end of file. Try to rebuild it.

0m0.001s 0m0.001s
0m0.084s 0m0.050s
==========> FAILURE <==========
    modules/PO.v...Error! (should be accepted)

FAILURES
    modules/PO.v...Error! (should be accepted)

What's up with this?

view this post on Zulip Jason Gross (May 11 2020 at 05:25):

https://dev.azure.com/coq/coq/_build/results?buildId=9655&view=logs&j=a5e52b91-c83f-5429-4a68-c246fc63a4f7&t=bde16fc7-7659-5b1c-12aa-0110866d993c

view this post on Zulip Jason Gross (May 11 2020 at 05:25):

https://github.com/coq/coq/pull/12129/checks?check_run_id=659402354

view this post on Zulip Hugo Herbelin (May 11 2020 at 14:29):

I have this from time to time too (as well as for another modules test) but did not investigated. If you tell it is MacOS X specific then I could pay more attention to it when it arrives.

view this post on Zulip Hugo Herbelin (May 11 2020 at 14:52):

Any hints on what to think about this refman CI failure:

Latexmk: calling xindy( -L english -C utf8 -M sphinx.xdy -I xelatex -o CoqRefMan.ind CoqRefMan.idx )
Latexmk: applying rule 'makeindex CoqRefMan.idx'...
Warning: reserving address range 0x7ff2855cd000...0x8020a1572fff that contains memory mappings. clisp might crash later!

view this post on Zulip Théo Zimmermann (May 11 2020 at 14:57):

Is the runner pyrolyse? See coq/coq#12280.

view this post on Zulip Jason Gross (May 12 2020 at 00:30):

@Hugo Herbelin I'm not sure if it's Mac OS specific, but this is the first I've seen it, and it was on Mac OS.

view this post on Zulip Antonio Nikishaev (May 13 2020 at 20:20):

I noticed (while working on coq/coq#11992) that there's a crash in fiat_parsers CI: "File "interp/constrextern.ml", line 844, characters 9-15: Assertion failed.", even when CI is passing (don't ask me, no idea :) — e.g. grep for Anomaly in https://travis-ci.org/github/mit-plv/fiat/jobs/683994964 (web ui) / https://api.travis-ci.org/v3/job/683994964/log.txt (raw text). Is that a known Coq bug? /cc @Jason Gross

view this post on Zulip Jason Gross (May 13 2020 at 20:36):

Very strange. This is because I generally assume that if make fails, then make || make will also fail, but here it does not. Let me fix that on the CI.

view this post on Zulip Jason Gross (May 13 2020 at 20:36):

In any case, this seems to be a new bug in Coq, where somehow the anomaly occurs the first time you coq the file, but not the second time

view this post on Zulip Jason Gross (May 13 2020 at 20:46):

Hm, this is actually pretty tricky to fix on my end, due to how my automation is set up. I'm also not sure that it's the job of my own CI to catch heisenanomalies in Coq... (Note that the build succeeds on Coq's CI)

view this post on Zulip Pierre-Marie Pédrot (May 14 2020 at 12:32):

Is there a runner shortage right now? I have a PR that doesn't start any CI run after quite a while, except for windows tests .

view this post on Zulip Théo Zimmermann (May 14 2020 at 12:48):

@Pierre-Marie Pédrot You mean this is not the usual GitLab CI pipeline exceeded error?

view this post on Zulip Théo Zimmermann (May 14 2020 at 12:50):

Oh indeed, I have the same problem on my PR.

view this post on Zulip Théo Zimmermann (May 14 2020 at 12:52):

We have only 22 concurrently running jobs right now, two of them being Windows builds and most of the rest being on pyrolyse or roquableu...

view this post on Zulip Théo Zimmermann (May 14 2020 at 13:00):

When there is a GitLab shared runner shortage, we only get 1 concurrent shared runner for our project.

view this post on Zulip Pierre-Marie Pédrot (May 14 2020 at 13:02):

The wrathful gods of software engineering have decided against the 8.12 branch

view this post on Zulip Théo Zimmermann (May 14 2020 at 13:02):

Usually this kind of shortage is solved at night though.

view this post on Zulip Théo Zimmermann (May 14 2020 at 13:03):

It's never been like in the Travis times when we could have a queue of 72 hours.

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 16:57):

Yeah we have quite a shortage; we should definitively budget some newer server for our CI; a dual threadripper 3950x with 1Tb of RAM could do wonders for our buildds

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 16:58):

Also fiat-crypto is quite out of hand, taking almost 3 hours to complete.

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 16:58):

and all the async / quick jobs in general being very slow and still unreliable

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:01):

Before budgeting new servers, I think we should work at reducing the size of our CI. In particular, it is not too late to come back to coq/coq#6724. If we implement this, then we can adopt a bors-like merging workflow, and if we do this, then we can set a lighter, partly on-demand CI for PRs.

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:03):

Not sure that will work so well for us; as compared to a simpler solution with what we have [for example coqbot running the full CI only when the PR is ready to merge due to tags , etc...]

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:03):

#6724 is not actually blocking having a bors-like setup, you can have it with the current setup

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:03):

but in both cases, if the merge fails, you are quite in trouble.

view this post on Zulip Maxime Dénès (May 14 2020 at 17:04):

We often want to trigger a CI before PRs are finalized, so better just do something on-demand than rely on labels.

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:04):

a repush won't run the tests you care about, your local test may miss something, and another merge attempt is equally expensive

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:05):

and if you go with a lockfile, you are gonna have to worry about consistency as extra, this is not a trivial overhead

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:05):

but indeed we could be lighter right now, I wonder by how much tho

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:06):

as the amount of stuff you care about really varies among PRs, we could use the newer path-based selection rules in CI

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:06):

as proposed for the docs

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:06):

I don't see how you can have a bors-like workflow with backward-incompatible overlays in the current setup. Please explain.

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:07):

same way that you would do when implementing the lockfile

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:07):

anyways you still need a super beefy server

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:07):

even more beefy than now

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:07):

Why?

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:07):

at least today, we don't suffer the bottleneck when merging

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:08):

imagine days like today, or in CUDW

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:08):

with dozens of merge queued

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:08):

Sure, but bors is supposed to optimize the merging of many PRs.

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:08):

also quite a few failing

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:08):

yeah but in our case it'd need tremendous CI power

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:08):

I don't think so.

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:09):

well my numbers are otherwise

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:09):

how much CPU time do we need to merge 20 PRs?

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:09):

Where do you get your numbers from?

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:09):

From current CI

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:09):

that's easy to calculate

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:09):

how much CPU time do we need to merge 20 PRs?

Same as for 1 PR.

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:09):

Same a for 1 PR?

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:09):

how come?

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:10):

Yes, because if you have several PRs that are merged in a close window, you run just one test.

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:10):

That's how bors works.

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:10):

That's also how RMs manage the backporting queue generally.

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:10):

what happens if the first PR on the chain makes the merge fail?

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:10):

Well, in this case, you have to determine which PR is the culprit and restart the merge of the others.

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:11):

So there's a need for some not-so-simple logic.

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:12):

bors does bisecting indeed, but that may needs tons of power in busy days

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:12):

or have devs waiting for long IMO

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:13):

Still doing numbers for a busy day with 20 PRs a day + 10% prob of failing + 10% prob of consistency issues and the default bisect strategy

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:13):

you need a worker able to swallow a lot

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:13):

Bisecting still optimizes the use of the power, but smart human beings can optimize further by looking into the error and guessing where it comes from.

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:14):

10% prob of consistency issues

That's the reason why I'm saying a bors-like workflow is not possible in the current setup for backward-incompatible overlays.

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:14):

10% prob of failing

And this can be made lower by relying on humans guessing which PRs need additional CI testing before attempting a merge.

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:14):

consistency issues do arise in the lockfile setup

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:14):

in fact they are more noticeable

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:15):

the more the delta for a distributed setup, the weaker the consistency

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:15):

Do you mean when lockfiles are updated?

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:15):

I'm not a big fan of hybrids CI setups

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:16):

WDYM by this? Having a different set of tests on branches and PRs?

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:16):

Because that's the only way to reduce the size of the CI, I'm afraid.

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:16):

Théo Zimmermann said:

Do you mean when lockfiles are updated?

yes, you need to update them often, otherwise you are gonna go stale soon, and the more you with an unnoticed consistency break , the longer it usually takes to repair [that's an hypothesis tho]

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:17):

I share this intuition.

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:17):

I prefer to add more power to the CI, than to reduce the size, but indeed some optimizations could be done

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:17):

I have some environmental concerns...

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:18):

the current system is painful, but at least it has pretty good consistency properties; the lockfiles + bors is fine tho; coqbot could take care of bumping etc...

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:18):

I share this intuition.

But my feeling is that bumping once a week is a good compromise

coqbot could take care of bumping etc...

Right, that was my plan

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:19):

environmental concerns are indeed a problem, also hard to measure accurately ; human time also has impact

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:19):

Actually the kind of server I proposed is much much more efficient in terms of energy than what we are using now

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:20):

So maybe we should address both.

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:20):

it has by far the best power / Watt ratio in the whole server marketplace

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:21):

I didn't know that this could be optimized. Well, then maybe we should go ahead and buy it. I have some budget available that I could spend on this. Maybe not during the sanitary crisis though.

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:21):

it is true that we should attempt for the server not to be too idle

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:21):

but even now idle servers thanks to ssd + newer cpus don't spend a lot of energy

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:22):

Théo Zimmermann said:

I didn't know that this could be optimized. Well, then maybe we should go ahead and buy it. I have some budget available that I could spend on this. Maybe not during the sanitary crisis though.

Oh sure, the increase in energy efficiency due to newer node process + better power design is dramatic; I wouldn't be surprised if for example a top of the line ryzen outperforms like 10x pyrolise in terms of energy per computing unit.

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:24):

what to remove from the CI is not so clear tho ; plugins you mostly need, contribs the same; you are left with jobs regarding multiple ocaml versions and test suites

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:24):

and validate

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:24):

that's some saving, but not enough

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:25):

on the other hand indeed we could just not run the contribs

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:25):

just the cheap set, and delay that to merge

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:25):

indeed these failures are rare

view this post on Zulip Théo Zimmermann (May 14 2020 at 17:25):

Plugins yes, you need to test always, but they build fast. Libraries (what you call contribs I guess), you could test less often: only when there seems to be a risk of incompatibility. Then, the idea is that every time you get a failure that you didn't catch, that's a test-suite bug.

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:28):

top of the line epyc sorry

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:28):

unfortunately the kind of server I'm talking about is very efficient, but the price tag is well above 20K O_O

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:28):

well there are cheaper options

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:28):

but I was talking about 128 cores

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:29):

256 threads

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:29):

quite a beefy machine

view this post on Zulip Emilio Jesús Gallego Arias (May 14 2020 at 17:32):

huhu gitlab just crashed

view this post on Zulip Jason Gross (May 24 2020 at 17:15):

ERROR: Preparation failed: Error: No such image: registry.gitlab.com/coq/coq:bionic_coq-V2020-05-19-V33 (docker.go:203:0s)

What's up with this?

view this post on Zulip Gaëtan Gilbert (May 24 2020 at 17:26):

where are you seeing this?

view this post on Zulip Jason Gross (May 24 2020 at 21:16):

Gaëtan Gilbert said:

where are you seeing this?

I was seeing this on the CI for https://github.com/coq/coq/pull/11029 before I rebased it. Now that I've rebased, I'm seeing https://github.com/coq/coq/issues/12400 on fiat-crypto and also Fatal error: ref_table overflowon engine-bench, which I'm about to report

view this post on Zulip Jason Gross (May 24 2020 at 21:28):

Reported as https://github.com/coq/coq/issues/12401. But what does "ref_table overflow" even mean?

view this post on Zulip Jason Gross (May 24 2020 at 21:41):

Gaëtan Gilbert said:

where are you seeing this?

I just hit this again, and reported it as https://github.com/coq/coq/issues/12402 and https://gitlab.com/coq/coq/-/jobs/565521645

view this post on Zulip Enrico Tassi (May 25 2020 at 07:14):

Jason Gross said:

Reported as https://github.com/coq/coq/issues/12401. But what does "ref_table overflow" even mean?

https://github.com/ocaml/ocaml/blob/b2f467abc0d2caaf40b6446a567f968ce935247f/runtime/minor_gc.c#L621

view this post on Zulip Théo Zimmermann (Jul 09 2020 at 10:25):

Just saw this failed job where git fetch --unshallow failed, likely because some branch was removed between the time the repo was cloned and the time when the command was run. Is there any way of making git fetch --unshallow more robust, e.g. by unshallowing only the current branch?

view this post on Zulip Gaëtan Gilbert (Jul 09 2020 at 10:28):

https://github.com/coq/coq/pull/9594#discussion_r261697603

Why not override GIT_DEPTH?

That's what I wanted to do at first but nowhere in the documentation it says what value to put if you want no limit (I suppose empty would work but this is unspecified).

let's just use empty GIT_DEPTH

view this post on Zulip Théo Zimmermann (Jul 09 2020 at 12:43):

It still isn't documented what it does when you put it to an empty value or zero (whether it is blindly substituted or not) so we should test what it does first.

view this post on Zulip Théo Zimmermann (Sep 16 2020 at 13:45):

@Jason Gross It seems that coq-tools are consistently failing and the first failure in master was when a PR which had a fully green CI, so I suspect the problem is coming from a change on your side.

view this post on Zulip Ali Caglayan (Sep 16 2020 at 14:40):

https://github.com/JasonGross/coq-tools/commits/master

view this post on Zulip Jason Gross (Sep 16 2020 at 19:05):

Oops, sorry, I forgot to test throughly before pushing, should be fixed now

view this post on Zulip Théo Zimmermann (Sep 21 2020 at 13:14):

Something has changed (not on the Coq side) which has broken Windows CI both on Azure (opam-based) and on GitLab CI (manual build of OCaml). See https://dev.azure.com/coq/coq/_build/results?buildId=11965&view=logs&jobId=2d2b3007-3c5c-5840-9bb0-2b1ea49925f3&j=2d2b3007-3c5c-5840-9bb0-2b1ea49925f3&t=f33089cd-d3fe-578c-2c5e-4ee560e10a0d and https://coq.gitlab.io/-/coq/-/jobs/748531263/artifacts/artifacts/buildlogs/ocaml-4.08.1-make-world.opt_err.txt.

view this post on Zulip Théo Zimmermann (Sep 21 2020 at 13:15):

(I've verified that the first failure on master happened when merging a PR whose CI was successful.)

view this post on Zulip Michael Soegtrop (Sep 21 2020 at 14:01):

@Théo Zimmermann : I think cygwin recently updated the C compiler to a new major version - this could easily have all sorts of interesting effects. I think I can pin it to the older version, but this will not work forever since cygwin typically keeps only the most recent two or 3 versions, although for gcc there might be more.

view this post on Zulip Michael Soegtrop (Sep 21 2020 at 14:03):

@Théo Zimmermann : actually it was the MinGW headers. I meant this announcement: (https://cygwin.com/pipermail/cygwin/2020-September/246341.html)

view this post on Zulip Théo Zimmermann (Sep 21 2020 at 14:10):

Interesting thanks! That sounds like a very plausible explanation. Not sure how I should understand your second message though (headers sound like a much less major change).

view this post on Zulip Michael Soegtrop (Sep 21 2020 at 14:31):

I just remembered that I read something about MinGW changes around the compiler but didn't remember the details. The mingw-gcc doesn't seem to have recent changes (https://cygwin.com/packages/summary/mingw64-x86_64-gcc-core.html). Still this static stuff failing might depend on some C library defined macros, so one can't exclude that it was this. The headers did change recently to a new major version, see (https://cygwin.com/packages/summary/mingw64-x86_64-headers.html). If this is time wise coincident, I would say this is a likely root cause.

view this post on Zulip Théo Zimmermann (Sep 22 2020 at 09:35):

Continued here: https://github.com/coq/coq/issues/13065

view this post on Zulip Pierre-Marie Pédrot (Sep 24 2020 at 11:46):

Is it normal for color to be broken on master?

view this post on Zulip Gaëtan Gilbert (Sep 24 2020 at 11:50):

:(

view this post on Zulip Pierre-Marie Pédrot (Sep 24 2020 at 11:53):

dpdgraph seems broken also, I hope it's not due to the PR I've submitted yesterday

view this post on Zulip Pierre-Marie Pédrot (Sep 24 2020 at 11:54):

(it was working properly locally, and I don't understand the logs enough to be certain of the root of the failure)

view this post on Zulip Pierre-Marie Pédrot (Sep 24 2020 at 13:10):

I confirm I broke dpdgraph alone :boom:

view this post on Zulip Pierre-Marie Pédrot (Sep 24 2020 at 13:43):

... but I swear I am not responsible for color TTBMK

view this post on Zulip Pierre-Marie Pédrot (Oct 07 2020 at 17:28):

for some reason now https://gitlab.mpi-sws.org/iris/stdpp/-/archive/.tar.gz is a 404 error, breaking the iris CI

view this post on Zulip Pierre-Marie Pédrot (Oct 07 2020 at 17:32):

ah, this is because the git ref is not correctly set

view this post on Zulip Erik Martin-Dorel (Oct 07 2020 at 20:48):

Théo Zimmermann said:

Yes, because if you have several PRs that are merged in a close window, you run just one test.

Hi, I've just seen this thread (just skimmed, sorry) and I wanted to ask if you were aware of the following GitLab CI feature?
https://docs.gitlab.com/ee/ci/merge_request_pipelines/pipelines_for_merged_results/merge_trains/
− anyway I guess this is probably not useful, as it specifically deals with GitLab MRs, not GitHub PRs… and IIUC, bors provides a similar feature? (I'm not a bors user), although maybe not using the very same algo…

view this post on Zulip Théo Zimmermann (Oct 07 2020 at 21:32):

Looks like a very old quote! Yes, there are many implementations of "merge-trains" "bors-like" features, but what is blocking us from using this system is that our CI is too unreliable, especially with the way we deal with backward-incompatible overlays.

view this post on Zulip Pierre-Marie Pédrot (Oct 08 2020 at 09:23):

So Iris seems reliably broken on the recent CI pipelines

view this post on Zulip Pierre-Marie Pédrot (Oct 08 2020 at 09:24):

see e.g. https://gitlab.com/coq/coq/-/jobs/779420849 or https://gitlab.com/coq/coq/-/jobs/778337747

view this post on Zulip Pierre-Marie Pédrot (Oct 08 2020 at 09:25):

Stdpp seems to have changed its build system according to one commit message, which might explain that

view this post on Zulip Pierre-Marie Pédrot (Oct 08 2020 at 09:30):

Indeed, so now we have to look for coq-iris.opam instead of opam.

view this post on Zulip Janno (Oct 08 2020 at 09:50):

I think @Ralf Jung changed something very recently.

view this post on Zulip Pierre-Marie Pédrot (Oct 08 2020 at 09:51):

Yep, the opam file got moved, so I wrote a fix

view this post on Zulip Ralf Jung (Oct 08 2020 at 09:53):

Oh so sorry, I meant to update the Coq side of it but then forgot :(

view this post on Zulip Ralf Jung (Oct 08 2020 at 09:53):

I can submit a PR if that helps

view this post on Zulip Pierre-Marie Pédrot (Oct 08 2020 at 09:54):

Nah, thanks, it should be solved by https://github.com/coq/coq/pull/13158 hopefully

view this post on Zulip Ralf Jung (Oct 08 2020 at 09:57):

ah but iris-examples will also change to match the new style, so I prepared https://github.com/coq/coq/pull/13159

view this post on Zulip Ralf Jung (Oct 08 2020 at 09:58):

feel free to just take that patch into your PR if that is easier

view this post on Zulip Pierre-Marie Pédrot (Oct 08 2020 at 10:01):

Right, so we have to be synchronized with the upstream merge

view this post on Zulip Pierre-Marie Pédrot (Oct 08 2020 at 10:01):

When are you planning to perform it?

view this post on Zulip Ralf Jung (Oct 08 2020 at 10:02):

I already did

view this post on Zulip Ralf Jung (Oct 08 2020 at 10:03):

so my PR should pass CI now

view this post on Zulip Pierre-Marie Pédrot (Oct 08 2020 at 10:03):

Indeed, I was just reading https://gitlab.com/coq/coq/-/jobs/779609904 :face_palm:

view this post on Zulip Ralf Jung (Oct 08 2020 at 10:03):

it got cancelled due to some limit or so it seems

view this post on Zulip Pierre-Marie Pédrot (Oct 08 2020 at 10:03):

Right, I'll close my PR then.

view this post on Zulip Pierre-Marie Pédrot (Oct 09 2020 at 21:56):

Uh, CI is not starting on https://github.com/coq/coq/pull/13168. Is that normal?

view this post on Zulip Pierre-Marie Pédrot (Oct 09 2020 at 21:57):

... despite two force pushes.

view this post on Zulip Pierre-Marie Pédrot (Oct 09 2020 at 21:57):

Coqbot is dead, long live to coqbot?

view this post on Zulip Pierre-Marie Pédrot (Oct 09 2020 at 21:59):

Answering to myself: github is having problems with webhooks it seems according to https://www.githubstatus.com/

view this post on Zulip Karl Palmskog (Oct 09 2020 at 21:59):

Travis is dead as well, this is a systemic issue

view this post on Zulip Karl Palmskog (Oct 09 2020 at 22:00):

my latest commit to a coq-community project is not even reflected via pull request web interface

view this post on Zulip Théo Zimmermann (Dec 30 2020 at 06:54):

@Guillaume Melquiond Something surprising happened. CompCert started failing suddenly in Coq's CI in both the CompCert job and the VST job (which includes a bundled version of CompCert). This seems unrelated to any change in Coq since it failed suddenly in all builds with no related change. However, since CompCert and its VST copy are not simultaneously updated, I would have expected to see a change in the master branch of Flocq that would explain the failure, but the last commit shown at https://gitlab.inria.fr/flocq/flocq/-/tree/master is from 2 month ago.
See https://gitlab.com/coq/coq/-/jobs/936644456 for the failing log.

view this post on Zulip Guillaume Melquiond (Dec 30 2020 at 07:02):

That is expected. CompCert is not compatible with Flocq's master. (Actually, not much will be compatible with Flocq's master, since it paves the way to Flocq 4.) CompCert's developers have been aware of it since early September: https://github.com/AbsInt/CompCert/pull/368. But there is nothing they can do about it for now. Anyway, this does not matter much for CompCert, since it vendors Flocq.

view this post on Zulip Théo Zimmermann (Dec 30 2020 at 08:56):

Thanks for your quick answer @Guillaume Melquiond, unfortunately I didn't notice until now.

view this post on Zulip Théo Zimmermann (Dec 30 2020 at 08:56):

But why did it start failing just now despite the last commit on Flocq's master being two months old?

view this post on Zulip Théo Zimmermann (Dec 30 2020 at 08:58):

And in Coq's CI we were not using the vendored version to avoid building Flocq three times. Do you have a release branch for Flocq 3 that you maintain and that we could use in Coq's CI instead of master?

view this post on Zulip Enrico Tassi (Dec 30 2020 at 09:03):

I suggest you crate a coq-master branch we track in ci and in which you merge master manually, from time to time, when it fits (eg concert catches up). I do this way for elpi (I develop against Coq stable for example).

view this post on Zulip Guillaume Melquiond (Dec 30 2020 at 09:05):

That is a conjunction of two things. First, Coq's CI stopped using CompCert's vendored Flocq only recently (in preparation of the 8.13 release). Second, I had temporarily stashed the main commits out of Flocq's master to prepare Flocq 3.4 (in preparation of the 8.13 release). Thus, the issue with Coq's CI went unnoticed because both things happened around the same time.

view this post on Zulip Guillaume Melquiond (Dec 30 2020 at 09:19):

Just to be clear, it might take several years before CompCert catches up to Flocq's master. (First, Flocq 4 needs to be released. Second, CompCert needs to decide to drop Flocq 3 in favor of Flocq 4.) So, a coq-master branch does not seem like a good idea (or I misunderstood it). But you can decide to track the flocq-3 branch instead of master. That said, this is only delaying the issue. Because at some point, CompCert's master and CoqInterval's master will disagree (even if only temporarily) on the major version of Flocq they are compatible with.

view this post on Zulip Michael Soegtrop (Dec 30 2020 at 09:49):

CompCert's master and CoqInterval's master will disagree (even if only temporarily) on the major version of Flocq they are compatible with.

This is a serious issue for people working on verification of floating point C code. In the Coq Platform I think we have to pick the latest combination of versions which share the same version of Flocq until this is solved. I thought about if it makes sense to have a separate package for flocq-4 during the transition period, but after thinking it through I came to the conclusion that this is better solved with a separate opam switch for Flocq 3 and Flocq 4 based tools.

I will switch Coq Platform master to track flocq-3 and see what happens.

view this post on Zulip Enrico Tassi (Dec 30 2020 at 09:50):

it might take several years

We can maybe add 2 jobs for flocq, flocq-3 used by compcert, and flocq-4 (to be used at some point), with the plan to remove flocq-3 eventually.

view this post on Zulip Michael Soegtrop (Dec 30 2020 at 09:53):

@Enrico Tassi : as I said I also thought about this, but rejected it because it would require a lot of additional -3 and -4 packages, say for CoqInterval. My conclusion is that it would be better to have a separate switch - say an option in the Coq Platform like Flocq3 with full set of packages or Flocq4 with reduced set of packages. But it is more a 70%/30% pro/con thing.

view this post on Zulip Michael Soegtrop (Dec 30 2020 at 10:00):

But I guess it would make sense to have an opam package flocq.dev.3 or flocq.3.dev. Which one?

view this post on Zulip Enrico Tassi (Dec 30 2020 at 10:05):

I was talking about Coq's CI, not the platform.

It is customary to have .$branch.dev so I'd pick the latter (in Coq it's v8.X.dev IIRC)

view this post on Zulip Enrico Tassi (Dec 30 2020 at 10:06):

Coq's ci (for the good and the bad) is not based on opam, so a job is a bunch of commands to fetch and build a package. I'd pick only one version of each dependency under CI, moving them on top of the flocq-4 once all ready.

view this post on Zulip Théo Zimmermann (Dec 30 2020 at 10:26):

Let's test only flocq-3 for now. Testing two branches would be a bit inconvenient for overlay creation.

view this post on Zulip Théo Zimmermann (Dec 30 2020 at 10:28):

@Guillaume Melquiond If you could give us a heads-up when you switch CoqInterval master to requiring Flocq 4, then we could introduce the two jobs at that point, or switch CoqInterval to a stable branch if you maintain one at that point.


Last updated: Oct 16 2021 at 07:02 UTC