Stream: Coq Platform devs & users

Topic: Snap CI failing


Michael Soegtrop (Jan 26 2023 at 08:35):

@Enrico Tassi: for the past 2 days the Snap CI has been failing (reproducibly). The issue is that I don't see an actual error: the platform build runs through to the end and then it says that snap failed. Can you make sense of this (see e.g. https://github.com/coq/platform/actions/runs/4011369172/jobs/6888873076)?

Théo Zimmermann (Jan 26 2023 at 08:54):

What about adding --debug to the Snap build command as they suggest?

Enrico Tassi (Jan 26 2023 at 09:26):

That is only useful in interactive mode, since it opens a shell inside the container.

Enrico Tassi (Jan 26 2023 at 09:27):

Locally, I never managed to set up LXD, so I use another (soon to be deprecated) container tool.

Enrico Tassi (Jan 26 2023 at 09:31):

Anyway, I'll investigate it

Michael Soegtrop (Jan 26 2023 at 09:36):

I can also try locally: I have the same Ubuntu version, and it should also use LXD (I guess).

Michael Soegtrop (Jan 26 2023 at 09:37):

I'm just busy this week, which is why I asked whether you had any idea.

Enrico Tassi (Jan 26 2023 at 09:59):

I made a PR that saves the snapcraft logs in case of failure. Let's see if they are of any use.

Michael Soegtrop (Jan 30 2023 at 13:55):

@Enrico Tassi: this is a bit of a mystery to me. With 4GB (the default setting) I reproducibly can't compile Coq 8.16.1 in the snapcraft VM; with more memory it works. What I don't understand is what changed on GitHub between the working and the failing runs. I compared the opam install lists and they are identical. Coq doesn't even compile when I run in sequential mode (installing each opam package with a separate opam install command). Any thoughts?
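
Since the outcome depends on how much memory the build VM gets, one cheap diagnostic is to log the VM's memory at the start of the build, so that a failing run shows at a glance whether it got the usual amount. A minimal Python sketch, assuming the build runs on Linux (it reads /proc/meminfo); the 4 GiB threshold is just the default mentioned in the message, and the function names are hypothetical:

```python
import os
import re

# Hypothetical threshold: 4 GiB is the snapcraft default that failed here;
# the thread does not say how much Coq 8.16.1 actually needs.
MIN_MEM_KIB = 4 * 1024 * 1024


def mem_total_kib(meminfo_text: str) -> int:
    """Return MemTotal (in KiB) from the contents of /proc/meminfo."""
    match = re.search(r"^MemTotal:\s+(\d+) kB$", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found in meminfo text")
    return int(match.group(1))


def has_more_than(meminfo_text: str, minimum_kib: int = MIN_MEM_KIB) -> bool:
    """True if the machine reports strictly more RAM than `minimum_kib`."""
    return mem_total_kib(meminfo_text) > minimum_kib


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        text = f.read()
    gib = mem_total_kib(text) / (1024 * 1024)
    print(f"MemTotal: {gib:.1f} GiB, above 4 GiB: {has_more_than(text)}")
```

Note that a VM configured with exactly 4 GiB reports a MemTotal slightly below that, because the kernel reserves some memory for itself.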

Enrico Tassi (Jan 30 2023 at 14:00):

I'm sorry, but I have no idea.

Michael Soegtrop (Feb 01 2023 at 10:36):

@Enrico Tassi: the only difference I can see between passing and failing builds is the line "lxd (5.0/stable) 5.0.2-838e1b2 from Canonical** refreshed", which appears only in the failing builds. The snap store says that LXD was last updated on January 19th, but our CI stopped working on January 24th. Can there be such a delay?
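
Comparing two CI logs by eye is error-prone; the comparison can also be done mechanically by stripping the per-line timestamps and listing the lines that only the failing run contains. A minimal Python sketch, assuming both raw logs have been downloaded as text and that each line starts with a GitHub-Actions-style ISO-8601 timestamp (an assumption about the log format; adjust the regex if it differs). The function names are hypothetical:

```python
import re

# Assumed log line format: "2023-01-25T08:00:00.1234567Z <message>".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z\s?")


def normalize(line: str) -> str:
    """Drop the leading timestamp and trailing whitespace."""
    return TIMESTAMP.sub("", line).rstrip()


def lines_only_in_failing(passing_log: str, failing_log: str) -> list:
    """Lines of the failing log, in order and deduplicated, absent from the passing log."""
    in_passing = {normalize(line) for line in passing_log.splitlines()}
    emitted = set()
    result = []
    for line in failing_log.splitlines():
        norm = normalize(line)
        if norm and norm not in in_passing and norm not in emitted:
            emitted.add(norm)
            result.append(norm)
    return result
```

Lines that legitimately vary between runs (durations, download sizes, hashes) will also show up, so the output still needs a human look, but it shrinks a multi-hour log to a short list.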

Michael Soegtrop (Feb 01 2023 at 11:01):

Ah, I found this in the snapstore release table:

5.0/stable  5.0.2-838e1b2   25 January 2023

Michael Soegtrop (Feb 01 2023 at 11:02):

And the last working run was on Jan 24th; the first failing one was on Jan 25th.
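
The reasoning here is a bisection by date: assuming a release takes effect as soon as it is published, a candidate change can only explain the regression if it landed after the last passing run and no later than the first failing one. A minimal Python sketch of that check, assuming the runs are given as (date, passed) pairs with a single pass-to-fail transition; the function names are hypothetical and the dates are the ones from the thread:

```python
from datetime import date


def regression_window(runs):
    """Return (last_pass, first_fail) from (date, passed) pairs.

    Assumes a single pass -> fail transition; raises if there is no
    failing run or no passing run before it.
    """
    ordered = sorted(runs)
    first_fail = next(d for d, passed in ordered if not passed)
    last_pass = max(d for d, passed in ordered if passed and d < first_fail)
    return last_pass, first_fail


def fits_window(release: date, window) -> bool:
    """A release fits if it lands after the last pass and by the first fail."""
    last_pass, first_fail = window
    return last_pass < release <= first_fail


runs = [
    (date(2023, 1, 24), True),   # last working run
    (date(2023, 1, 25), False),  # first failing run
]
window = regression_window(runs)
print(fits_window(date(2023, 1, 25), window))  # 5.0.2-838e1b2 release date
print(fits_window(date(2023, 1, 19), window))  # "last updated" date on the store page
```

The first check is true and the second false, which is why the 25 January release date, rather than the 19 January "last updated" date, matches the failures.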

Michael Soegtrop (Feb 01 2023 at 11:03):

Possibly we should try a later version (there is a 5.10 track) or the previous release of the 5.0 track.

Michael Soegtrop (Feb 01 2023 at 11:04):

I'm not sure how to do that, though.

Enrico Tassi (Feb 01 2023 at 16:34):

Unfortunately it looks hardwired:
https://github.com/snapcore/action-build/blob/3457752ec9b1c79a8290b5167fce2d14df0997c1/src/tools.ts#L75-L89

Michael Soegtrop (Feb 03 2023 at 18:37):

@Enrico Tassi: Actually, I can change the LXD version by installing the latest version from a different channel beforehand; the refresh then reports that it is already up to date. I have now tried 5.10 / latest instead of 5.0. The problem is that it has the same issue.
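
The workaround described here (putting lxd on another channel before the snapcraft action runs, so that the action's own refresh becomes a no-op) can be scripted as an extra workflow step. A minimal Python sketch, assuming `snap` is on the PATH and that `snap list lxd` exits non-zero when lxd is not installed; the channel value is only an example, and the script prints the command instead of running it:

```python
import shutil
import subprocess

# Example channel only; the thread tried a 5.10 track, which showed the same failure.
LXD_CHANNEL = "5.10/stable"


def lxd_command(channel: str, already_installed: bool) -> list:
    """Build the snap command that puts lxd on `channel`."""
    # `snap install` on an already-installed snap does not switch its channel,
    # so an existing install has to be moved with `snap refresh` instead.
    verb = "refresh" if already_installed else "install"
    return ["sudo", "snap", verb, "lxd", f"--channel={channel}"]


def lxd_installed() -> bool:
    """Assumes `snap list lxd` exits non-zero when lxd is not installed."""
    if shutil.which("snap") is None:
        return False
    result = subprocess.run(["snap", "list", "lxd"], capture_output=True)
    return result.returncode == 0


if __name__ == "__main__":
    cmd = lxd_command(LXD_CHANNEL, lxd_installed())
    print("would run:", " ".join(cmd))
```

Replacing the `print` with `subprocess.run(cmd, check=True)` turns the dry run into the actual step.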

It looks pretty severe, by the way. It seems to forget which user it is. The GitHub runner user has root rights, so opam usually gives a "WARNING running as root is not recommended" on every command. With the broken LXD versions it gives this warning for a while, but then it stops, which means the user identity magically changed in the middle of the script.
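
Rather than inferring the user change from the presence or absence of opam's root warning, the build script could log the uid explicitly before each opam step, which would pinpoint where the switch happens. A minimal Python sketch, assuming the opam commands are driven from a script; `id -u` reports the effective uid that a freshly spawned child process gets, which approximates what each new opam invocation sees. The step names below are hypothetical placeholders:

```python
import subprocess


def child_uid() -> int:
    """Effective uid reported by a freshly spawned `id -u` process."""
    out = subprocess.run(["id", "-u"], capture_output=True, text=True, check=True)
    return int(out.stdout.strip())


def identity_changes(samples):
    """Given (step, uid) pairs, return the steps whose uid differs from the first."""
    if not samples:
        return []
    first_uid = samples[0][1]
    return [step for step, uid in samples if uid != first_uid]


if __name__ == "__main__":
    samples = []
    for step in ["before opam step 1", "before opam step 2"]:  # hypothetical steps
        samples.append((step, child_uid()))
        # ... run the corresponding opam command here ...
    print("uid changed at:", identity_changes(samples) or "never")
```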

I'll see if I can get some interest from the LXD maintainers, but I guess it will take a long time to fix.

Do you know if I can install an older version that is no longer advertised on the snap store?

Enrico Tassi (Feb 03 2023 at 20:15):

I don't think so. But now that you have pinpointed the problem, you can report it upstream or google for it. It is odd that we are the only ones affected.

Michael Soegtrop (Feb 04 2023 at 09:56):

The problem with debugging this for the LXD team will be that opam runs for a few hours before LXD falls apart. But at least it is reproducible.

Michael Soegtrop (Feb 04 2023 at 09:58):

What I am trying to understand before reporting to LXD is whether the error 120 comes from opam, and if so, what it means. Opam seems to have custom error codes in the 12X range. I filed an issue, but have had no response yet.
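
One quick sanity check on such a status: POSIX shells report a process killed by signal N as exit status 128 + N, so if the process the shell was waiting on had been OOM-killed (SIGKILL, signal 9), the status would normally be 137, not 120. A status of 120 is therefore most likely a code chosen by some program in the chain, which fits the suspicion that opam produced it, although this says nothing about what opam means by it. A small Python sketch of the decoding (the function name is hypothetical):

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell exit status (0-255).

    Shells report "killed by signal N" as 128 + N, but a program may also
    exit with a value above 128 on its own, so that reading is only likely.
    """
    if status > 128:
        try:
            return f"likely killed by {signal.Signals(status - 128).name}"
        except ValueError:
            return f"exit status {status} (no matching signal)"
    return f"exit code {status} chosen by the program"


for status in (120, 137):
    print(status, "->", describe_exit_status(status))
```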


Last updated: Dec 07 2023 at 09:01 UTC