Stream: Coq devs & plugin devs

Topic: GitHub tarball regeneration


Karl Palmskog (Jan 30 2023 at 21:39):

it unfortunately looks like GitHub has regenerated every tarball on GitHub and given them a new checksum. So all tarballs in opam archives now have the wrong checksum.

Karl Palmskog (Jan 30 2023 at 21:40):

example: https://github.com/coq/coq/archive/refs/tags/V8.16.1.tar.gz now has sha256 checksum 69cfad0e2faa202d1ad6db86b576086ce7dfea6e4e123d94c689269cf43f5606 instead of what is in the opam archive (583471c8ed4f227cb374ee8a13a769c46579313d407db67a82d202ee48300e4b).
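The mismatch described above is the kind of check a package manager performs after downloading an archive. A minimal Python sketch of that comparison follows; the function names are hypothetical and this is not opam's actual implementation, only an illustration of hashing the downloaded bytes and comparing against the recorded value.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with the checksum recorded in package metadata."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Because the digest covers the compressed bytes, any regeneration of the archive that changes even one byte makes `checksum_matches` return False, even if the extracted sources are unchanged.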

Karl Palmskog (Jan 30 2023 at 21:41):

this can rapidly become a huge problem in all automation that uses stable packages

Karl Palmskog (Jan 30 2023 at 21:43):

I have no idea how to tackle this problem efficiently, but we are at least not alone in this predicament

Karl Palmskog (Jan 30 2023 at 21:51):

sigh: https://github.blog/changelog/2023-01-30-git-archive-checksums-may-change/

GitHub doesn't guarantee the stability of checksums for automatically generated archives. If you need to rely on a consistent checksum, you may upload archives directly to GitHub Releases.

Karl Palmskog (Jan 30 2023 at 21:53):

so now we just got two pretty significant problems/tasks: (1) update every autogen archive checksum, (2) upload stable archive for at least every platform package...

Karl Palmskog (Jan 30 2023 at 21:56):

@Théo Zimmermann can you please add this (uploading stable archive to GitHub) to the RM tasklist? Regardless of whether GitHub rolls this back (they broke a lot of people's CI and other stuff), I think it would be good practice for Coq.

Karl Palmskog (Jan 30 2023 at 22:00):

@Michael Soegtrop please be prepared for a lot of Platform CI breakage due to GitHub checksum changes

Karl Palmskog (Jan 30 2023 at 22:10):

opened this for more public tracking: https://github.com/coq/opam-coq-archive/issues/2458

Karl Palmskog (Jan 30 2023 at 22:13):

some good news: out of Coq's dependencies, only zarith is hosted as a GitHub tarball

Karl Palmskog (Jan 30 2023 at 22:24):

posting the workaround here as well:

opam install <package> --no-checksums

Karl Palmskog (Jan 30 2023 at 22:35):

apparently they are reverting: https://github.com/bazel-contrib/SIG-rules-authors/issues/11#issuecomment-1409438954

Hey folks. I'm the product manager for Git at GitHub. We're sorry for the breakage, we're reverting the change, and we'll communicate better about such changes in the future (including timelines).

Karl Palmskog (Jan 31 2023 at 08:52):

they reverted the tarballs (which prevented the imminent disaster). But I'm leaving the issue open under a modified header to track discussion of how we want to handle stable archiving for Coq and Coq packages.

Enrico Tassi (Jan 31 2023 at 09:17):

Thank god. It would be nice if someone wrote a script to "copy" the automatically generated tarball into the assets...

Karl Palmskog (Jan 31 2023 at 09:19):

I think dune-release already does this. I know that @Emilio Jesús Gallego Arias always uses non-generated tarballs for his packages.

Théo Zimmermann (Jan 31 2023 at 09:33):

dune-release doesn't just copy them. It generates them with a different file extension and it does some variable substitution before.

Théo Zimmermann (Jan 31 2023 at 09:33):

Let's wait to see what further announcements GitHub makes on this topic, but I'm also interested in hearing how far we are from releasing Coq with dune-release.

Enrico Tassi (Jan 31 2023 at 09:35):

I see the future is bright, but there are many "releases" out there. The script could be used to "mass fix" them and make them future proof.

Emilio Jesús Gallego Arias (Jan 31 2023 at 09:43):

Actually what Nix does (using a hash which is independent of the packaging compression / method) is the right thing to do IMO.

tar is famously non-deterministic

Enrico Tassi (Jan 31 2023 at 09:46):

I agree, but it seems a bit too late to fix opam (and the rest of the world as well). Indeed they did revert the change, and it is surely not because of us (alone) complaining.

Emilio Jesús Gallego Arias (Jan 31 2023 at 09:48):

Opam could be fixed in the next release I think, just adding another method to the hashing

Emilio Jesús Gallego Arias (Jan 31 2023 at 09:48):

Because I understand that this change has just been delayed by GitHub.

Karl Palmskog (Jan 31 2023 at 09:48):

this also ties into our tentative goal of signing Coq and ecosystem packages (which to me means: signing both package metadata and the actual archive like the tarball)

Emilio Jesús Gallego Arias (Jan 31 2023 at 09:49):

@Karl Palmskog can't people just sign the hash?

Emilio Jesús Gallego Arias (Jan 31 2023 at 09:49):

the problem is the hash being stable under compression

Emilio Jesús Gallego Arias (Jan 31 2023 at 09:49):

isn't it?

Karl Palmskog (Jan 31 2023 at 09:50):

@Emilio Jesús Gallego Arias did you see stuff like: https://hannes.nqsb.io/Posts/Conex

Karl Palmskog (Jan 31 2023 at 09:51):

I don't really care if the hash before compression is signed, or the hash after compression, as long as we can ensure stability

Théo Zimmermann (Jan 31 2023 at 10:02):

Emilio Jesús Gallego Arias said:

Actually what Nix does (using a hash which is independent of the packaging compression / method) is the right thing to do IMO.

tar is famously non-deterministic

Actually, Nix does this only for part of the sources in Nixpkgs. I'm pretty sure there is a thread somewhere about this change and its impact to Nixpkgs (but I haven't looked for it).

Michael Soegtrop (Jan 31 2023 at 10:13):

Please note that things like this have happened in the past, mostly on self-hosted GitLab instances where people changed configurations. I also don't think tar and gzip give any guarantee that a new version will produce bit-identical output. So as far as I understand, these dynamically generated tarballs can change their hash at any time.

Michael Soegtrop (Jan 31 2023 at 10:15):

It might make sense to change to source-code-based hashing, similar to what Git does: say, create a text file listing each file name with its hash, and then hash this text file.
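The manifest idea in the message above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an existing opam feature: the function names are hypothetical, SHA-256 is chosen arbitrarily, only regular files are covered (permissions, empty directories and symlink targets are ignored), and file names containing newlines would make the manifest ambiguous.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hex SHA-256 digest of one file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def tree_manifest(root: Path) -> str:
    """One 'digest  relative/path' line per regular file, sorted by path."""
    relative_names = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
    lines = [f"{file_sha256(root / name)}  {name}" for name in relative_names]
    return "".join(line + "\n" for line in lines)


def tree_hash(root: Path) -> str:
    """Hash of the manifest text; independent of tar and gzip details."""
    return hashlib.sha256(tree_manifest(root).encode("utf-8")).hexdigest()
```

Since the manifest is plain text, two trees with different hashes can be compared by diffing their manifests, which shows exactly which files differ.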

Karl Palmskog (Jan 31 2023 at 10:52):

it appears that some people (e.g., here) got a commitment from GitHub previously that the /refs/tags tarballs were stable. But this was contradicted by yesterday's events, so I don't know what to believe.

Michael Soegtrop (Jan 31 2023 at 14:26):

@Karl Palmskog : do all opam packages observe the tar ball url restrictions given in the link?

Karl Palmskog (Jan 31 2023 at 14:34):

@Michael Soegtrop do you mean, do all opam packages use the refs/tags form, like https://github.com/coq-community/coqeal/archive/refs/tags/1.0.6.tar.gz? Then no, the majority of Coq opam packages use URLs like https://github.com/coq-community/huffman/archive/v8.15.0.tar.gz

Karl Palmskog (Jan 31 2023 at 14:35):

but we could change to refs/tags links if GitHub actually demonstrates somehow that they are going to be stable...

Michael Soegtrop (Jan 31 2023 at 16:01):

Well at least this is what they claim in the link you shared.

Karl Palmskog (Jan 31 2023 at 16:09):

they claimed it in the past, but the current Git-on-GitHub product manager did not want to make these claims about tarball stability right now. But he may do so in the near future. (link to PM comments)

Michael Soegtrop (Feb 01 2023 at 08:27):

@Karl Palmskog : so what is the conclusion: we continue with "fingers crossed" or we suggest to opam that they should compute hashes on the extracted sources rather than on the tar ball or we do something more fundamental?

Karl Palmskog (Feb 01 2023 at 08:43):

the website blog post now says more information is forthcoming: https://github.blog/changelog/2023-01-30-git-archive-checksums-may-change/

Karl Palmskog (Feb 01 2023 at 08:44):

... so I suggest we wait with any further action until GitHub says something more

Karl Palmskog (Feb 24 2023 at 18:32):

followup to this: https://github.blog/2023-02-21-update-on-the-future-stability-of-source-code-archives-and-hashes/

sounds like it's a good idea to put all old Coq generated tarballs as separate release files (even though the generated tarballs are guaranteed for one more year)

Michael Soegtrop (Feb 27 2023 at 12:53):

@Karl Palmskog : thanks for sharing this useful statement from GitHub. Is there consensus how we proceed? I see two options:

I would actually prefer the latter approach and the timeline given by GitHub should be sufficient to do this in opam.

Gaëtan Gilbert (Feb 27 2023 at 13:03):

how do you even hash a tree? The first way seems much easier to use

Enrico Tassi (Feb 27 2023 at 13:08):

Well, the obvious way to hash a git tree is to use its tag hash.
Also Nix does it, so there must be a decent way for non-git trees as well (I mean, a refinement of cat $(find . | sort) | sha256sum should do it)

Michael Soegtrop (Feb 27 2023 at 13:56):

@Gaëtan Gilbert : the basic idea e.g. used by git is to hash all files in a folder and create a text file with file names and hashes (sorted by file name) and hash this text file to give the hash of the folder. One does this recursively then.
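The recursive scheme described above can be sketched as follows. This is only in the spirit of what Git does: it does not reproduce Git's object format, it uses SHA-256 rather than Git's default SHA-1, and the function names and the "dir"/"file" listing format are made up for illustration. Symlinks and file modes are ignored.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path) -> str:
    """Hex SHA-256 digest of one file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def hash_directory(directory: Path) -> str:
    """Hash a folder from a sorted listing of its entries and their hashes."""
    entries = []
    for child in sorted(directory.iterdir(), key=lambda p: p.name):
        if child.is_dir():
            # A subfolder contributes the hash of its own listing (recursion).
            entries.append(f"dir {hash_directory(child)} {child.name}")
        elif child.is_file():
            entries.append(f"file {hash_file(child)} {child.name}")
    listing = "".join(entry + "\n" for entry in entries)
    return hashlib.sha256(listing.encode("utf-8")).hexdigest()
```

With this layout, the hash of the top-level folder changes whenever any file's content, name, or location in the tree changes, while the order in which files were created or archived has no effect.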

Paolo Giarrusso (Feb 27 2023 at 13:59):

one could just use git checkouts with tags or even commit IDs... are tarballs better for users, and if yes, is that an opam issue or fundamental?

Michael Soegtrop (Feb 27 2023 at 14:03):

The issue I see with tar and gzip is that neither of them is specified to produce binary identical results. The tar ball hash method only works because there was no substantial change in tar or gzip for 2 decades or so.
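The point about gzip can be illustrated with Python's standard library (the `mtime` argument of `gzip.compress` needs Python 3.8 or later). The payload below is made up; the sketch only shows that the gzip container stores metadata such as a timestamp, so two archives of identical content can still hash differently.

```python
import gzip
import hashlib

# Made-up payload standing in for a source file.
payload = b"Definition answer := 42.\n" * 100

# Same input and compression level; only the timestamp in the gzip header differs.
archive_a = gzip.compress(payload, compresslevel=9, mtime=0)
archive_b = gzip.compress(payload, compresslevel=9, mtime=1)

digest_a = hashlib.sha256(archive_a).hexdigest()
digest_b = hashlib.sha256(archive_b).hexdigest()

same_content = gzip.decompress(archive_a) == gzip.decompress(archive_b) == payload
print("archives identical:", archive_a == archive_b)
print("contents identical:", same_content)
```

A hash taken over the decompressed content would be the same for both archives, which is the property the tree-hashing proposals in this thread are after.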

Michael Soegtrop (Feb 27 2023 at 14:04):

@Paolo Giarrusso : the main issue I see with git hashes is that they are 128 bit (afair). They are intended to avoid collisions, but I don't think one can consider them cryptographically adequate these days.

Michael Soegtrop (Feb 27 2023 at 14:08):

@Gaëtan Gilbert : see e.g. the section "Tree Objects" in Git Internals - Git Objects
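For readers who do not want to follow the reference: in a SHA-1 repository, Git names an object by hashing a short header (object type, a space, the body length in bytes, a NUL byte) followed by the body. A minimal Python sketch, with hypothetical function names, assuming the default SHA-1 object format:

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <length>' + NUL + body, as hex (40 characters)."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def git_blob_id(content: bytes) -> str:
    """Object id Git assigns to a file with the given contents."""
    return git_object_id("blob", content)
```

A tree object's body is a sorted list of entries, each carrying a mode, a name and the id of a blob or subtree, so the tree id covers the whole directory content without involving tar or gzip.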

Paolo Giarrusso (Feb 27 2023 at 14:09):

160 bits but yes, and git's migration to sha256 seems stalled https://lwn.net/Articles/898522/

Michael Soegtrop (Feb 27 2023 at 14:13):

I would also think that for opam's purposes it would be simpler to create a single file for the whole tree rather than hierarchical hashes. For debugging one could then easily create this file and diff it.

Michael Soegtrop (Feb 27 2023 at 14:14):

For example, I have seen deviations in tar.gz hashes in the past where the difference was the inclusion of files like .gitignore.

Gaëtan Gilbert (Feb 27 2023 at 14:17):

what will the instructions be for the coq dev who wants to add an opam package?
currently it's "get your tarball and run sha256sum on it"
with git it's something like checkout your git commit then "git cat-file -p HEAD | grep ^tree | cut -d ' ' -f 2"??

Karl Palmskog (Feb 27 2023 at 14:19):

please also see discussion in https://github.com/coq/opam-coq-archive/issues/2458

Michael Soegtrop (Feb 27 2023 at 14:23):

@Gaëtan Gilbert : there is opam admin add-hashes. I guess one can change this such that it works for a subdirectory.

Michael Soegtrop (Feb 27 2023 at 14:24):

(currently it only works for complete repos afaik).

Emilio Jesús Gallego Arias (Feb 27 2023 at 15:59):

dune-release does 1 automatically, that will be an option for Coq packages soon.

Lasse Blaauwbroek (Feb 28 2023 at 01:38):

Note that there are nontrivial security risks associated with hashing decompressed archives, such as a zip-bomb: https://en.wikipedia.org/wiki/Zip_bomb

Karl Palmskog (Feb 28 2023 at 06:38):

the OCaml community is in the same situation as us, I think our default position would be to follow their lead, e.g., if they change opam's hashing options, we go along

Michael Soegtrop (Feb 28 2023 at 08:37):

I would think that for a meta build system like opam, risks like zip bombs are not near the top of the list. Besides we almost always use https sources for the zip files.

Karl Palmskog (Feb 28 2023 at 08:41):

unfortunately, GitHub projects can get taken over, allowing someone to change archives without https being part of the equation. But I agree things like zip bombs are not a big concern for us as a small community with few industry-critical projects

Guillaume Melquiond (Feb 28 2023 at 09:22):

Also, if need be, we can always decide to reject .zip source archives and require .tar.gz instead.

Paolo Giarrusso (Feb 28 2023 at 10:46):

Not an expert but gzip bombs seem to exist? https://www.rapid7.com/db/modules/auxiliary/dos/http/gzip_bomb_dos/

Michael Soegtrop (Feb 28 2023 at 10:54):

Maybe. Still, threats of outages due to tarball hash issues are IMHO substantially more likely than outages due to zip bombs. At least for Coq Platform I will likely see it in CI before a user experiences it. For the tarball hash change issue, I have seen 3 cases in the past (2x reconfiguration of GitLab servers, from Inria and MPI SWS, and once the GitHub issue discussed here).

Guillaume Melquiond (Feb 28 2023 at 12:32):

Paolo Giarrusso said:

Not an expert but gzip bombs seem to exist? https://www.rapid7.com/db/modules/auxiliary/dos/http/gzip_bomb_dos/

That is not really a bomb. The format theoretically allows for a x1000 compression ratio, so 10MB to 10GB is just a plain optimal gzip file. (Zip bombs allow for x10^6 or larger expansions, much larger than the theoretical optimum of Deflate.)
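The roughly x1000 figure can be checked empirically with Python's zlib module, which implements Deflate. This is a rough sketch, assuming that a run of zero bytes is close to the best case for the format; the 10 MB input size simply mirrors the numbers in the message above.

```python
import zlib

original_size = 10_000_000  # 10 MB of zero bytes, a highly compressible input
compressed = zlib.compress(bytes(original_size), level=9)

ratio = original_size / len(compressed)
print(f"{original_size} bytes -> {len(compressed)} bytes, about x{ratio:.0f}")
```

The measured ratio stays on the order of x1000, far below the x10^6 or larger expansions reported for zip bombs.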

Guillaume Melquiond (Feb 28 2023 at 12:33):

The current (?) record for a zip bomb is 46MB to 4.6PB, so x10^8 expansion.

Lasse Blaauwbroek (Mar 01 2023 at 06:41):

For the interested: It is even possible to build a zip-quine: https://wgreenberg.github.io/quine.zip/


Last updated: Mar 29 2024 at 14:01 UTC