it unfortunately looks like GitHub has regenerated every automatically generated tarball and given them new checksums. So all such tarballs referenced in the opam archive now have the wrong checksum.
example: https://github.com/coq/coq/archive/refs/tags/V8.16.1.tar.gz now has sha256 checksum 69cfad0e2faa202d1ad6db86b576086ce7dfea6e4e123d94c689269cf43f5606 instead of what is in the opam archive (583471c8ed4f227cb374ee8a13a769c46579313d407db67a82d202ee48300e4b).
this can rapidly become a huge problem for all automation that uses stable packages
I have no idea how to tackle this problem efficiently, but we are at least not alone in this predicament
sigh: https://github.blog/changelog/2023-01-30-git-archive-checksums-may-change/
GitHub doesn't guarantee the stability of checksums for automatically generated archives. If you need to rely on a consistent checksum, you may upload archives directly to GitHub Releases.
so now we have two pretty significant problems/tasks: (1) update every autogenerated archive checksum, (2) upload stable archives for at least every Platform package...
@Théo Zimmermann can you please add this (uploading stable archive to GitHub) to the RM tasklist? Regardless of whether GitHub rolls this back (they broke a lot of people's CI and other stuff), I think it would be good practice for Coq.
@Michael Soegtrop please be prepared for a lot of Platform CI breakage due to GitHub checksum changes
opened this for more public tracking: https://github.com/coq/opam-coq-archive/issues/2458
some good news: out of Coq's dependencies, only zarith is hosted as a GitHub tarball
posting the workaround here as well:
opam install <package> --no-checksums
apparently they are reverting: https://github.com/bazel-contrib/SIG-rules-authors/issues/11#issuecomment-1409438954
Hey folks. I'm the product manager for Git at GitHub. We're sorry for the breakage, we're reverting the change, and we'll communicate better about such changes in the future (including timelines).
they reverted the tarballs (which prevented the imminent disaster). But I'm leaving the issue open under a modified header to track discussion of how we want to handle stable archiving for Coq and Coq packages.
Thank god. It would be nice if someone wrote a script to "copy" the automatically generated tarball into the release assets...
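(a minimal sketch of such a script, assuming the GitHub CLI (gh) is installed and authenticated; the repo and tag below are placeholders:)

```bash
repo=coq/coq   # placeholder repository
tag=V8.16.1    # placeholder tag
# fetch the automatically generated tarball...
curl -fsSL -o "archive-$tag.tar.gz" "https://github.com/$repo/archive/refs/tags/$tag.tar.gz"
# ...and re-upload it as a stable release asset
# (assumes a release already exists for the tag)
gh release upload "$tag" "archive-$tag.tar.gz" --repo "$repo"
```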
I think dune-release already does this. I know that @Emilio Jesús Gallego Arias always uses non-generated tarballs for his packages.
dune-release doesn't just copy them. It generates them with a different file extension and does some variable substitution beforehand.
Let's wait and see what further announcements GitHub makes on this topic, but I'm also interested in hearing how far we are from releasing Coq with dune-release.
I see the future is bright, but there are many "releases" out there. The script could be used to "mass fix" them and make them future-proof.
Actually what Nix does (using a hash which is independent of the packaging compression / method) is the right thing to do IMO.
tar is famously non-deterministic
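(for reference, the compression-independent hash Nix uses can be computed over an unpacked tree with nix-hash; a sketch, assuming a Nix installation:)

```bash
# recursive hash of the serialized file tree, independent of
# whether the sources arrived as .tar.gz, .zip, or a git checkout
nix-hash --type sha256 --base32 ./extracted-sources
```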
I agree, but it seems a bit too late to fix opam (and the rest of the world as well). Indeed, they did revert the change, and it is surely not because of us (alone) complaining
Opam could be fixed in the next release, I think, just by adding another hashing method. Because as I understand it, this change has only been delayed by GitHub.
this also ties into our tentative goal of signing Coq and ecosystem packages (which to me means: signing both package metadata and the actual archive like the tarball)
@Karl Palmskog can't people just sign the hash?
the problem is the hash being stable under compression, isn't it?
@Emilio Jesús Gallego Arias did you see stuff like: https://hannes.nqsb.io/Posts/Conex
I don't really care if the hash before compression is signed, or the hash after compression, as long as we can ensure stability
Emilio Jesús Gallego Arias said:
Actually what Nix does (using a hash which is independent of the packaging compression / method) is the right thing to do IMO.
tar is famously non-deterministic
Actually, Nix does this only for part of the sources in Nixpkgs. I'm pretty sure there is a thread somewhere about this change and its impact on Nixpkgs (but I haven't looked for it).
Please note that things like this have happened in the past, mostly on self-hosted GitLab instances where people changed configurations. I also don't think there are any guarantees that a new version of tar or gzip will produce bit-identical output. So as far as I understand, dynamically generated tarballs can change their hash at any time.
it might make sense to change to source-based hashing, similar to what Git does. Say, create a text file with each file name and its hash, and then hash this text file.
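(a shell sketch of that idea, not an existing opam feature; run in the unpacked source tree:)

```bash
# hash every file, sort by path for determinism,
# then hash the resulting listing of (hash, name) pairs
find . -type f -print0 | sort -z | xargs -0 sha256sum | sha256sum
```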
it appears that some people (e.g., here) got a commitment from GitHub previously that the /refs/tags tarballs were stable. But this was contradicted by yesterday's events, so I don't know what to believe.
@Karl Palmskog : do all opam packages observe the tarball URL restrictions given in the link?
@Michael Soegtrop do you mean, do all opam packages use the refs/tags form (like https://github.com/coq-community/coqeal/archive/refs/tags/1.0.6.tar.gz)? Then no, the majority of Coq opam packages use URLs like https://github.com/coq-community/huffman/archive/v8.15.0.tar.gz, but we could change to refs/tags links if GitHub actually demonstrates somehow that they are going to be stable...
Well, at least this is what they claim in the link you shared.
they claimed it in the past, but the current Git-on-GitHub product manager did not want to make these claims about tarball stability right now. But he may in the near future. (link to PM comments)
@Karl Palmskog : so what is the conclusion: do we continue with "fingers crossed", do we suggest to opam that they should compute hashes on the extracted sources rather than on the tarball, or do we do something more fundamental?
the blog post now says more information is forthcoming: https://github.blog/changelog/2023-01-30-git-archive-checksums-may-change/
... so I suggest we hold off on any further action until GitHub says something more
followup to this: https://github.blog/2023-02-21-update-on-the-future-stability-of-source-code-archives-and-hashes/
sounds like it's a good idea to upload all old autogenerated Coq tarballs as separate release files (even though the generated tarballs are guaranteed stable for one more year)
@Karl Palmskog : thanks for sharing this useful statement from GitHub. Is there consensus on how we proceed? I see two options:
1. upload stable archives as release assets for every package
2. change opam to compute hashes on the extracted sources rather than on the tarball
I would actually prefer the latter approach, and the timeline given by GitHub should be sufficient to do this in opam.
how do you even hash a tree? The first way seems much easier to use
Well, the obvious way to hash a git tree is to use its tag hash.
Also Nix does it, so there must be a decent way also for non-git trees (I mean, a refinement of cat $(find . | sort) | sha256sum should do it)
@Gaëtan Gilbert : the basic idea, as used e.g. by git, is to hash all files in a folder, create a text file with the file names and hashes (sorted by file name), and hash this text file to get the hash of the folder. One then applies this recursively.
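(for illustration, git already exposes exactly this; these are standard git commands:)

```bash
# the tree hash of the current commit
git rev-parse 'HEAD^{tree}'
# inspect the sorted (mode, type, hash, name) entries it is computed from
git cat-file -p 'HEAD^{tree}'
```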
one could just use git checkouts with tags or even commit IDs... are tarballs better for users, and if yes, is that an opam issue or something fundamental?
The issue I see with tar and gzip is that neither of them is specified to produce binary-identical results. The tarball hash method only works because there has been no substantial change in tar or gzip for two decades or so.
@Paolo Giarrusso : the main issue I see with git hashes is that they are 128 bit (afair). They are intended to avoid collisions, but I don't think one can consider them cryptographically adequate these days.
@Gaëtan Gilbert : see e.g. the section "Tree Objects" in Git Internals - Git Objects
160 bits but yes, and git's migration to sha256 seems stalled https://lwn.net/Articles/898522/
I would also think that for opam's purposes it would be simpler to create a single file for the whole tree rather than hierarchical hashes. For debugging one could then easily create this file and diff it.
E.g., I have seen deviations in tar.gz hashes in the past where the difference was the inclusion of files like .gitignore.
what will the instructions be for the Coq dev who wants to add an opam package? currently it's "get your tarball and run sha256sum on it"; with git it would be something like: check out your commit, then "git cat-file -p HEAD | grep ^tree | cut -d ' ' -f 2"??
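(to spell out the current workflow, with a package URL from above as an example:)

```bash
# download the archive and print its sha256 checksum
curl -fsSL https://github.com/coq-community/coqeal/archive/refs/tags/1.0.6.tar.gz | sha256sum
```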
please also see discussion in https://github.com/coq/opam-coq-archive/issues/2458
@Gaëtan Gilbert : there is opam admin add-hashes. I guess one can change this such that it works for a subdirectory (currently it only works for complete repos afaik).
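(the basic invocation, run from the root of an opam repository clone; exact options may vary between opam versions:)

```bash
# download each package archive and record its sha256 hash
opam admin add-hashes sha256
```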
dune-release does (1) automatically; that will be an option for Coq packages soon.
Note that there are nontrivial security risks associated with hashing decompressed archives, such as a zip-bomb: https://en.wikipedia.org/wiki/Zip_bomb
the OCaml community is in the same situation as us, I think our default position would be to follow their lead, e.g., if they change opam's hashing options, we go along
I would think that for a meta-build system like opam, risks like zip bombs are not near the top of the list. Besides, we almost always use https sources for the zip files.
unfortunately, GitHub projects can get taken over, allowing someone to change archives without https being part of the equation. But I agree things like zip bombs are not a big concern for us as a small community with few industry-critical projects
Also, if need be, we can always decide to reject .zip source archives and require .tar.gz instead.
Not an expert but gzip bombs seem to exist? https://www.rapid7.com/db/modules/auxiliary/dos/http/gzip_bomb_dos/
Maybe. Still, outages due to tarball hash issues are IMHO substantially more likely than outages due to zip bombs. At least for Coq Platform, I will likely see it in CI before a user experiences it. As for the tarball hash change issue, I have seen 3 cases in the past (twice reconfigurations of GitLab servers, at Inria and MPI-SWS, and once the GitHub issue discussed here).
Paolo Giarrusso said:
Not an expert but gzip bombs seem to exist? https://www.rapid7.com/db/modules/auxiliary/dos/http/gzip_bomb_dos/
That is not really a bomb. The format theoretically allows for a ~1000× compression ratio, so 10 MB to 10 GB is just a plain optimal gzip file. (Zip bombs allow for 10^6× or larger expansions, much larger than the theoretical optimum of Deflate.)
The current (?) record for a zip bomb is 46 MB to 4.6 PB, so a 10^8× expansion.
For the interested: It is even possible to build a zip-quine: https://wgreenberg.github.io/quine.zip/