Stream: Coq Platform devs & users

Topic: Inria gitlab commit hash tar file changed its hash


view this post on Zulip Michael Soegtrop (Jan 30 2021 at 10:33):

I have a strange effect. The following file:

https://gitlab.inria.fr/gappa/gappa/-/archive/f53e105cd73484fc76eb58ba24ead73be502c608.tar.gz

seems to have changed its hash code. When I download it with wget and run openssl dgst -sha512 on the download I used to get

b2c04d87b502fcab24573b6030e1ba3e2bd9cc7ae719367d12785e8bd91e43001f8d64e70d5ae515ddd9ec636d9c1ce89b54b18ae0796955b5c97b71fee5c957

but now I get

22b82333d0e135578843dcb0740a68a364a21c24dc061e9645ace2353fb8a9141722e5b4691d32d6c433064e3a393ee7ee44435ecd3de936ecb4901498f539de

I am quite sure about this because I get CI errors from this (the old hash is in the opam package for gappa) and this used to work. E.g. the opam package with the old hash did pass opam's CI which I think is not possible when the hash is wrong.

If someone has a copy of the file f53e105cd73484fc76eb58ba24ead73be502c608.tar.gz (running the coq platform v8.13 script should download it but might delete it during a cleanup) with the old hash code (starting with b2c) around, please send it to me. I would like to understand how this could happen.
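
For reference, the check described above (download, then `openssl dgst -sha512`) can be reproduced with a minimal Python sketch. The helper names `sha512_of_file` and `matches_expected` and the local file name in the usage comment are illustrative assumptions, not part of opam or the Coq Platform scripts.

```python
import hashlib

def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_expected(path, expected_hex):
    """True if the file's SHA-512 equals the expected hex digest."""
    return sha512_of_file(path) == expected_hex.strip().lower()

# Hypothetical usage, assuming the archive was already downloaded locally:
# OLD = "b2c04d87b502fcab24573b6030e1ba3e2bd9cc7ae719367d12785e8bd91e4300" \
#       "1f8d64e70d5ae515ddd9ec636d9c1ce89b54b18ae0796955b5c97b71fee5c957"
# print(matches_expected("f53e105cd73484fc76eb58ba24ead73be502c608.tar.gz", OLD))
```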

view this post on Zulip Guillaume Melquiond (Jan 30 2021 at 10:54):

It might just be due to an update of git on the server. For example, up to a few months ago, git archive was incorrectly parsing compression level on the command line (e.g., -13 was parsed as -3).

view this post on Zulip Michael Soegtrop (Jan 30 2021 at 11:05):

Yes, I thought about this, but if something like this could happen, it should have shown up in opam before, since most opam packages reference tar.gz files with a sha hash sum. So I would really like to analyze the root cause. It might also be that I made a mistake, but then I don't understand how this could have slipped through all the layers of CI we have (opam, Coq Platform, ...).

view this post on Zulip Guillaume Melquiond (Jan 30 2021 at 11:25):

I just grepped the whole released archive, and there are only 4 versions (among 1750) that download an archive pointed to by a sha sum. And they were all uploaded by you. So, "most opam packages" is actually a very subjective notion here.

view this post on Zulip Michael Soegtrop (Jan 30 2021 at 13:25):

I guess we are not talking about the same thing here. Pretty much all of the release coq opam files (I count 1376 out of 1411) refer to a tar.gz file and use a checksum (either md5, sha256 or sha512). What I am talking about is that this checksum changes although the .tar.gz URL remained the same.

I guess you are referring to URLs being themselves commit hash references. I agree that this is bad practice - I sometimes do this when I need a patch on top of a tag or maybe in error - but this doesn't have anything to do with the problem at hand.

view this post on Zulip Guillaume Melquiond (Jan 30 2021 at 14:01):

It has everything to do with the problem at hand. You are complaining that https://gitlab.inria.fr/gappa/gappa/-/archive/f53e105cd73484fc76eb58ba24ead73be502c608.tar.gz, which is a file referred to by a commit hash, has a different checksum than before. This file is generated on the fly by the server, so it depends on the version of gitlab, git archive, zlib, tar, and a few other packages. So, you cannot expect its checksum to stay the same over time, even if the files it contains have not changed.
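
The point that the compressed bytes can change while the contents stay the same can be illustrated with a small Python sketch. It only uses the standard `gzip` module with two compression levels as a stand-in for a server-side tool change; it does not reproduce what GitLab or `git archive` actually do, and the payload is made up for the example.

```python
import gzip
import hashlib

# Made-up payload standing in for the tar stream of a source tree.
payload = b"".join(
    b"line %d of a source file\n" % (i * 7919 % 1000) for i in range(5000)
)

# Same input, two compression levels. mtime=0 pins the gzip header timestamp
# so that the timestamp is not the cause of any difference.
fast = gzip.compress(payload, compresslevel=1, mtime=0)
best = gzip.compress(payload, compresslevel=9, mtime=0)

print("compressed bytes identical:", fast == best)
print("sha512 (level 1):", hashlib.sha512(fast).hexdigest()[:16], "...")
print("sha512 (level 9):", hashlib.sha512(best).hexdigest()[:16], "...")
print("contents identical:", gzip.decompress(fast) == gzip.decompress(best))
```

A checksum taken over the `.tar.gz` file covers the compressed bytes, so it is sensitive to such differences even when the unpacked files match.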

view this post on Zulip Michael Soegtrop (Jan 30 2021 at 14:44):

So your assumption is that if I use a tag instead of a hash the file is not created on the fly? Say

https://github.com/math-comp/math-comp/archive/mathcomp-1.12.0.tar.gz

Do you have some evidence for this assumption? It is imaginable that github stores the archives for tags, but I don't find it very plausible that there is a difference between using a tag and a commit hash.

I understand that there are some references to apparently static tar files, e.g.

https://gforge.inria.fr/frs/download.php/file/38383/coquelicot-3.2.0.tar.gz

but I would say this is a minority. Most opam packages (in Coq released and opam main) refer to git tar files for tags.

view this post on Zulip Guillaume Melquiond (Jan 30 2021 at 14:58):

My assumption is that archives for tags are cached, and archives for random commits are not (or with a very short lifetime).

view this post on Zulip Michael Soegtrop (Jan 30 2021 at 15:03):

This is possible, but I still don't buy this. E.g. it is a good question if such caches would survive a server reconfiguration which changes the cache contents.

Anyway, I will review all packages with hashes to see whether they can be replaced with tags.

view this post on Zulip Paolo Giarrusso (Jan 30 2021 at 15:35):

Re "majority of packages", is it valid to use GitHub release tarballs in opam?

view this post on Zulip Paolo Giarrusso (Jan 30 2021 at 15:37):

I suppose at least GitHub releases are tested to preserve their hash at the byte level, since pretty much the entire Internet relies on that, but I’d never wondered.

view this post on Zulip Michael Soegtrop (Jan 30 2021 at 16:50):

Hard to tell if releases, tags and commits are handled differently. E.g. the URLs of releases are no different from the URLs of tags. Maybe github does something special like storing release tarballs, maybe they have CI to test that the hashes stay constant, maybe it is just luck that nothing happened so far.

view this post on Zulip Guillaume Melquiond (Jan 30 2021 at 16:57):

Paolo Giarrusso said:

I suppose at least GitHub releases are tested to preserve their hash at the byte level, since pretty much the entire Internet relies on that, but I’d never wondered.

Is that so? I would have thought otherwise. For instance, as far as I know, Npm.js does not bother with the checksums of Github releases, since the source files are not coming from there anyway. Same for Debian, Fedora, and so on. Even for Opam, the source files are now hosted on Opam's servers. (So, you are actually checking that the server is sending you the original file, not that the current one still has the same checksum.)

view this post on Zulip Michael Soegtrop (Jan 30 2021 at 17:03):

@Guillaume Melquiond : Unfortunately for Windows compatibility reasons I am bound to opam 2.0.7 with the platform, which seems to download the original sources. Do you know in which version the caching you mentioned was introduced?

view this post on Zulip Guillaume Melquiond (Jan 30 2021 at 17:44):

It has been there for a very long time. (Note that I am talking about the official Opam server, not the one we use for Coq. We have not enabled the cache for it.) For example, the file you were talking about should be available as https://opam.ocaml.org/cache/sha512/b2/b2c04d87b502fcab24573b6030e1ba3e2bd9cc7ae719367d12785e8bd91e43001f8d64e70d5ae515ddd9ec636d9c1ce89b54b18ae0796955b5c97b71fee5c957 or something akin to it.
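
Judging only from the example URL above, the cache path appears to be the hash algorithm, then the first two hex characters of the digest, then the full digest. A minimal sketch under that assumption (the function name `opam_cache_url` is hypothetical, and the layout is inferred from this one URL rather than from opam documentation):

```python
def opam_cache_url(hex_digest, algo="sha512",
                   base="https://opam.ocaml.org/cache"):
    """Build a cache URL of the form <base>/<algo>/<first 2 hex chars>/<digest>.

    The layout is an assumption inferred from the single URL quoted above.
    """
    hex_digest = hex_digest.strip().lower()
    return f"{base}/{algo}/{hex_digest[:2]}/{hex_digest}"

old_hash = (
    "b2c04d87b502fcab24573b6030e1ba3e2bd9cc7ae719367d12785e8bd91e4300"
    "1f8d64e70d5ae515ddd9ec636d9c1ce89b54b18ae0796955b5c97b71fee5c957"
)
print(opam_cache_url(old_hash))
```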

view this post on Zulip Michael Soegtrop (Jan 30 2021 at 18:18):

@Guillaume Melquiond : cool thanks!

The difference between the old and new tar.gz file for gappa is that the old one does contain .gitignore files, the new one doesn't. Otherwise they are content wise identical - hard to tell if e.g. compression is different. The two URLs are:

new: https://gitlab.inria.fr/gappa/gappa/-/archive/f53e105cd73484fc76eb58ba24ead73be502c608.tar.gz
old: https://opam.ocaml.org/cache/sha512/b2/b2c04d87b502fcab24573b6030e1ba3e2bd9cc7ae719367d12785e8bd91e43001f8d64e70d5ae515ddd9ec636d9c1ce89b54b18ae0796955b5c97b71fee5c957

Any idea what might have caused this? As you said a change in the INRIA gitlab server config?
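
A comparison like the one described above (which files appear in one archive but not the other) can be sketched in Python as follows. The helper names and the choice to strip the first path component are assumptions made for this example; it compares member names only, not file contents or compression.

```python
import tarfile

def member_names(path, strip_components=1):
    """Return the set of member paths in a .tar.gz archive.

    The leading `strip_components` path components are dropped, since the
    top-level directory name may differ between two archives.
    """
    names = set()
    with tarfile.open(path, "r:gz") as tar:
        for member in tar.getmembers():
            parts = member.name.split("/")[strip_components:]
            if parts:
                names.add("/".join(parts))
    return names

def diff_archives(old_path, new_path):
    """Return (only_in_old, only_in_new) as sorted lists of member paths."""
    old, new = member_names(old_path), member_names(new_path)
    return sorted(old - new), sorted(new - old)

# Hypothetical usage with two locally saved copies:
# only_old, only_new = diff_archives("gappa-old.tar.gz", "gappa-new.tar.gz")
# print("only in old:", only_old)
# print("only in new:", only_new)
```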

view this post on Zulip Guillaume Melquiond (Jan 30 2021 at 19:39):

Not sure. I added .gitignore to .gitattributes recently. Theoretically, git archive is supposed to read the .gitattributes file from the given commit. But, who knows what Gitlab does? Perhaps it uses the file from the default branch if the given commit does not provide it. Or perhaps it is using info/attributes instead of .gitattributes, which would certainly make things simpler for bare repositories.

view this post on Zulip Michael Soegtrop (Jan 31 2021 at 09:12):

I see - interesting. Well at least we have an explanation for why this happened to gappa and no other repo, although as you say one wouldn't expect that changing .gitattributes would affect previous commits. Do you think we should file a bug report to gitlab or the INRIA gitlab maintainers?

P.S.: I swapped the does / doesn't contain and had the wrong old URL in my previous post - I edited it to be correct.


Last updated: Jan 30 2023 at 10:03 UTC