Stream: Coq Platform devs & users

Topic: Platform use for research artifacts


Emilio Jesús Gallego Arias (Sep 11 2021 at 11:01):

Actually, I'm not sure the current platform is suited yet for building research artifacts on top of it, but that's a different discussion

Karl Palmskog (Sep 11 2021 at 11:02):

the current "standard" for Coq artifacts is an ad-hoc mess of tarballs, containers, scripting, VM images, ... The platform seems to me a huge improvement on that, even though it may not have reached full industrial maturity yet

Karl Palmskog (Sep 11 2021 at 11:05):

one example I remember is a submission that provided a VM image with all .vo files compiled and no obvious way to recheck them or run coqchk. I guess they expected me to traverse the .vo files manually and certify the proofs that way

Bas Spitters (Sep 11 2021 at 11:10):

My experience from the POPL PC is that the quality of Coq artifacts can be improved. @Karl Palmskog are you in contact with the POPL artifact evaluation committee? They should be able to give you some data.

Karl Palmskog (Sep 11 2021 at 11:16):

@Bas Spitters I haven't been hooked into the POPL community for a while, no. The ideal situation would be that they mandate that if you use Coq 8.13 or 8.14, you have to be compatible with the latest Coq platform for those versions (this is probably the minimum for me to want to review such an artifact these days). But I'm not really interested in coming in as an outsider and trying to push for that.

Bas Spitters (Sep 11 2021 at 11:26):

We could ask for input from some regulars here, people like @Alix Trieu @Gabriel Scherer @Tej Chajed
https://popl21.sigplan.org/committee/POPL-2021-artifact-evaluation-artifact-evaluation-committee

Karl Palmskog (Sep 11 2021 at 11:28):

there is also a platform issue where one should probably summarize any conclusions: https://github.com/coq/platform/issues/2

for example, Tej reported there before:

[I]n POPL 2020, 17 out of the 40 artifacts used Coq.

Tej Chajed (Sep 11 2021 at 11:36):

I gathered statistics on mechanization for POPL 2020 artifacts by making a spreadsheet of all the artifacts (which recorded whether they were mechanized and, if so, in which proof assistant); that's where the 17 out of 40 number comes from. I was planning on re-doing that study for POPL 2021 (and POPL 2022, when artifacts are submitted). Is the follow-up question then: out of the Coq artifacts, how were they packaged?

Bas Spitters (Sep 11 2021 at 11:38):

@Tej Chajed Yes: how were they packaged, and would the Coq platform have improved the packaging ...

Bas Spitters (Sep 11 2021 at 11:40):

I see I was off by a year :face_palm:, but also that you are on both committees...
https://popl22.sigplan.org/track/POPL-2022-artifact-evaluation

Karl Palmskog (Sep 11 2021 at 11:41):

by the way, to me at least, using a custom container or VM image is a trust issue. I don't know whether it has been fiddled with, and it's a huge hassle to get reassurance that it hasn't. Using approved pre-built containers would in itself be a step forward.

Tej Chajed (Sep 11 2021 at 11:42):

I don't fully understand how Coq platform packaging would look. Don't you still need a Makefile, presumably wrapping coq_makefile? And if you have dependencies outside the platform, maybe you would have opam dependencies (ideally with fixed version numbers for reproducibility)?

Tej Chajed (Sep 11 2021 at 11:43):

(I'm not currently using the Coq platform so this is just my ignorance)

Karl Palmskog (Sep 11 2021 at 11:44):

the final distribution/packaging of the artifact could be done in many ways; it could use coq_makefile or Dune. I think the point is that I should be able to spin up an Ubuntu Snap, a platform Docker image, or a Windows installer, and it would work by just running some top-level make or dune invocation, or even opam install
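To make the opam install route concrete, a hypothetical sketch of what an artifact's opam file could look like (the package name, synopsis, and all version bounds here are made up for illustration):

```
# coq-myartifact.opam -- hypothetical package file for an artifact,
# pinning dependency versions for reproducibility
opam-version: "2.0"
synopsis: "Coq proofs accompanying the paper (illustrative only)"
build: [make "-j%{jobs}%"]
install: [make "install"]
depends: [
  "coq" {>= "8.13" & < "8.14~"}
  "coq-mathcomp-ssreflect" {= "1.12.0"}
]
```

With such a file at the repository root, a reviewer with an opam-based setup could in principle recheck the artifact with a plain `opam install .` in a fresh switch.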

Tej Chajed (Sep 11 2021 at 11:57):

I see, that makes sense. Currently the recommendation is to submit a VM for all artifacts, but in practice not all artifacts even submit that.

Tej Chajed (Sep 11 2021 at 11:58):

If we want people to go a step further and support Docker I think the instructions need to be a lot more explicit so it's easy for both authors and reviewers. The Proof Artifacts guidelines (https://proofartifacts.github.io/guidelines/) are probably a good place to add some technical details on using containers.

Karl Palmskog (Sep 11 2021 at 11:59):

as per above, when I was in the POPL AEC, I felt I had the least confidence when people gave a full VM. If I could reproduce the artifact in my own opam-based environment, then almost all environment manipulation concerns are gone.

Karl Palmskog (Sep 11 2021 at 12:01):

one idea for (platform-based) containers could be to follow the GitHub Actions approach: the artifact contents are put in a known directory inside a pre-designated third-party container (as the checkout action does), and then a given command is run

Tej Chajed (Sep 11 2021 at 12:28):

I like the idea of using a standard Docker image. It's nice to not have the artifact clutter your machine's setup (or worry that it might do so), while also being much more transparent and readable as to what environment you're using.

Tej Chajed (Sep 11 2021 at 12:30):

If people would just use the docker-coq images in CI, that would already help a lot; it's not literally a list of commands I can run locally, but it's a form of checked documentation that inspires confidence. And if you're already doing that, getting to Docker instructions that work should be easy.
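To give a sense of what Docker instructions on top of the docker-coq images might look like, a hypothetical Dockerfile sketch (the version tag, directory layout, and extra dependency are all illustrative; this assumes the docker-coq images provide an opam-enabled `coq` user):

```dockerfile
# Hypothetical Dockerfile for an artifact, building on a docker-coq base image
FROM coqorg/coq:8.13

# copy the artifact sources into the image (illustrative paths)
COPY --chown=coq . /home/coq/artifact
WORKDIR /home/coq/artifact

# install any extra opam dependencies, then build the proofs
RUN opam install -y coq-ext-lib \
 && opam exec -- make
```

A reviewer would then only need something like `docker build -t artifact .` to recheck the proofs in a known environment.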

Gabriel Scherer (Sep 11 2021 at 13:09):

I was invited to give some input, so here it is. I think it's important to keep in mind that Artifact Evaluation is a social process, whose aim seems to be to gently lift our community toward gradually better artifacts without any friction. (This is reverse-engineered from my experience as an evaluator and chair.) In this environment, "perfect is the enemy of good". If your requirements are too strict, people will be unhappy and give up (and some will complain loudly), and ultimately the process will tolerate them and accept lowering the norms.

Gabriel Scherer (Sep 11 2021 at 13:12):

Authors typically find out about the Artifact Evaluation (AE) requirements when they get invited to submit their artifacts. That's after the paper has been accepted, and they have about a week to prepare the artifact, at the same time as they have to work on the camera-ready version. By that time, it's too late to ask for much. Even asking for a specific version of Coq may be difficult -- considering the effort to port a sizeable development.

Karl Palmskog (Sep 11 2021 at 13:14):

the fewer requirements you have, the more "friction" the artifact evaluators will face. Clearly, this is something close to a zero-sum game (and I say that having both submitted and reviewed artifacts for several years)

Gabriel Scherer (Sep 11 2021 at 13:14):

My recommendation would be that the Coq community write clear packaging recommendations -- including, possibly, specific guidelines for research artifacts -- and try to advertise them broadly within academia, outside the AE process (for example, by discussing them with the important groups that are pushing out Coq developments). Then AE people will be able to link to these recommendations in their documents, though they would probably not be taken as mandatory rules right away. Improving practices takes time.

Gabriel Scherer (Sep 11 2021 at 13:16):

(Another thing to keep in mind is that some of the evaluators will only be barely familiar with Coq, and may not have an up-to-date Coq environment; maybe those enjoy VM images more than source distributions -- I certainly agree that it's important to have clean source packaging, of course.)

Tej Chajed (Sep 11 2021 at 13:20):

I am in agreement with Gabriel. Currently I think authors don't quite know what process to follow, so then they submit confusing and inconsistent artifacts, and then reviewers struggle with them but eventually figure it out. A good first step would be some standard practices for packaging Coq developments in general, with the hope that when it comes time to submit an artifact authors have no extra work to do.

Gabriel Scherer (Sep 11 2021 at 13:21):

Regarding specifically the question of using the platform for research artifacts: I would try to check whether the larger existing (and actively researched) Coq developments around are Platform-ready. I would hazard the guess that the Iris stuff is rather up-to-date with current best practices, but what about other groups such as MIT, the CertiCoq stuff at Princeton, etc.? (The larger the development, the harder to port to new packaging approaches, and the more advance warning they need.)

Karl Palmskog (Sep 11 2021 at 13:34):

the chances of reaching any consensus on Coq packaging in general anytime soon are very close to zero.

Tej Chajed (Sep 11 2021 at 13:36):

I don't think there needs to be one way to do it, but basically: if you have dependencies, they should be easy to install, and there should be easy-to-find instructions on how to use either opam or Docker to get a particular version of Coq

Tej Chajed (Sep 11 2021 at 13:38):

for example I don't really know how to use Docker, and I don't really understand opam local switches, but if I knew where to find instructions on these and authors started using opam or dune or the Coq platform, then dependency management would become a lot easier
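For reference, my understanding is that an opam local switch recipe looks roughly like this (a sketch only; the compiler and Coq version numbers are illustrative, and it assumes opam 2.x with network access):

```shell
# create a switch tied to the artifact's directory
opam switch create . ocaml-base-compiler.4.12.0
eval $(opam env)

# add the Coq opam repository and install a pinned Coq version
opam repo add coq-released https://coq.inria.fr/opam/released
opam install coq.8.13.2
```

After that, building the artifact (via make or `opam install .`) would run against exactly that Coq, isolated from whatever else is on the machine.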

Bas Spitters (Sep 11 2021 at 13:43):

Gabriel Scherer said:

Authors typically find out about the Artifact Evaluation (AE) requirements when they get invited to submit their artifacts. That's after the paper has been accepted, and they have about a week to prepare the artifact, at the same time as they have to work on the camera-ready version. By that time, it's too late to ask for much.

Isn't that the process we should change? One submits the artifact with the paper, as is common in many other conferences.

Michael Soegtrop (Sep 11 2021 at 15:01):

I think 2021.09 will improve things, because it makes it easy to add custom package list files and still have them maintained with an older Coq version. An older Coq platform might not run any more because of OS changes (that's the reason for 2021.02.2). The idea is that the current Coq platform also supports building unchanged older package lists on newer OSes. Then it should be trivial to make a modified package list for some research artifact. If there are specific applications, I would be happy to demonstrate it.

Paolo Giarrusso (Sep 12 2021 at 07:20):

Bas Spitters said:

Gabriel Scherer said:

Authors typically find out about the Artifact Evaluation (AE) requirements when they get invited to submit their artifacts. That's after the paper has been accepted, and they have about a week to prepare the artifact, at the same time as they have to work on the camera-ready version. By that time, it's too late to ask for much.

Isn't that the process we should change? One submits the artifact with the paper, as is common in many other conferences.

@Bas Spitters That’s the process across all of SIGPLAN and beyond (https://artifact-eval.org/), so that artifact evaluation cannot yet affect paper evaluation (a pact with the devil, admittedly). But _that_ doesn’t prevent the AEC from picking the image in advance.

Paolo Giarrusso (Sep 12 2021 at 07:30):

@Gabriel Scherer In fact, “find out the requirements a week before” seems the opposite of “without any friction”.

Paolo Giarrusso (Sep 12 2021 at 07:50):

@Karl Palmskog Re trust, at ICFP people also provide source tarballs — but AFAICS the AEC is still peer review, so it's not clear it's required to check against outright intentional fraud like Coq subversion (I'd still expect make clean to work). Reproducible build scripts would be nice, though they're quite some work for authors (I've done it). And I'm less sure how pre-built containers help unless you fix the Coq version and the set of libraries.

Karl Palmskog (Sep 12 2021 at 09:04):

this whole discussion is about what should be required, so whether or not reviewers currently check for fraud is not really all that interesting, I think. Double-blind review imposes considerable burdens on authors, but was still judged by committees to be a justified imposition, even in "social processes".

Karl Palmskog (Sep 12 2021 at 09:11):

for example, I had a terrible time submitting an artifact to TACAS, since they imposed a specific ancient Debian-based VM image and demanded that everything be delivered as a tarball that could be installed directly inside the image without network access. This strengthens reproducibility, but left very little flexibility for both authors and reviewers. The Coq platform is designed to be installable across Linux/Mac/Windows, so it already permits a lot of flexibility (though not the extreme flexibility of using hacked versions of Coq/libraries that was common in the past)

Bas Spitters (Sep 12 2021 at 09:54):

@Paolo Giarrusso That's interesting, but the website does not seem to mention POPL, CPP, ITP, ICFP, ...
At least at CPP and ITP artifact evaluation is part of the refereeing process, IIRC.
@Andrej Bauer has written a nice blog post about this:
http://math.andrej.com/2013/08/19/how-to-review-formalized-mathematics/

It looks like standards have slipped...

Karl Palmskog (Sep 12 2021 at 09:55):

License the code so that anyone who wants to check the correctness of the proofs is free to do so.

This would already exclude a significant fraction in POPL and other top conferences...

Karl Palmskog (Sep 12 2021 at 10:12):

Provide information on how to independently verify the correctness of the code.

Based on above, I think a point of disagreement is that I don't think a reviewer using an author-provided VM or container (with characteristics chosen at their discretion) counts as independent verification.

Karl Palmskog (Sep 12 2021 at 10:19):

I also wonder specifically about POPL AEC power differential. At least when I was in the AEC, most reviewers were grad students or early career PhDs, and the submissions generally came from teams led by very senior researchers for accepted POPL papers, and it was all single blind. If it's all vague social processes and not a well-defined protocol, how free do reviewers feel to apply criteria like those described by Andrej [which to me appear reasonable]?

Paolo Giarrusso (Sep 12 2021 at 12:26):

@Bas Spitters I know that blog post, but "standards have slipped" assumes those were ever the standards — Andrej's blog post is not a CFP. The artifact website was last updated in 2015; I guess ICFP and POPL joined later. They also _allow_ submitting supplementary material, but reviewers are not required to examine it (https://popl21.sigplan.org/track/POPL-2021-research-papers#POPL-2021-Call-for-Papers).

Paolo Giarrusso (Sep 12 2021 at 12:53):

@Karl Palmskog From when I sat in the AEC, reviewers are anonymous so there’s no concern about retaliation. But AEC chairs are more senior and IIUC they have more power to override reviewers than a PC chair — this _might_ have been explicit back then.

Tej Chajed (Sep 12 2021 at 13:38):

I went through and looked at every Coq artifact from POPL 2021. Here are some stats on what they depend on:

count  dependencies
-----  ------------
2      no dependencies
3      no Coq dependencies
2      only platform deps (mathcomp-ssreflect, equations)
2      non-platform deps (Iris + autosubst; coq-ext-lib + coq-paco)
4      Iris

Tej Chajed (Sep 12 2021 at 13:40):

Only two papers supplied only a VM, simply because they didn't link to their repo. Otherwise papers gave at least sources or a VM + sources. Only one paper had Docker instructions, with a Dockerfile based on the docker-coq images.

Tej Chajed (Sep 12 2021 at 13:44):

The above was compiled from a spreadsheet where I tracked every paper individually. More anecdotally (because I didn't take notes on this), I believe compiling the proofs for every development was easy, assuming the authors got their dependencies right. What varies enormously between artifacts is how well the correspondence between the artifact and paper is documented.

Bas Spitters (Sep 12 2021 at 14:21):

@Paolo Giarrusso :-)
I think Andrej's blog sort-of reflects the standards at ITP, CPP. The standards at POPL are lower by design, but I'm not sure there's a good reason for that.

Paolo Giarrusso (Sep 12 2021 at 15:11):

@Bas Spitters I'm an outsider at ITP/CPP, but mechanizations are optional at POPL anyway, unlike at ITP/CPP and like in most of maths. If your paper passes review without an artifact, the artifact can only strictly add to the process. From earlier discussions, AEC proponents would like stricter demands on artifacts in general (not just proofs), but there was strong opposition (mostly from those submitting research prototypes, not proofs), so this was the compromise.


Last updated: Jan 30 2023 at 11:03 UTC