Actually, I'm not sure the current platform is suited yet for building research artifacts on top of it, but that's a different discussion
the current "standard" for Coq artifacts is that it's an ad-hoc mess of tarballs, containers, scripting, VM images, ... seems to me that the platform is a huge improvement on that, even though it may not have reached full industrial maturity yet
one example I remember is a submission that provided a VM image with all .vo files already compiled and no obvious way to recheck them or run coqchk. I guess they expected me to inspect the .vo files manually and certify the proofs that way
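for reference, rechecking precompiled .vo files independently is basically a one-liner with coqchk, assuming you know the development's logical root name (Artifact below is just a placeholder):

```sh
# recheck the compiled .vo files with the standalone kernel checker;
# -R maps the physical directory to the (hypothetical) logical name Artifact
coqchk -R . Artifact Artifact.Main
```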
My experience from the POPL PC is that the quality of Coq artifacts can be improved. @Karl Palmskog are you in contact with the POPL artifact evaluation committee? They should be able to give you some data.
@Bas Spitters I haven't been hooked into the POPL community for a while, no. The ideal situation would be that they mandate that if you use Coq 8.13 or 8.14, you have to be compatible with the latest Coq platform for those versions (this is probably the minimum for me to want to review such an artifact these days). But I'm not really interested in coming in as an outsider and trying to push for that.
We should ask for input from some regulars here, people like @Alix Trieu @Gabriel Scherer @Tej Chajed
https://popl21.sigplan.org/committee/POPL-2021-artifact-evaluation-artifact-evaluation-committee
there is also a platform issue where one should probably summarize any conclusions: https://github.com/coq/platform/issues/2
for example, Tej reported there before:
[I]n POPL 2020, 17 out of the 40 artifacts used Coq.
I gathered statistics on mechanization for POPL 2020 artifacts by making a spreadsheet of all the artifacts (which included if they were mechanized and in which proof assistant if so), that's where the 17 out of 40 number comes from. I was planning on re-doing that study for POPL 2021 (and POPL 2022 when artifacts are submitted). Is the followup question then out of the Coq artifacts, how were they packaged?
@Tej Chajed Yes, how were they packaged, and would the Coq platform have improved the packaging ...
I see I was off by a year :face_palm:, but also that you are on both committees...
https://popl22.sigplan.org/track/POPL-2022-artifact-evaluation
by the way, to me at least, using a custom container or VM image is a trust issue. I don't know whether it has been fiddled with or not, and it's a huge hassle to get reassurance that it hasn't. Using approved pre-built containers would in itself be a step forward.
I don't fully understand how Coq platform packaging would look. Don't you still need a Makefile, presumably wrapping coq_makefile? And if you have dependencies outside the platform, maybe you would have opam dependencies (ideally with fixed version numbers for reproducibility)?
(I'm not currently using the Coq platform so this is just my ignorance)
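(for concreteness, I imagine fixed-version opam dependencies would amount to something like the following on the command line; the package names and versions are made-up examples:)

```sh
# add the Coq opam repository, then install exact versions for reproducibility
opam repo add coq-released https://coq.inria.fr/opam/released
opam install -y coq.8.13.2 coq-ext-lib.0.11.4 coq-paco.4.1.1
```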
the final distribution/packaging of the artifact could be done in many ways, e.g., with coq_makefile or Dune. I think the point is that I should be able to spin up an Ubuntu Snap, a platform Docker image, or a Windows installer, and it would work just by running some top-level make
or dune
or even opam install
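concretely, assuming the artifact ships the usual metadata (a Makefile, a dune-project, or an opam file; this is a sketch, not from any specific artifact), any one of these at the artifact root should suffice:

```sh
make              # a coq_makefile-generated build
dune build        # a Dune-based build
opam install .    # if the artifact ships an opam file listing its dependencies
```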
I see, that makes sense. Currently the recommendation is to submit a VM for all artifacts, but in practice not all artifacts even submit that.
If we want people to go a step further and support Docker I think the instructions need to be a lot more explicit so it's easy for both authors and reviewers. The Proof Artifacts guidelines (https://proofartifacts.github.io/guidelines/) are probably a good place to add some technical details on using containers.
as per above, when I was in the POPL AEC, I felt I had the least confidence when people gave a full VM. If I could reproduce the artifact in my own opam-based environment, then almost all environment manipulation concerns are gone.
one idea for (platform-based) containers could be to follow the GitHub Actions approach, e.g., the artifact contents are put into a known directory inside a pre-designated third-party container (as the checkout action does), and then a given command is run
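a sketch of what I mean, reusing the community docker-coq images (the image tag and paths are only illustrative):

```sh
# "checkout": mount the artifact into a known directory of an approved image,
# then run one fixed command inside it
docker run --rm -v "$PWD:/home/coq/artifact" -w /home/coq/artifact \
  coqorg/coq:8.13 make
```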
I like the idea of using a standard Docker image. It's nice to not have the artifact clutter your machine's setup (or worry that it might do so), while also being much more transparent and readable as to what environment you're using.
If people would just use the docker-coq images in CI, that would already help a lot; it's not literally a list of commands I can run locally, but it's a form of checked documentation that inspires confidence. Certainly, if you're already doing that, getting to Docker instructions that work should be easy.
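Something close to the CI step can also be run locally; a sketch, assuming the artifact ships an opam file for its dependencies:

```sh
# single-quoted so that $(opam env) expands inside the container, not on the host
docker run --rm -v "$PWD:/home/coq/artifact" -w /home/coq/artifact \
  coqorg/coq:8.13 sh -c 'eval $(opam env) && opam install -y --deps-only . && make'
```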
I was invited to give some input, so here it is. I think it's important to keep in mind that Artifact Evaluation is a social process, whose aim seems to be to gently lift our community toward gradually better artifacts without any friction. (This is reverse-engineered from my experience as an evaluator and chair.) In this environment, "perfect is the enemy of good". If your requirements are too strict, people will be unhappy and give up (and some will complain loudly), and ultimately the process will tolerate them and accept lowering the norms.
Authors typically find out about the Artifact Evaluation (AE) requirements when they get invited to submit their artifacts. That's after the paper has been accepted, and they have about a week to prepare the artifact, at the same time as they have to work on the camera-ready version. By that time, it's too late to ask for much. Even asking for a specific version of Coq may be difficult -- considering the effort to port a sizeable development.
the fewer requirements you have, the more "friction" the artifact evaluators will face. Clearly, there is something close to a zero-sum game here (and I say that having both submitted and reviewed artifacts for several years)
My recommendation would be that the Coq community write clear packaging recommendations -- including, possibly, specific guidelines for research artifacts -- and try to advertise them broadly within academia outside the AE process (for example, discuss them with the important groups that are pushing out Coq developments). Then AE people will be able to link these recommendations in their documents, though they would probably not be taken as mandatory rules right away. Improving practices takes time.
(Another thing to keep in mind is that some of the evaluators will only be barely familiar with Coq, and may not have an up-to-date Coq environment; maybe those enjoy VM images more than source distributions -- I certainly agree that it's important to have clean source packaging, of course.)
I am in agreement with Gabriel. Currently I think authors don't quite know what process to follow, so then they submit confusing and inconsistent artifacts, and then reviewers struggle with them but eventually figure it out. A good first step would be some standard practices for packaging Coq developments in general, with the hope that when it comes time to submit an artifact authors have no extra work to do.
Regarding specifically the question of using the platform for research artifacts: I would try to check whether the larger existing (and actively researched) Coq developments around are Platform-ready. I would hazard the guess that the Iris stuff is rather up-to-date with current best practices, but what about other groups such as MIT, the CertiCoq stuff at Princeton, etc.? (The larger the development, the harder to port to new packaging approaches, and the more advance warning they need.)
the chances of reaching any consensus on Coq packaging in general anytime soon are very close to zero.
I don't think there needs to be one way to do it, but basically, if you have dependencies they should be easy to install, and we should have easy-to-find instructions on how to use either opam or Docker to get a particular version of Coq
for example I don't really know how to use Docker, and I don't really understand opam local switches, but if I knew where to find instructions on these and authors started using opam or dune or the Coq platform, then dependency management would become a lot easier
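For the record, once someone writes it down, the local-switch recipe is short; a sketch (the compiler and Coq versions are just examples):

```sh
opam switch create . ocaml-base-compiler.4.12.0   # per-project switch in the current dir
eval $(opam env)                                  # make the switch active in this shell
opam repo add coq-released https://coq.inria.fr/opam/released
opam install -y coq.8.13.2
```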
Gabriel Scherer said:
Authors typically find out about the Artifact Evaluation (AE) requirements when they get invited to submit their artifacts. That's after the paper has been accepted, and they have about a week to prepare the artifact, at the same time as they have to work on the camera-ready version. By that time, it's too late to ask for much.
Isn't that the process we should change? One submits the artifact with the paper, as is common in many other conferences.
I think 2021.09 will improve things, because it makes it easy to add custom package list files and still get them maintained with an older Coq version. An older Coq platform might not run any more because of OS changes (that's the reason for 2021.02.2). The idea is that the current Coq platform also supports building unchanged older package lists on newer OSes. Then it should be trivial to make a modified package list for some research artifact. If there are specific applications, I would be happy to demonstrate it.
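Roughly, a package list is just a shell fragment that accumulates opam package names, so a custom pick for a research artifact could look something like this (package versions are only examples):

```sh
# hypothetical custom package list for the platform scripts
PACKAGES="coq.8.13.2"
PACKAGES="${PACKAGES} coq-mathcomp-ssreflect.1.12.0"
PACKAGES="${PACKAGES} coq-equations.1.2.4+8.13"
```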
Bas Spitters said:
Gabriel Scherer said:
Authors typically find out about the Artifact Evaluation (AE) requirements when they get invited to submit their artifacts. That's after the paper has been accepted, and they have about a week to prepare the artifact, at the same time as they have to work on the camera-ready version. By that time, it's too late to ask for much.
Isn't that the process we should change? One submits the artifact with the paper, as is common in many other conferences.
@Bas Spitters That’s the process across all of SIGPLAN and beyond (https://artifact-eval.org/), so that artifact evaluation cannot yet affect paper evaluation (a pact with the devil, admittedly). But _that_ doesn’t prevent the AEC from picking the image in advance.
@Gabriel Scherer In fact, “find out the requirements a week before” seems the opposite of “without any friction”.
@Karl Palmskog Re trust, at ICFP people also provide source tarballs, but AFAICS the AEC is still peer review, so it's not clear it's required to check against outright intentional fraud like Coq subversion (I'd still expect make clean to work). Reproducible build scripts would be nice but are quite some work for authors (I've done it), and I'm less sure how pre-built containers help unless you fix the Coq version and the set of libraries.
this whole discussion is about what should be required, so whether or not reviewers currently check for fraud is not really all that interesting, I think. Double-blind review imposes considerable burdens on authors, but was still judged by committees to be a justified imposition, even in "social processes".
for example, I had a terrible time submitting an artifact to TACAS, since they imposed a specific ancient Debian-based VM image and demanded that everything be delivered as a tarball that could be installed directly inside the image without network access. This strengthens reproducibility, but left very little flexibility for both authors and reviewers. The Coq platform is designed to be installable across Linux/Mac/Windows, so it already permits a lot of flexibility (though not the extreme flexibility of using hacked versions of Coq/libraries that was common in the past)
@Paolo Giarrusso That's interesting, but the website does not seem to mention POPL, CPP, ITP, ICFP, ...
At least at CPP and ITP artifact evaluation is part of the refereeing process, IIRC.
@Andrej Bauer has written a nice blog post about this:
http://math.andrej.com/2013/08/19/how-to-review-formalized-mathematics/
It looks like standards have slipped...
License the code so that anyone who wants to check the correctness of the proofs is free to do so.
This would already exclude a significant fraction in POPL and other top conferences...
Provide information on how to independently verify the correctness of the code.
Based on the above, I think a point of disagreement is that I don't think a reviewer using an author-provided VM or container (with characteristics chosen at the author's discretion) counts as independent verification.
I also wonder specifically about the power differential in the POPL AEC. At least when I was in the AEC, most reviewers were grad students or early-career PhDs, the submissions generally came from teams led by very senior researchers (these being accepted POPL papers), and it was all single-blind. If it's all vague social processes and not a well-defined protocol, how free do reviewers feel to apply criteria like those described by Andrej [which to me appear reasonable]?
@Bas Spitters I know that blog post, but "standards have slipped" assumes those were ever the standards; Andrej's blog post is not a CFP. The artifact website was last updated in 2015, so I guess ICFP and POPL joined later. They also _allow_ submitting supplementary material, but reviewers are not required to examine it (https://popl21.sigplan.org/track/POPL-2021-research-papers#POPL-2021-Call-for-Papers).
@Karl Palmskog From when I sat in the AEC, reviewers are anonymous so there’s no concern about retaliation. But AEC chairs are more senior and IIUC they have more power to override reviewers than a PC chair — this _might_ have been explicit back then.
I went through and looked at every Coq artifact from POPL 2021. Here are some stats on what they depend on:
| count | dependencies |
| --- | --- |
| 2 | no dependencies |
| 3 | no Coq dependencies |
| 2 | only platform deps (mathcomp-ssreflect, equations) |
| 2 | non-platform deps (Iris + autosubst; coq-ext-lib + coq-paco) |
| 4 | Iris |
Only two papers supplied only a VM, simply because they didn't link to their repo. Otherwise papers gave at least sources or a VM + sources. Only one paper had Docker instructions, with a Dockerfile based on the docker-coq images.
The above was compiled from a spreadsheet where I tracked every paper individually. More anecdotally (because I didn't take notes on this), I believe compiling the proofs for every development was easy, assuming the authors got their dependencies right. What varies enormously between artifacts is how well the correspondence between the artifact and paper is documented.
@Paolo Giarrusso :-)
I think Andrej's blog post sort of reflects the standards at ITP and CPP. The standards at POPL are lower by design, but I'm not sure there's a good reason for that.
@Bas Spitters I'm an outsider at ITP/CPP, but mechanizations are optional at POPL anyway, unlike at ITP/CPP and like in most of maths. If your paper passes review without an artifact, the artifact should strictly add to the process. From earlier discussions, AEC proponents would like stricter demands on artifacts in general (not just proofs), but there was strong opposition (mostly from those submitting research prototypes, not proofs), so this was the compromise.