Suppose more than one proof engineers are working on proving stuff about a product. Anyone with experience in this kind of scenario have wisdom they'd like to share about code quality? I searched zulip history and found a comment that the anti-pattern guide is the closest thing there exists to a style guide
I wouldn't call that a guide, it's just a random wiki page
i think it's a start-- it would form a citation in a business' internal style guide
Preach. I've seen https://gitlab.mpi-sws.org/iris/iris/-/wikis/style-guide, and math-comp also has a document somewhere, but this isn't a trivial problem. Neither guide is very complete.
while you’re at it, you might also wonder about tool support. I suspect VsCoq and ProofGeneral have different indentation rules; they certainly have different commenting styles.
the Coq tracker has an issue about automatic formatters.
Even comment layout is tricky. I used to use Javadoc-style comments:
(** Coqdoc comment: short summary.
* More details *)
but if you actually run Coqdoc, that’ll turn More details
into a title :-(
math-comp recommends instead
(* Line1 *)
(* Line2 *)
I guess it could work, but VsCoq doesn’t help set that up, and I haven’t seen much code outside math-comp with a consistent comment layout.
i will check coqdoc interaction before I commit my team to a commenting style, thanks!
@Quinn Thanks! And should you come up with something, feel encouraged to share. Hopefully I can do the same, if we find a solution.
give my team a good half year and we'll have some experience to back up whatever style guide I end up writing, or some other team publishes a style guide and we rally behind them. A cool vision would be if several teams consolidated around one style guide, then emacs/vscode plugins could be written based on them.
the most detailed "style guide" around is for MathComp: https://github.com/math-comp/math-comp/blob/master/CONTRIBUTING.md
there is also a guide which has some style recommendations but focuses on high assurance requirements: https://www.ssi.gouv.fr/uploads/2014/11/anssi-requirements-on-the-use-of-coq-in-the-context-of-common-criteria-evaluations-v1.0-en.pdf
there will likely be a proliferation of different incompatible styles in the Coq community as long as Coq itself is around...
Sure, and most code formatters are configurable. But thankfully, configurations don’t make them surjective!
It would be a great project to, at least, consolidate all the available style guide / code formatting knowledge into a single document, which would show the different options for the different elements and which team adhere to which.
That's the kind of thing that could be hosted in coq-community for instance (though the Coq wiki is also fine if a wiki page is suited for that).
we linked to a bunch of different style guides in the intro section of our IJCAR 20 paper, and then there are some refs in our Coq Workshop '20 abstract
the issue of styling is also highly related to documentation generation, which leads to Alectryon, SerAPI, coqdoc, dependency graphs using coq-dpdgraph and coqdep, and the like
one early style guide I co-authored: https://github.com/uwplse/verdi/blob/master/STYLE.md
ok this is great, bookmarking this thread, thanks!
Here's another one that's probably already on the list:
https://github.com/HoTT/HoTT/blob/master/STYLE.md
@Karl Palmskog I think I'm gonna use verdi/PROOF_ENGINEERING.md
in my team. Thanks!
@Bas Spitters that's an informative and educational document! Using typeclasses to encode axioms is a nice pattern.
no unnecessary type declarations for quantified variables
@Karl Palmskog i'm not 100% convinced I find this desirable. I'll be tempted to enforce the opposite.
what are some of the advantages you see?
@Quinn Many people like explicit annotation for function arguments. Do you want to repeat those declarations in 10 lemmas about a function living next to it? 20? How much is too much?
Once I have gotten pushback, but the compromise was to predeclare argument types Implicit Types ...
Main problem: types take a lot of space which may hinder readability. I think explicit types in lemmas can appear in documentation instead, for example using Alectryon.
Maybe "no unnecessary" cane be understood in a too strong way, eg. Coq can infer them reliably, which implies you/your-readers know well enough how type inference works. But for sure putting types everywhere is too verbose to be practical and most of the times it really does not help. Eg take forall x xs, mem x xs -> ....
, why would typing both x and xs help the reader? It would only help if the names were chosen badly, as in forall x y, mem x y -> ...
(which is which?).
Some times I like having explicit type annotations even when Coq can infer them, mostly when dealing with many types that have coercions between them and shared notations that depend on which scope is open. It can get confusing fast, specially for someone not familiar with the code.
This is interesting to consider!
I would say the key goal when deciding if a type annotation should be there or not is to make the code more readable to a medium advanced user. IMHO some authors either optimize it for writability (= leave them away wherever you can) or for readability of readers at their own level of expertise (with the specific piece of code, which likely few people have). I find it helps to ask others from time to time what their opinion is on ones code style.
There is also the linter for lean:
https://arxiv.org/pdf/2004.03673.pdf
if hypothetically i wanted to write lint for coq
, could i use serapi
in python
?
@Quinn if you want to write a mechanical (rule-based) linter using Python, either SerAPI or PyCoq is probably the way to go. Our experimental machine learning based formatter specifically used output from the sertok
and sercomp
tools provided by the coq-serapi
opam package.
I believe there is some Python based tooling for dealing with output from sertok
and sercomp
here, if that's the route you want to go:
in any case, we strongly advise against trying to parse Coq code without using Coq's parser at the OCaml level or using something like SerAPI or PyCoq. Because both the Coq parser and lexer are extensible, handling all code correctly in an external parser is near-impossible (one can't even assume a sentence ends by .
due to notations)
Oh yeah I'd never in a million years attempt it on my own, haha. I had a glance at the pycoq
repo and that might have what I'd want to make a linter
If you wanna use python, pycoq is the way to go; we did add a lot of updates so ping me @Quinn if you start using it, as of now, what's on github is very preliminary, but we can push an OPAM release soon
outstanding, thanks!
(realistically it might not end up being a priority because my boss isn't even 100% sure we're hiring other proof engineers besides me, and a linter on a team of one doesn't have a high impact) (I'd love to dive into it, but it might be something I can't bill)
(i mean most probabiltiy mass is it being something i can't bill, which i'm fine with!)
@Quinn FWIW we'd also be interested, and I'm sure more people would, even if I can't make any promises...
And while we're talking of tooling, stuff in https://github.com/JasonGross/coq-tools might be helpful!
(for instance minimize-requires!)
@Paolo Giarrusso in a linter? yeah i'd make sure it's in a public-facing repo (even if I have to host it from my employer's github org account) and do what i can to make it generally applicable to teams outside of my own.
Any starting point is probably great... Different users probably have different ideas on the actual policies, but honestly for an MVP, I guess plain Python is a perfectly reasonable policy language... I guess as long as different rules are enforced "separately" or "modularly"?
mvp: my styleguide hardcoded into an executable that exit code 1
s if there's at least one warning and exit code 0
s otherwise
v2: ship with various teams' styleguides as cli options
v3: yaml config
v4: emacs mode
v5: formatting cli such that formatter input.v; linter input.v
raises zero warnings
what do you mean separately/modularly?
sorry that was probably premature, I don’t mean anything too surprising
just that having different warnings in different… modules/functions (or w/ever) might be easier to manage/configure/…
but it’s not my place to say!
yeah without having looked at serapi python port or pycoq yet, my guess is i'd want 1 def
per rule and a config amounts to a list[callable]
, where each callable would be of signature whateverPyCoqClassRepresentsADotVFile -> Bool
, though rules could be parametric and a functools.partial
could be used to make different flavors of the same rule.
some flavor of all(map(lambda f: f(dot_v_file_representation), config)
that's aware of warnings and aggregates them. Cuz I guess the return type wouldn't be bool
because we'd want to tag False
with a str
. Oh yeah, it's not strictly fp so I can just throw a print
before returning the bool
, lmao
I was also thinking of option string
ah, indeed. or list[str]
where ""
instead of None
: falsey corresponds to successful rule, truthy/a message corresponds to a failed rule
(realistically my timeline is 0. finish a major non-proofengineering deadline on the 15th, 1. finish a PoC to confirm that my team is actually using coq and not agda, 2. research upthread styleguides to write my team's styleguide, then maybe 3. write linter)
@Quinn I might have some availability to work with you on this. I'm currently putting together an effort to organize a Coq style guide. A linter would be a powerful tool. My December is somewhat busy but I'd be excited to talk regardless
Hi! Thanks for reaching out. All I've done is assemble notes from this thread into a directory. My team isnt even 100% sure we're using coq yet still so it's up in the air whether I'll really be in the soace
Sounds good. I've been collecting resources in a repo: https://github.com/appliedfm/vstyle
The guide is currently light on content. I've been using the issue tracker to take notes about existing resources, tooling support, etc.
For tooling, I've played around a bit with serapi and think it's good enough for now. Some stuff is missing/incomplete (comments, in particular) but I think there's enough functionality to get things started. The s-expression representation contains file offset information; in theory a code formatter could use this to perform all kinds of surgery on files. Not my favorite long-term approach but it could certainly be done.
Another possibility would be to introduce a configurable pretty printer into Coq. But I have not investigated that at all.
FWIW, there's already a "beautifier" in Coq that originates from the V7 -> V8 syntax translation tool and is quite broken at the moment. This could be a basis for writing an auto-formatter in Coq itself, but I feel like the external approach based on SerAPI is more likely to be successful.
I still believe a machine learning-based formatter aided by manual so-called reranking rules is the way to go (https://arxiv.org/abs/2006.16743). For example, here is a manually formatted file and what our experimental tool predicted from a raw Coq token stream:
notably, we didn't handle bullets all that well in a naive predictor, since they are sentences by themselves. If one comes up with a nice small abstraction on top of Vernacular (e.g., special dummy tokens for bullets), this could be handled quite easily though. Same for Qed.
and Proof.
.
I'm not sure people are aware, but the Coq Platform is basically a huge machine learning dataset (1.7+ million lines of Coq code guaranteed to work for a specific Coq version).
what a general "Coq style project" could provide is a manually curated (Python?) API/AST for Coq tokens/documents, based on the actual Coq AST/API as exposed by SerAPI and/or PyCoq. Once you have that, people could implement their own formatters for this API/AST, some rule-based, some learning-based. See also for Lean: https://link.springer.com/chapter/10.1007%2F978-3-030-53518-6_16
for example, the SerAPI representation of the bullet -
is something like: Sentence(Bullet("-"))
, while you probably don't want bullets to be considered standalone sentences for formatting (e.g., they could belong to the next sentence in the document)
I also think Coq-Elpi already has abstractions for some syntax-level Coq constructs?
here is an early attempt at a mechanical formatter that's been mentioned in other similar topics: http://www.eelis.net/coqindent/
we link to 3-4 style guides for Coq (and Lean) here, some have been mentioned before: https://arxiv.org/pdf/2004.07761.pdf#page=2
Thank you both for your thoughtful comments :heart:️ I'm going to put some experiments together with serapi (and follow up on these links!)
@Tim Carstens see also the sertok
CLI tool that Emilio and I developed, which is installed when you install coq-serapi
. This tool does not actually use SerAPI-the-API, only the Serlib library for Coq and the OCaml interface to Coq: https://github.com/ejgallego/coq-serapi/blob/v8.13/sertop/sertok.ml -- if you run it on some Coq code it shows a bit of how Coq's extensible lexer works, and how token locations are stored.
I was playing with it (sertok) last night - I'm completely convinced that it would work well in this use case
OK, in this case I should flag up that it should be possible in principle now to hack SerAPI to have sertok provide JSON output. When we first did sertok, sexps were the only viable alternative, but I think Serlib has (nearly?) full JSON support now.
in any case, the sertok
output will be subtly different for each Coq version (since you're looking at Coq internals), which is why one might want to abstract
Fortunately these days we have github actions & a beautiful coq platform, both of which make extensive regression testing possible <3
also possibly relevant, we have a dataset from before the Coq platform (Coq 8.10) which for a bunch of projects has .v
files, token sexp files, and AST sexp files, e.g., https://github.com/EngineeringSoftware/math-comp-corpus/tree/master/raw-files/math-comp_analysis
we tested them for the serialization-deserialization roundtrip, which is probably what one would need to set up for a new tool using the platform
Dropping a quick update:
I’ve started writing a formatter/linter. I’m writing it as a command line tool in OCaml to keep dependencies simple. (If the tool becomes useful, a later effort could integrate it with vscode/emacs/etc.)
I’m new to OCaml but things seem to be going smoothly. I referred to pycoq for an example of how to use serapi as a library.
Thank you again everyone for your input, I’m looking forward to dropping a link to the repo Real Soon Now 👍
repo link: https://github.com/appliedfm/vstyle-tools
Currently based on Coq 8.14. The logic is based on sertok. I've run into two gaps:
Example output shown in https://github.com/appliedfm/vstyle-tools/pull/1
Still, this is an encouraging start :blush:
Edit: I revisited the code and realized I could be pretty-printing the Vernacexpr.vernac_control_r CAst.t
that I get from calling Stm.parse_sentence
. This doesn't help with the code comments, but it does give a structured AST. This means I'm unblocked :+1:
Anyone wants to answer this question with the state of style guides in Coq? https://proofassistants.stackexchange.com/questions/984/naming-conventions-letter-case-underscores-c-for-coq
I would also like to know!
@Cyril Cohen wrote a linter for mathcomp that never got merged: https://github.com/math-comp/math-comp/pull/163
Maybe it could be published somewhere as a standalone?
This thread seemed to have stalled. Is anyone still thinking about these issues?
A concrete use case could be to have something for e.g. Software Foundations.
One more example of an old draft of a style guide: https://github.com/coq/coq/wiki/CoqStyle
Note that this style also is missing bullets which I would say the style of the Coq stdlib suggests.
Mathcomp have their own style guide, so any document should point to a range of styles rather than impose a singular one.
This old pr is not completely orthogonal to this discussion. With the dev hat on: we will be soon able to write tools which parse coq files (using the coq parser) without executing a single line (CC @Maxime Dénès ). Writing a proper linter (I mean, not based on sed/awk) will then be possible.
Last updated: Sep 26 2023 at 12:02 UTC