I'm talking to some people at GitHub. They have the GitHub Code Scanning tool, which gives access to a high-level language for defining linting rules. It supports many languages. It could potentially support Coq, but one needs a proper Coq AST, for instance in a tree-sitter grammar. Currently, that does not seem to be available (but OCaml is). What's the easiest way to get hold of the Coq AST?
Relevant links:
Code-Scanning: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/about-code-scanning
Tree-sitter: https://tree-sitter.github.io/tree-sitter/
the full Coq AST is only known to Coq (notations, plugins, etc.) so I don't know how well this will work.
theoretically, I guess they could use the sexp representation of the AST from SerAPI (it will contain implementation-boilerplate wrapping of extension syntax)
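To make that concrete, a hedged sketch: for each sentence of a file, SerAPI prints an s-expression wrapping the internal vernacular AST. The constructor names in the comment below are illustrative of the shape only, not literal SerAPI output.

```coq
(* A minimal Coq sentence: *)
Definition one : nat := 1.
(* SerAPI surfaces this sentence's AST as an s-expression roughly of
   the form
     (VernacDefinition ... (Id one) ... )
   i.e., the internal AST wrapped in implementation boilerplate; the
   constructor names here are illustrative, not literal output. *)
```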
Indeed, the Coq AST is extended by plugins; the first one to do that was Ltac.
Once we complete the execution / parsing separation, we could imagine a linting tool that will be able to parse Coq files without excessive cost. But indeed, it will still be more complex than a static grammar.
You can parse Coq files today without excessive cost
Tree-sitter is GLR, so it needs a fixed grammar, and Coq is outside of its scope.
but still you don't have a closed AST that can be used by that tool
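A tiny example of why the AST is not closed: any Coq file can extend the grammar at runtime, so no static grammar (tree-sitter's included) can anticipate all syntax. The notation below is made up for illustration.

```coq
(* Notation extends the parser on the fly: after this command, Coq
   accepts syntax that no static grammar file could have listed. *)
Notation "x 'plus' y" := (x + y) (at level 50, left associativity).

(* This sentence only parses because of the Notation above; it
   elaborates to (1 + 2). *)
Check (1 plus 2).
```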
Emilio Jesús Gallego Arias said:
You can parse Coq files today without excessive cost
Coq does not expose an API to do it.
Coq actually does expose such an API at the ML level, the STM, that people often use by means of SerAPI
AFAIK, the STM doesn't know how to parse a file without full compilation of dependencies. The only thing it can do is to skip proof checking.
With the right settings the parsing is pretty quick as you can skip proofs, I'd bet that further gains would be minimal over the set of files, let's say, in CI
Is your claim that typechecking (without proofs) is negligible compared to parsing?
My claim is that today you can already parse fast enough for lint use
That's not the problem linters face, speed
What do you call that, an X/Y problem?
You are trying to solve a problem that linters don't have
They face way more serious issues than speed when trying to parse Coq; the speed is acceptable
if you go from 1 sec to 0.1 second they don't care
So my claim is that "we can parse Coq files without excessive cost" already
Of course that depends on how you define excessive cost
What kind of linters are you talking about? Do you have any examples or references?
For the linter mentioned by @Bas Spitters, making the parser marginally faster solves nothing. And yes, I claim that in CI most of the time is due to proof search / Qed; in fact, if not for other important parsing and system bugs that prevent more advanced tooling from working, I'd have the exact number at hand (proof building / checking time vs total check time)
I really like the openness of this discussion
Maxime Dénès said:
I really like the openness of this discussion
What does that mean?
I can at least say something about speed of parsing the AST using SerAPI. It's indeed very quick to produce tokens and/or sexps for Coq sentences, so quick you can do it "online", on demand
you have to do it in dependency order to get the correct parsing state, but that's the price to pay for an extensible AST...
How does it work on UniMath, for example? Does it skip more than the STM does?
we didn't try on UniMath unfortunately, the main object of study was the MathComp ecosystem.
Maybe the STM is good enough for interactive use (for sources prepared correctly and that avoid obligations!), but 10x cost on the cloud is different.
About billing for code scanning
Code scanning uses GitHub Actions, and each run of a code scanning workflow consumes minutes for GitHub Actions. For more information, see "About billing for GitHub Actions."
At that point, can you instead connect linters with the normal build process?
as of today, I think doing linting as part of the normal build process is the way to go
unfortunately this would probably preclude using all these fancy general syntax tools like GitHub's (they don't want to integrate Coq builds into their tools)
(in any case, based on some of our experiments, Coq linting should likely be based partly on the proof state, e.g., indenting doesn't work well for a stupid formatter that only knows about tokens)
Last but not least, the STM is being deprecated since it is unmaintainable. As soon as the alternative is ready, we will move away from it, no matter if the alternative is only marginally faster.
what is the alternative? Is it described somewhere?
Karl Palmskog: (in any case, based on some of our experiments, Coq linting should likely be based partly on the proof state, e.g., indenting doesn't work well for a stupid formatter that only knows about tokens)
I agree that can't work for the present math-comp style, but even math-comp today might pick something else... for instance, with `Set Default Goal Selector "!"` (and bullets or braces), I think token-based indentation works well — normal coqc compilation enforces that bullets match the proof state, and indentation can be handled by a token/AST-based linter/autoformatter
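A minimal sketch of that style (standard Coq, nothing project-specific): with the `"!"` selector, a tactic may only run when exactly one goal is focused, so bullets become mandatory and indentation can follow token structure alone.

```coq
(* With this selector, running a tactic with several goals open is an
   error, so the bullet/brace structure is enforced by coqc itself. *)
Set Default Goal Selector "!".

Lemma andb_comm (b c : bool) : andb b c = andb c b.
Proof.
  destruct b.
  - destruct c.
    + reflexivity.
    + reflexivity.
  - destruct c.
    + reflexivity.
    + reflexivity.
Qed.
```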
Indeed, as is done in Rust, linters should be able to ask for any info they may need, maybe the AST, and maybe the goals; and indeed Coq should be able to compute that fast.
For goals we did quite a bit of work, so now it is possible to know which commands will operate under goals; for parsing, for example, we didn't refine commands in a similar way yet, so approximations are needed.
You can have the best of all worlds by classifying what state is updated: no need to enforce a phase split (which Rust devs know well is super-hard and in the end futile in most languages), but instead make most commands reflect what they do in the state.
splitting _name resolution_ can be hard — but I haven't seen _Coq_ do the typical things that make it hard — but splitting parsing is much more common than incremental typechecking
sorry: that's what it _seems_ to me at least; probably the Eclipse Java Compiler and other IDE tooling counts as exceptions, but they seem hard to make sound
Paolo Giarrusso said:
splitting _name resolution_ can be hard — but I haven't seen _Coq_ do the typical things that make it hard — but splitting parsing is much more common than incremental typechecking
Yes, name resolution is harder (for Coq), unfortunately. There was a recent discussion where @Jason Gross (IIRC) reminded us of ways name resolution can depend on typing.
if that's there to stay, then it'd be great if `x.(foo)` disambiguated `foo` using `x`'s type :-)
Actually, I believe that phase separation in Java is not fully possible due to some lazy class loading, but I don't know the details; indeed that seemed to piss off a lot of people. In the end tools don't care so much, they just want the pieces of data they need computed fast, I think
IIRC separating parsing is fine and you don't even need C's lexer hack, but separating name resolution and typechecking is beyond hopeless
Basically, OO dot notation `x.field` (even in C) is enough to entangle name lookup for `field` with typechecking of `x`. And even Coq clearly has this for _module_ typechecking; I still suspect you _can_ "shape-check" modules for name resolution before core typechecking, but that's much less trivial than I was assuming above.
In `Program Instance foo : bla := _. Check foo.`, the second `foo` refers to the instance only if TC resolution for `bla` succeeds; otherwise it refers to a `foo` imported from another module
yes but that's only backward compatibility — I recall Maxime has fixed various such issues, and this one would go too if you could require `Program Complete foo.`
maybe that'll never happen, but "backward compatibility" seems a different bucket from intrinsic challenges like the other we were discussing
@Emilio Jesús Gallego Arias re "lazy class loading": files can be mutually recursive and have no separate `.mli`-like interfaces, so you can't typecheck or do name resolution on each file in isolation, but splitting parsing still seems easy because parsing is not extensible — even Scala has separate parsing IIRC
Last updated: Oct 13 2024 at 01:02 UTC