Stream: Coq devs & plugin devs

Topic: Github code scanning


view this post on Zulip Bas Spitters (Jan 09 2023 at 15:07):

I'm talking to some people at github. They have the GitHub Code Scanning tool which gives access to a high-level language for defining linting rules. It supports many languages. It could potentially support Coq, but one needs a proper Coq AST. For instance in a tree-sitter grammar. Currently, that does not seem available (but ocaml is). What's the easiest way to get hold of the Coq AST?

Relevant links:
Code-Scanning: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/about-code-scanning
Tree-sitter: https://tree-sitter.github.io/tree-sitter/

view this post on Zulip Karl Palmskog (Jan 09 2023 at 15:14):

the full Coq AST is only known to Coq (notations, plugins, etc.) so I don't know how well this will work.

view this post on Zulip Karl Palmskog (Jan 09 2023 at 15:17):

theoretically, I guess they could use the sexp representation of the AST from SerAPI (it will contain implementation-boilerplate wrapping of extension syntax)

view this post on Zulip Emilio Jesús Gallego Arias (Jan 09 2023 at 15:19):

Indeed Coq AST is extended by plugins, the first one to do that is LTAC

view this post on Zulip Maxime Dénès (Jan 09 2023 at 15:22):

Once we complete the execution / parsing separation, we could imagine a linting tool that will be able to parse Coq files without excessive cost. But indeed, it will still be more complex than a static grammar.

view this post on Zulip Emilio Jesús Gallego Arias (Jan 09 2023 at 15:23):

You can parse Coq files today without excessive cost

view this post on Zulip Guillaume Melquiond (Jan 09 2023 at 15:23):

Treesitter is GLR, so Coq is outside of its scope.

view this post on Zulip Emilio Jesús Gallego Arias (Jan 09 2023 at 15:23):

but still you don't have a closed AST that can be used by that tool

view this post on Zulip Maxime Dénès (Jan 09 2023 at 15:25):

Emilio Jesús Gallego Arias said:

You can parse Coq files today without excessive cost

Coq does not expose an API to do it.

view this post on Zulip Emilio Jesús Gallego Arias (Jan 09 2023 at 15:26):

Coq actually does expose such an API at the ML level, the STM, that people often use by means of SerAPI

view this post on Zulip Maxime Dénès (Jan 09 2023 at 15:28):

AFAIK, the STM doesn't know how to parse a file without full compilation of dependencies. The only thing it can do is to skip proof checking.

view this post on Zulip Emilio Jesús Gallego Arias (Jan 09 2023 at 15:28):

With the right settings the parsing is pretty quick as you can skip proofs, I'd bet that further gains would be minimal over the set of files, let's say, in CI

view this post on Zulip Maxime Dénès (Jan 09 2023 at 15:28):

Is your claim that typechecking (without proofs) is negligible compared to parsing?

view this post on Zulip Emilio Jesús Gallego Arias (Jan 09 2023 at 15:30):

My claim is that today you can already parse fast enough for lint use

view this post on Zulip Emilio Jesús Gallego Arias (Jan 09 2023 at 15:30):

That's not the problem linters face, speed

view this post on Zulip Emilio Jesús Gallego Arias (Jan 09 2023 at 15:30):

How do you call that X/Y problem?

view this post on Zulip Emilio Jesús Gallego Arias (Jan 09 2023 at 15:30):

You are trying to solve a problem that linters don't have

view this post on Zulip Emilio Jesús Gallego Arias (Jan 09 2023 at 15:31):

They face way more serious issues when trying to parse Coq than speed which is acceptable

view this post on Zulip Emilio Jesús Gallego Arias (Jan 09 2023 at 15:31):

if you go from 1 sec to 0.1 second they don't care

view this post on Zulip Emilio Jesús Gallego Arias (Jan 09 2023 at 15:31):

So my claim is that "we can parse Coq files without excessive cost" already

Of course that depends how you do define excessive cost

view this post on Zulip Emilio Jesús Gallego Arias (Jan 09 2023 at 15:32):

What kind of linters are you talking about? Do you have any examples or references?

view this post on Zulip Emilio Jesús Gallego Arias (Jan 09 2023 at 15:33):

For the linter mentioned by @Bas Spitters , making parser marginally faster solves nothing. And yes I claim that in CI most of the time is due to proof search / Qed, in fact if not for other important parsing and system bugs that prevent more advanced tooling from working, I'd have the exact number at hand (proof building / checking time vs total check time)

view this post on Zulip Maxime Dénès (Jan 09 2023 at 15:35):

I really like the openness of this discussion

view this post on Zulip Emilio Jesús Gallego Arias (Jan 09 2023 at 15:37):

Maxime Dénès said:

I really like the openness of this discussion

What does that mean?

view this post on Zulip Karl Palmskog (Jan 09 2023 at 16:18):

I can at least say something about speed of parsing the AST using SerAPI. It's indeed very quick to produce tokens and/or sexps for Coq sentences, so quick you can do it "online", on demand

view this post on Zulip Karl Palmskog (Jan 09 2023 at 16:20):

you have to do it in dependency order to get the correct parsing state, but that's the price to pay for an extensible AST...

view this post on Zulip Maxime Dénès (Jan 09 2023 at 16:22):

How does it work on UniMath, for example? Does it skip more than the STM does?

view this post on Zulip Karl Palmskog (Jan 09 2023 at 16:25):

we didn't try on UniMath unfortunately, the main object of study was the MathComp ecosystem.

view this post on Zulip Paolo Giarrusso (Jan 10 2023 at 00:44):

Maybe the STM is good enough for interactive use (for sources prepared correctly and that avoid obligations!), but 10x cost on the cloud is different.

About billing for code scanning

Code scanning uses GitHub Actions, and each run of a code scanning workflow consumes minutes for GitHub Actions. For more information, see "About billing for GitHub Actions."

At that point, can you instead connect linters with the normal build process?

view this post on Zulip Karl Palmskog (Jan 10 2023 at 09:26):

as of today, I think doing linting as part of the normal build process is the way to go

view this post on Zulip Karl Palmskog (Jan 10 2023 at 09:26):

unfortunately this would probably preclude using all these fancy general syntax tools like GitHub's (they don't want to integrate Coq builds into their tools)

view this post on Zulip Karl Palmskog (Jan 10 2023 at 09:37):

(in any case, based on some of our experiments, Coq linting should likely be based partly on the proof state, e.g., indenting doesn't work well for a stupid formatter that only knows about tokens)

view this post on Zulip Enrico Tassi (Jan 10 2023 at 12:11):

Last but not least, the STM is being deprecated since it is unmaintainable. As soon as the alternative is ready we will move it away. No matter if the alternative is only marginally faster.

view this post on Zulip Karl Palmskog (Jan 10 2023 at 12:12):

what is the alternative? Is it described somewhere?

view this post on Zulip Paolo Giarrusso (Jan 10 2023 at 21:42):

Karl Palmskog: (in any case, based on some of our experiments, Coq linting should likely be based partly on the proof state, e.g., indenting doesn't work well for a stupid formatter that only knows about tokens)

I agree that can't work for the present math-comp style, but even math-comp today might pick something else... for instance, with Set Default Goal Selector "!" (and bullets or braces), I think token-based indentation works well — normal coqc compilation enforces that bullets match the proof state, and indentation can be handled by a token/AST-based linter/autoformatter

view this post on Zulip Emilio Jesús Gallego Arias (Jan 10 2023 at 22:10):

Indeed as is done in Rust, linter should be able to ask for any info they may need, maybe the AST, and maybe the goals, and indeed Coq should be able to compute that fast.

For goals we did quite a bit of work so now it is possible to know which commands will operate under goals, for parsing for example we didn't refine commands yet in a similar way, so approximations are needed.

view this post on Zulip Emilio Jesús Gallego Arias (Jan 10 2023 at 22:11):

You can have best of all words classifying what state is updated; so no need to enforce a phase split (which Rust devs know well is super-hard and in the end futile in most languages), but to make most commands reflect what they do in the state.

view this post on Zulip Paolo Giarrusso (Jan 11 2023 at 04:01):

splitting _name resolution_ can be hard — but I haven't seen _Coq_ do the typical things that make it hard — but splitting parsing is much more common than incremental typechecking

view this post on Zulip Paolo Giarrusso (Jan 11 2023 at 04:09):

sorry: that's what it _seems_ to me at least; probably the Eclipse Java Compiler and other IDE tooling counts as exceptions, but they seem hard to make sound

view this post on Zulip Maxime Dénès (Jan 11 2023 at 10:33):

Paolo Giarrusso said:

splitting _name resolution_ can be hard — but I haven't seen _Coq_ do the typical things that make it hard — but splitting parsing is much more common than incremental typechecking

Yes, name resolution is harder (for Coq), unfortunately. There was a recent discussion where @Jason Gross (IIRC) reminded us of ways name resolution can depend on typing.

view this post on Zulip Paolo Giarrusso (Jan 11 2023 at 11:06):

if that's there to stay, then it'd be great if x.(foo) disambiguated foo using x's type :-)

view this post on Zulip Emilio Jesús Gallego Arias (Jan 11 2023 at 13:38):

Actually I believe that phase separation in Java is not fully possible due to some lazy class loading, but I don't know the details, indeed that seemed to piss off a lot of people; in the end tools don't care so much, they just want the pieces of data they need computed fast I think

view this post on Zulip Paolo Giarrusso (Jan 11 2023 at 18:36):

IIRC separating parsing is fine and you don't even need C's lexer hack, but separating name resolution and typechecking is beyond hopeless

view this post on Zulip Paolo Giarrusso (Jan 11 2023 at 18:43):

Basically OO dot notation x.field (even in C) is enough to entangle name lookup for field with typechecking of x. And even Coq clearly has this for _module_ typechecking; I still suspect you _can_ "shape-check" modules for name resolution before core typechecking, but that's much less trivial than I was assuming above.

view this post on Zulip Gaëtan Gilbert (Jan 11 2023 at 18:45):

Program Instance foo : bla := _. Check foo. the second foo refers to the instance only if TC resolution for bla succeeds, otherwise it refers to bla imported from another module

view this post on Zulip Paolo Giarrusso (Jan 11 2023 at 18:48):

yes but that's only backward compatibility — I recall Maxime has fixed various such issues, and this one would go too if you could require Program Complete foo.

view this post on Zulip Paolo Giarrusso (Jan 11 2023 at 18:50):

maybe that'll never happen, but "backward compatibility" seems a different bucket from intrinsic challenges like the other we were discussing

view this post on Zulip Paolo Giarrusso (Jan 11 2023 at 18:57):

@Emilio Jesús Gallego Arias re "lazy class loading", files can be mutually recursive and have no separate .mli-like interfaces, so you can't typecheck or do name-resolution in each file in isolation, but splitting parsing seems still easy because parsing is not extensible — even Scala has separate parsing IIRC


Last updated: Oct 13 2024 at 01:02 UTC