Stream: VsCoq devs & users

Topic: DEVS: The sentence parser


view this post on Zulip Fabian Kunze (Nov 20 2020 at 19:01):

I'm investigateing the very slow performance on very big files: whenever a "." is inserted near at the start of the document, everything hangs for a few seconds.

I found that the logic from "reparseSentences" apperently does not handle tha ddition of new sentences in the middle of the document well: instead of only giving one sentence to delete/add, it delets all sentences starting from the changed character, and re-adds everything.
There seems to be logic there that should be able to just shift, but I'm not sure if it works. Did anyone had a look at this part before?

view this post on Zulip Enrico Tassi (Nov 20 2020 at 19:04):

FYI together with @Maxime Dénès we are working at a vscoqtop that speaks LSP natively. When this coq toplevel will land that piece of code should be removed, since vscoqtop will receive the entire text and parse it (using Coq parser). So I would not invest too much time fix it now.

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:05):

Coq parser

since when do we have a Coq parser?

view this post on Zulip Fabian Kunze (Nov 20 2020 at 19:06):

coq itself can parse coq code i guess

view this post on Zulip Fabian Kunze (Nov 20 2020 at 19:06):

or do I not get the joke?

view this post on Zulip Fabian Kunze (Nov 20 2020 at 19:08):

@Enrico Tassi : i know, but i have a file I want to work on that's 6000 lines long, and I try to look for the least painfull way to work with it. PG is totally unusable and my hope is that fixing vscoq is an option ^^'

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:08):

@Fabian Kunze it's not a joke, properly parsing a Coq file is close to impossible without executing the document.

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:09):

It was one of the major TODOs for a coq-LSP server

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:10):

wow, it’s exactly like TeX

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:10):

Slightly better, but indeed even the lexer can change on the fly (i.e. there is no lexer)

view this post on Zulip Fabian Kunze (Nov 20 2020 at 19:10):

Oh, I see, on has to execute almost everything that is not a tactic...

view this post on Zulip Fabian Kunze (Nov 20 2020 at 19:10):

Sections, includes, notation commands, etc...

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:11):

it’s still a lexer, I’d strongly argue

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:11):

(otherwise C is not lexable)

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:11):

Well, I have doubts about C being lexable.

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:12):

that’s terminology, see the “lexer hack” — but in contrast, the concept of “TeX lexer” makes much less sense.

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:12):

But yes, Coq and TeX have a lot in common.

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:12):

the last famous comparison was for Ltac1 and TeX :-P

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:12):

Coq contains Ltac1, so...

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:13):

back to TeX, it doesn’t have a lexer and parser, it has eyes, a mouth, a throat (?) and a stomach

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:13):

(I probably got one of those wrong, but that’s Knuth’s official terminology)

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:13):

It's a DUCK, indeed.

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:13):

Closely related to Coq.

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:14):

IIUC, Coq has instead a conventional lexer + LL(1) parser, with some evolving mutable state — that’s MUCH more common

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:14):

Except that the parser is not LL(1), and that the lexer is location-sensitive.

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:14):

in my Delft postdoc I tried convincing Eelco Visser to take a look at this, since he has spent his entire career on proper extensible parsing

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:14):

I remember indeed.

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:15):

IIUC, in those terms no language has a context-free grammar :-P

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:15):

I mean, we’re talking about Notation, right?

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:15):

About the syntax of a document, so more than notation.

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:16):

Hm what else can change, beyond notations from plugins?

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:16):

Right, if you forget plugins then you can only extend stuff via notations.

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:17):

But even the core Coq syntax is not LL(1)

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:17):

and plugins are only loadable at the “top-level”, not in expressions

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:17):

CFG?

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:17):

(anyway, I failed that argument, even when Robbert Krebbers was there — the problem size and academia incentives make it completely unsuitable for a research project)

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:17):

Is it? Nothing enforces this, so I'd be on the safe side betting that no.

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:18):

Coq parsing is a gift from the nine hells.

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:18):

can you even write a non-CFG in Camlp5?

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:18):

beware, I’m excluding the lexer state

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:18):

campl5 is turing-complete as a parsing engine.

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:18):

so in this book, C with the lexer hack is context-free

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:19):

we use a lot of lookaheads, so that would be LL(n), but I think we also have a few parts that call arbitrary code.

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:19):

and that’s literally true once you get a token stream, even if the lexing needs state from the parser (typenames)

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:19):

even excluding the lexer mess, it's already a parsing delirium

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:19):

okay, sounds like Ruby territory

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:20):

was about to ask “what does CoqIDE do”, but “what’s the next sentence” is indeed easier

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:20):

even that...

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:21):

There is the evergreen quotation bug that was was slain many times and resurfaced alike by being reverted.

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:21):

I think Coq’s problems are the best arguments for Haskell-style purely functional programming.

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:21):

I am not sure I see the relation.

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:21):

right now, that might affect the behavior of “arbitrary code” in the parser — it’d be bad if that were too impure, I guess.

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:21):

The mess in the Coq parser is essentially due to organic growth and thus social considerations rather than technical ones.

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:22):

but otherwise, my comment was more a conjecture on the other problems with mutable state

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:22):

in the rest of Coq

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:22):

Well, I guarantee you that even if the parser were purely functional, it'd be an utter mess.

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:23):

well, I can appreciate the social side from having worked on Scala :-)

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:23):

they’re both large-scale academic projects moving to industry, which causes some software engineering concerns

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:24):

You can throw in the mix weird considerations about why natural language should be preferred for mathematicians.

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:24):

at least they got help from having way more funding and few “benevolent dictators”

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:24):

I am actively supporting more totalitarianism in the coq dev.

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:25):

The rule of arbitrary is better than no rule at all.

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:25):

I don’t think that suffices: a dictator only gives you some consistency — and only as much as they enforce

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:26):

as every PhD student learns, human beings answering questions are not pure functions

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:26):

especially bosses

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:27):

formalizing language design seems to have better results, as shown by Racket and GHC

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:27):

I also agree, but unfortunately there is no clean slate for Coq.

view this post on Zulip Pierre-Marie Pédrot (Nov 20 2020 at 19:27):

Dunno how Lean is faring on that respect.

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:28):

AFAIK they still shoo users away, so I’m looking forward to using Lean 25 :-)

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:29):

but that doesn’t prevent proper design for new features, as shown by Ltac2.

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:30):

My concern is whether _other_ new features are designed.

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:30):

(“well-designed” will have to wait)

view this post on Zulip Paolo Giarrusso (Nov 20 2020 at 19:31):

(mostly because nobody is sure how to design with dependent types well)

view this post on Zulip Théo Winterhalter (Nov 20 2020 at 19:34):

If we had a structural editor we would only need local parsing at most and the file stored on the computer would have trivial parsing, notations would be handled by the editor. But I guess for now that's just wishful thinking.

view this post on Zulip Théo Winterhalter (Nov 20 2020 at 19:36):

Anyway, @Fabian Kunze I have to say, this is so cool that you are doing all this!

view this post on Zulip Fabian Kunze (Nov 20 2020 at 19:36):

I think i have found the problem and just need to get the indices right in my head to fix this :)

view this post on Zulip Fabian Kunze (Nov 20 2020 at 19:37):

the logic currently just checked if the current sentence changed, but not if we added/removed a sentence by adding/removing a dot

view this post on Zulip Théo Winterhalter (Nov 20 2020 at 19:37):

Oh great! Thanks a lot for trying to make our lives better.

view this post on Zulip Fabian Kunze (Nov 20 2020 at 20:04):

I think i fixed it :D

view this post on Zulip Fabian Kunze (Nov 20 2020 at 20:42):

https://github.com/coq-community/vscoq/pull/181
the master branch of my fork has all my fixes and can be build by make afaik

view this post on Zulip Fabian Kunze (Nov 20 2020 at 20:45):

vscoq-0.3.3-alpha.vsix
Oh, or i just upload the file with this and my other fixes here in zulip for anyone interested. (this is https://github.com/fakusb/vscoq/commit/4783ac779ccc9c73c6d4d0a0643b4871b07496fb)

view this post on Zulip Emilio Jesús Gallego Arias (Nov 21 2020 at 00:14):

I think the mess is also of very much technical nature, I agree with @Paolo Giarrusso 's comments

view this post on Zulip Fabian Kunze (Nov 21 2020 at 08:18):

Is deciding what te next entence is easier? Mu passiv view was that, outside of comments, . always denotes the end of a sentence/command.

view this post on Zulip Fabian Kunze (Nov 21 2020 at 08:24):

Ok, nevermind, https://coq.discourse.group/t/is-there-a-full-documentation-of-coqs-grammar/647/10

But as an enduser, i dont care if the supported fragment in the editor would be a bit smaller than what theoretical, the coq parser could handle.

view this post on Zulip Fabian Kunze (Nov 21 2020 at 08:28):

How far is tbe native lsp? Or, probably easier, in which timeframe can we not expexct a verion of vscoq using it?

And is there a way to help developing it? My feeling on the lsp github project was that there are a lot of conceptual problems, but there does not seem to be an implementation that was started, no matter how buggy?

view this post on Zulip Théo Winterhalter (Nov 21 2020 at 08:43):

Fabian Kunze said:

Ok, nevermind, https://coq.discourse.group/t/is-there-a-full-documentation-of-coqs-grammar/647/10

And you also have projections often written with . and full qualid as well (maybe this one isn't a problem).

view this post on Zulip Fabian Kunze (Nov 21 2020 at 08:55):

Yes, i thought the sequence . dot-space (or dot + other whitespace) was special.

view this post on Zulip Théo Winterhalter (Nov 21 2020 at 09:47):

Ah yeah ok.

view this post on Zulip Paolo Giarrusso (Nov 21 2020 at 13:52):

As an end-user, I'd care to achieve at least feature parity with Proof General, except even that is (was?) horribly broken

view this post on Zulip Paolo Giarrusso (Nov 21 2020 at 13:53):

A while ago, and IIRC for a long while, the parsing of nested comments was different in PG and Coq. It wasn't common, but the behavior was pretty bad when you did hit this bug.

view this post on Zulip Paolo Giarrusso (Nov 21 2020 at 13:57):

Ah, this is still open https://github.com/ProofGeneral/PG/issues/37

view this post on Zulip Paolo Giarrusso (Nov 21 2020 at 13:58):

(but it wasn't about nested comments)

view this post on Zulip Théo Zimmermann (Nov 22 2020 at 16:41):

About splitting of sentences and nested comments, I would refer you to Clément's presentation of Alectryon and more specifically, this slide https://youtu.be/f8CKGoP3_us?t=560 and this slide https://youtu.be/f8CKGoP3_us?t=661.

view this post on Zulip Théo Zimmermann (Nov 22 2020 at 16:45):

About "the LSP project", to be fully transparent there is no longer one coordinated project or roadmap, there is:

These efforts are largely independent for now.


Last updated: Jun 04 2023 at 23:30 UTC