I'm investigateing the very slow performance on very big files: whenever a "." is inserted near at the start of the document, everything hangs for a few seconds.
I found that the logic from "reparseSentences" apperently does not handle tha ddition of new sentences in the middle of the document well: instead of only giving one sentence to delete/add, it delets all sentences starting from the changed character, and re-adds everything.
There seems to be logic there that should be able to just shift, but I'm not sure if it works. Did anyone had a look at this part before?
FYI together with @Maxime Dénès we are working at a vscoqtop that speaks LSP natively. When this coq toplevel will land that piece of code should be removed, since vscoqtop will receive the entire text and parse it (using Coq parser). So I would not invest too much time fix it now.
Coq parser
since when do we have a Coq parser?
coq itself can parse coq code i guess
or do I not get the joke?
@Enrico Tassi : i know, but i have a file I want to work on that's 6000 lines long, and I try to look for the least painfull way to work with it. PG is totally unusable and my hope is that fixing vscoq is an option ^^'
@Fabian Kunze it's not a joke, properly parsing a Coq file is close to impossible without executing the document.
It was one of the major TODOs for a coq-LSP server
wow, it’s exactly like TeX
Slightly better, but indeed even the lexer can change on the fly (i.e. there is no lexer)
Oh, I see, on has to execute almost everything that is not a tactic...
Sections, includes, notation commands, etc...
it’s still a lexer, I’d strongly argue
(otherwise C is not lexable)
Well, I have doubts about C being lexable.
that’s terminology, see the “lexer hack” — but in contrast, the concept of “TeX lexer” makes much less sense.
But yes, Coq and TeX have a lot in common.
the last famous comparison was for Ltac1 and TeX :-P
Coq contains Ltac1, so...
back to TeX, it doesn’t have a lexer and parser, it has eyes, a mouth, a throat (?) and a stomach
(I probably got one of those wrong, but that’s Knuth’s official terminology)
It's a DUCK, indeed.
Closely related to Coq.
IIUC, Coq has instead a conventional lexer + LL(1) parser, with some evolving mutable state — that’s MUCH more common
Except that the parser is not LL(1), and that the lexer is location-sensitive.
in my Delft postdoc I tried convincing Eelco Visser to take a look at this, since he has spent his entire career on proper extensible parsing
I remember indeed.
IIUC, in those terms no language has a context-free grammar :-P
I mean, we’re talking about Notation, right?
About the syntax of a document, so more than notation.
Hm what else can change, beyond notations from plugins?
Right, if you forget plugins then you can only extend stuff via notations.
But even the core Coq syntax is not LL(1)
and plugins are only loadable at the “top-level”, not in expressions
CFG?
(anyway, I failed that argument, even when Robbert Krebbers was there — the problem size and academia incentives make it completely unsuitable for a research project)
Is it? Nothing enforces this, so I'd be on the safe side betting that no.
Coq parsing is a gift from the nine hells.
can you even write a non-CFG in Camlp5?
beware, I’m excluding the lexer state
campl5 is turing-complete as a parsing engine.
so in this book, C with the lexer hack is context-free
we use a lot of lookaheads, so that would be LL(n), but I think we also have a few parts that call arbitrary code.
and that’s literally true once you get a token stream, even if the lexing needs state from the parser (typenames)
even excluding the lexer mess, it's already a parsing delirium
okay, sounds like Ruby territory
was about to ask “what does CoqIDE do”, but “what’s the next sentence” is indeed easier
even that...
There is the evergreen quotation bug that was was slain many times and resurfaced alike by being reverted.
I think Coq’s problems are the best arguments for Haskell-style purely functional programming.
I am not sure I see the relation.
right now, that might affect the behavior of “arbitrary code” in the parser — it’d be bad if that were too impure, I guess.
The mess in the Coq parser is essentially due to organic growth and thus social considerations rather than technical ones.
but otherwise, my comment was more a conjecture on the other problems with mutable state
in the rest of Coq
Well, I guarantee you that even if the parser were purely functional, it'd be an utter mess.
well, I can appreciate the social side from having worked on Scala :-)
they’re both large-scale academic projects moving to industry, which causes some software engineering concerns
You can throw in the mix weird considerations about why natural language should be preferred for mathematicians.
at least they got help from having way more funding and few “benevolent dictators”
I am actively supporting more totalitarianism in the coq dev.
The rule of arbitrary is better than no rule at all.
I don’t think that suffices: a dictator only gives you some consistency — and only as much as they enforce
as every PhD student learns, human beings answering questions are not pure functions
especially bosses
formalizing language design seems to have better results, as shown by Racket and GHC
I also agree, but unfortunately there is no clean slate for Coq.
Dunno how Lean is faring on that respect.
AFAIK they still shoo users away, so I’m looking forward to using Lean 25 :-)
but that doesn’t prevent proper design for new features, as shown by Ltac2.
My concern is whether _other_ new features are designed.
(“well-designed” will have to wait)
(mostly because nobody is sure how to design with dependent types well)
If we had a structural editor we would only need local parsing at most and the file stored on the computer would have trivial parsing, notations would be handled by the editor. But I guess for now that's just wishful thinking.
Anyway, @Fabian Kunze I have to say, this is so cool that you are doing all this!
I think i have found the problem and just need to get the indices right in my head to fix this :)
the logic currently just checked if the current sentence changed, but not if we added/removed a sentence by adding/removing a dot
Oh great! Thanks a lot for trying to make our lives better.
I think i fixed it :D
https://github.com/coq-community/vscoq/pull/181
the master branch of my fork has all my fixes and can be build by make
afaik
vscoq-0.3.3-alpha.vsix
Oh, or i just upload the file with this and my other fixes here in zulip for anyone interested. (this is https://github.com/fakusb/vscoq/commit/4783ac779ccc9c73c6d4d0a0643b4871b07496fb)
I think the mess is also of very much technical nature, I agree with @Paolo Giarrusso 's comments
Is deciding what te next entence is easier? Mu passiv view was that, outside of comments, .
always denotes the end of a sentence/command.
Ok, nevermind, https://coq.discourse.group/t/is-there-a-full-documentation-of-coqs-grammar/647/10
But as an enduser, i dont care if the supported fragment in the editor would be a bit smaller than what theoretical, the coq parser could handle.
How far is tbe native lsp? Or, probably easier, in which timeframe can we not expexct a verion of vscoq using it?
And is there a way to help developing it? My feeling on the lsp github project was that there are a lot of conceptual problems, but there does not seem to be an implementation that was started, no matter how buggy?
Fabian Kunze said:
Ok, nevermind, https://coq.discourse.group/t/is-there-a-full-documentation-of-coqs-grammar/647/10
And you also have projections often written with .
and full qualid as well (maybe this one isn't a problem).
Yes, i thought the sequence .
dot-space (or dot + other whitespace) was special.
Ah yeah ok.
As an end-user, I'd care to achieve at least feature parity with Proof General, except even that is (was?) horribly broken
A while ago, and IIRC for a long while, the parsing of nested comments was different in PG and Coq. It wasn't common, but the behavior was pretty bad when you did hit this bug.
Ah, this is still open https://github.com/ProofGeneral/PG/issues/37
(but it wasn't about nested comments)
About splitting of sentences and nested comments, I would refer you to Clément's presentation of Alectryon and more specifically, this slide https://youtu.be/f8CKGoP3_us?t=560 and this slide https://youtu.be/f8CKGoP3_us?t=661.
About "the LSP project", to be fully transparent there is no longer one coordinated project or roadmap, there is:
vscoqtop
project on which Maxime and Enrico are working, whose goal is to provide the current LSP protocol that VSCoq expects;These efforts are largely independent for now.
Last updated: Jun 04 2023 at 23:30 UTC