I followed @Guillaume Melquiond 's advice and run the Pff.v file to the end in CoqIDE 8.12, 8.13, 8.14 and 8.15. The results (roughly, stopped by hand while typing so +0/-10s) so far:
8.12: 3'40"
8.13: 2'40"
8.14: 2'00"
8.15: at 11'30" and running - I don't have the impression it will finish before lunch.
Note that all are with the same OCaml 4.10.2 (all are Coq Platform 2022.01). I guess this deserves a bug report ...
Btw.: In VsCoq it doesn't do anything at all - I can't get into proof mode. Can one of the experts look into this?
8.15 is at 40 minutes and running ...
interesting...
Maybe a side-effect of #15467? CoqIDE seems to be struggling with byte_off_to_buffer_off
(quadratic behaviours kill)
This mess is due to a design point I've been advocating against again and again and again, namely representing breakpoints as locations. Not only this does not make sense, but it also forces unfortunate interfaces and inversion of control.
But as usual, nobody listened to these deep issues and "pragmatism" won.
We can probably mitigate the issue keeping a local table of utf8-vs-byte correspondence in the UI but it's not far from reimplementing the missing bits of breakpoints-to-locations table.
(which is the worst of both worlds)
What was the alternative to breakpoints as locations?
breakpoints as uid, and Coq tells you when there are new available breakpoints with message
not the other way around
currently the breakpoint mechanism is indirectly embedded IN THE FRIGGING PARSER
This isn't to do with the 8.15 debugger tho right? This was a poor design choice made for Set Ltac Debug?
it totally has to do with the 8.15 debugger
before there were abstract breakpoints but suddently you needed to make them available to the UI (this is OK)
since we want a "visual" debugger you now need to match the code position with the closure compilation (questionable, but somewhat reasonable)
but the uid of a breakpoint is now its location, so you need to match what the UI believes is a breakpoint with what Coq believes
Do you know which PR this change took place? I am trying to reread the discussion
The one debugger PR
I found a few that's why I am asking
https://github.com/coq/coq/pull/13783
This one?
(the slowness was probably introduced by #15467 because it's relying on a O(n) conversion function inside a O(n) loop, but the problem is deeper)
yes
or this one https://github.com/coq/coq/pull/14644
the first one
The first one went into 8.14 tho
hm wait indeed
the problematic lines for slowness are in the other one
(the design is broken from the beginning)
https://github.com/coq/coq/pull/14644/files#diff-fafed4ce08773fac3ff5f1fe0b7788e45380b85bbef38730bec8a8e34bde6115R707 are the root of the quadratic behaviour
we send offsets to Coq to match the locations with the CoqIDE lines
this version of the code is incorrect because it's sending UTF8 locations but Coq expects bytes
so there is a subsequent PR that does the conversion in O(n) and that's boom
you can solve the problem by trying to be cleverer in the conversion but the very root of the problem is that we have to send this information to Coq in the first place
utter nonsense, if you ask me
this is confusing the document with the textual representation in the user-facing world
(which may not even exist in general)
There is also https://github.com/coq/coq/pull/15532 alongside https://github.com/coq/coq/pull/15467
I see. Indeed, buffer_off_to_byte_off
is a disaster.
We can mitigate the slowness by using the low-level GTK iter API but it's still quadratic
For the records: the run time with CoqIDE 8.15 of Guillaume's Pff file is between 55' and 1h25' - I went to lunch in between and it stopped in that time.
coqc takes 51'' real time on the same file.
I wonder if this should be fixed for 8.15.1.
@Gaëtan Gilbert : what is your opinion on this?
depends when a fix gets merged
we can always have 15.2
it's hard to fix without spaghettifying CoqIDE
we need an efficient way to keep track of the byte-codepoint equivalence
I created https://github.com/coq/coq/issues/15786.
I should also note that for "normal" files it is not unbearable, but quite noticeable. E.g. a file with about 100 definitions and lemmas takes about 10..20s to parse. I cannot remember that the parsing ever took more than a few seconds.
Pierre-Marie Pédrot said:
we need an efficient way to keep track of the byte-codepoint equivalence
Is that necessary? If we were using pairs of (line number, index wrt to line start) instead of the index wrt file start, we would no longer be quadratic when trying to locate a sentence. Obviously, if the users are writing megabyte-long lines of code, this will still explode. But, as far as I know, no existing editor, be it Emacs or VSCode, supports this use case anyway.
but you need to adapt the code to handle that
and it's spaghetti all the way down
(hence the "we should have used a uid for breakpoints from the beginning")
this location is sent to the parser, and used by the tactic debugger to locate pieces of AST, and then sent back to coqide
@Pierre-Marie Pédrot I think we did agree on this right? I propose indeed we do the proper fix; maybe I got lost but I thought we had done a DAP-style protocol in the end
we're totally using the location as the identifier for the breakpoint, see Tactic_debug
a bp is a pair (file, offset)
(we can use line file, line number, relative offset but then I see issues with Windows line ending knocking on the door already)
Unless you are envisioning negative offsets, line ending should cause no issue, since by definition they are at the end of a line.
I was more afraid about files mixing conventions.
@Michael Soegtrop Where can I find Pff.v
? Or just use any large file?
The link is in the bug report (and in this discussion). https://gitlab.inria.fr/flocq/flocq/-/raw/master/src/Pff/Pff.v
But any large file should do the trick.
First thoughts: If buffer_off_to_byte_off
(aka c2b
in coqops.ml
) causes the problem, then issue is from https://github.com/coq/coq/pull/15467. If the user is not going to use debugger breakpoints at all in the file, then bp
, line_nb
and bol_pos
can be zero and there's no need to do the conversion. But if the user has Set Ltac Debug.
in the middle of the file, all code above that point would need to be reprocessed so Coq has the correct offsets. A possible quick fix is to require the user to enable the debugger from the GUI menu--which could reset the buffer to the beginning. Definitely a user visible change. Do you think this may be a reasonable short-term solution?
let bp = c2b buffer start#offset in
let line_nb = start#line + 1 in
let bol_pos = c2b buffer start#line_offset in
let coq_query = Coq.add ((((phrase,edit_id),(tip,verbose)),bp),(line_nb,bol_pos)) in
this will break the error locations won't it?
I think so.
IMHO for a quick fix in 8.15.1 it would be acceptable to require that "Set Ltac Debug" is at the beginning of the file and throw an error if it is not.
I wonder if doing the conversion on the server would help. Haven't thought that through at all.
@Michael Soegtrop I think the error locations would still be broken even if you don't use the debugger.
I see. I fear I can't help here on the technical side - just state user expectations.
We can try to precompute the table before the loop to make it O(n) instead of O(n²) but I don't know if we're sure that no signal can mingle with the buffer.
That sounds plausible.
I want to spend some time reviewing how the debugger changes required the fix in #15467, perhaps there are other possibilities.
How about something like this? Remember the converted offset for the last sentence processed. Then computing the converted offset for the next sentence only has to examine the sentence rather than the whole file. No cache, just one offset to remember.
this is inside a call to coqtop, so the buffer can change under your feet
I checked this
How about if we put a mark in the buffer? If its offset isn't changed, do it the faster way.
might work but probably not always correct
Pierre-Marie Pédrot said:
(we can use line file, line number, relative offset but then I see issues with Windows line ending knocking on the door already)
what issues? as long as coqide and coqidetop see the same thing there should be no problem
How would the buffer change under your feet? If the sentence is being processed, it's read-only.
the parser needs to handle the line endings in the same way
@Jim Fehrle that's true
but can't you modify a line further down that has not been processed ?
@Pierre-Marie Pédrot I recall making a change specifically to handle return characters in the file some time ago, I think @Gaëtan Gilbert is probably right. Would need to check though, it's been a while.
Yes, you can modify unprocessed stuff below, but you won't be getting errors on it.
I think that would work--so far I can't think why it wouldn't. Do you agree?
Indeed, the mark may not be needed. The last conversion offset may have to be in the session object.
I'll try that later today if I don't hear from you. Let's not both write code for this.
Last updated: Jun 04 2023 at 19:30 UTC