Stream: Coq devs & plugin devs

Topic: Splitting CoqIDE - speed regression


view this post on Zulip Michael Soegtrop (Mar 09 2022 at 10:58):

I followed @Guillaume Melquiond 's advice and run the Pff.v file to the end in CoqIDE 8.12, 8.13, 8.14 and 8.15. The results (roughly, stopped by hand while typing so +0/-10s) so far:

8.12: 3'40"
8.13: 2'40"
8.14: 2'00"
8.15: at 11'30" and running - I don't have the impression it will finish before lunch.

Note that all are with the same OCaml 4.10.2 (all are Coq Platform 2022.01). I guess this deserves a bug report ...

view this post on Zulip Michael Soegtrop (Mar 09 2022 at 11:14):

Btw.: In VsCoq it doesn't do anything at all - I can't get into proof mode. Can one of the experts look into this?

view this post on Zulip Michael Soegtrop (Mar 09 2022 at 11:26):

8.15 is at 40 minutes and running ...

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 11:29):

interesting...

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 11:45):

Maybe a side-effect of #15467? CoqIDE seems to be struggling with byte_off_to_buffer_off

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:01):

(quadratic behaviours kill)

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:08):

This mess is due to a design point I've been advocating against again and again and again, namely representing breakpoints as locations. Not only this does not make sense, but it also forces unfortunate interfaces and inversion of control.

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:09):

But as usual, nobody listened to these deep issues and "pragmatism" won.

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:10):

We can probably mitigate the issue keeping a local table of utf8-vs-byte correspondence in the UI but it's not far from reimplementing the missing bits of breakpoints-to-locations table.

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:10):

(which is the worst of both worlds)

view this post on Zulip Ali Caglayan (Mar 09 2022 at 12:11):

What was the alternative to breakpoints as locations?

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:11):

breakpoints as uid, and Coq tells you when there are new available breakpoints with message

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:11):

not the other way around

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:12):

currently the breakpoint mechanism is indirectly embedded IN THE FRIGGING PARSER

view this post on Zulip Ali Caglayan (Mar 09 2022 at 12:12):

This isn't to do with the 8.15 debugger tho right? This was a poor design choice made for Set Ltac Debug?

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:13):

it totally has to do with the 8.15 debugger

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:13):

before there were abstract breakpoints but suddently you needed to make them available to the UI (this is OK)

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:14):

since we want a "visual" debugger you now need to match the code position with the closure compilation (questionable, but somewhat reasonable)

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:15):

but the uid of a breakpoint is now its location, so you need to match what the UI believes is a breakpoint with what Coq believes

view this post on Zulip Ali Caglayan (Mar 09 2022 at 12:21):

Do you know which PR this change took place? I am trying to reread the discussion

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:22):

The one debugger PR

view this post on Zulip Ali Caglayan (Mar 09 2022 at 12:22):

I found a few that's why I am asking

view this post on Zulip Ali Caglayan (Mar 09 2022 at 12:22):

https://github.com/coq/coq/pull/13783

view this post on Zulip Ali Caglayan (Mar 09 2022 at 12:22):

This one?

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:23):

(the slowness was probably introduced by #15467 because it's relying on a O(n) conversion function inside a O(n) loop, but the problem is deeper)

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:23):

yes

view this post on Zulip Ali Caglayan (Mar 09 2022 at 12:23):

or this one https://github.com/coq/coq/pull/14644

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:23):

the first one

view this post on Zulip Ali Caglayan (Mar 09 2022 at 12:24):

The first one went into 8.14 tho

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:24):

hm wait indeed

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:24):

the problematic lines for slowness are in the other one

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:24):

(the design is broken from the beginning)

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:25):

https://github.com/coq/coq/pull/14644/files#diff-fafed4ce08773fac3ff5f1fe0b7788e45380b85bbef38730bec8a8e34bde6115R707 are the root of the quadratic behaviour

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:25):

we send offsets to Coq to match the locations with the CoqIDE lines

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:26):

this version of the code is incorrect because it's sending UTF8 locations but Coq expects bytes

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:26):

so there is a subsequent PR that does the conversion in O(n) and that's boom

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:27):

you can solve the problem by trying to be cleverer in the conversion but the very root of the problem is that we have to send this information to Coq in the first place

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:27):

utter nonsense, if you ask me

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:28):

this is confusing the document with the textual representation in the user-facing world

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:28):

(which may not even exist in general)

view this post on Zulip Ali Caglayan (Mar 09 2022 at 12:34):

There is also https://github.com/coq/coq/pull/15532 alongside https://github.com/coq/coq/pull/15467

view this post on Zulip Guillaume Melquiond (Mar 09 2022 at 12:38):

I see. Indeed, buffer_off_to_byte_off is a disaster.

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 12:40):

We can mitigate the slowness by using the low-level GTK iter API but it's still quadratic

view this post on Zulip Michael Soegtrop (Mar 09 2022 at 13:01):

For the records: the run time with CoqIDE 8.15 of Guillaume's Pff file is between 55' and 1h25' - I went to lunch in between and it stopped in that time.

view this post on Zulip Michael Soegtrop (Mar 09 2022 at 13:03):

coqc takes 51'' real time on the same file.

view this post on Zulip Michael Soegtrop (Mar 09 2022 at 13:04):

I wonder if this should be fixed for 8.15.1.

@Gaëtan Gilbert : what is your opinion on this?

view this post on Zulip Gaëtan Gilbert (Mar 09 2022 at 13:10):

depends when a fix gets merged
we can always have 15.2

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 13:12):

it's hard to fix without spaghettifying CoqIDE

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 13:12):

we need an efficient way to keep track of the byte-codepoint equivalence

view this post on Zulip Michael Soegtrop (Mar 09 2022 at 13:18):

I created https://github.com/coq/coq/issues/15786.

I should also note that for "normal" files it is not unbearable, but quite noticeable. E.g. a file with about 100 definitions and lemmas takes about 10..20s to parse. I cannot remember that the parsing ever took more than a few seconds.

view this post on Zulip Guillaume Melquiond (Mar 09 2022 at 13:58):

Pierre-Marie Pédrot said:

we need an efficient way to keep track of the byte-codepoint equivalence

Is that necessary? If we were using pairs of (line number, index wrt to line start) instead of the index wrt file start, we would no longer be quadratic when trying to locate a sentence. Obviously, if the users are writing megabyte-long lines of code, this will still explode. But, as far as I know, no existing editor, be it Emacs or VSCode, supports this use case anyway.

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 13:59):

but you need to adapt the code to handle that

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 13:59):

and it's spaghetti all the way down

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 13:59):

(hence the "we should have used a uid for breakpoints from the beginning")

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 14:00):

this location is sent to the parser, and used by the tactic debugger to locate pieces of AST, and then sent back to coqide

view this post on Zulip Emilio Jesús Gallego Arias (Mar 09 2022 at 14:27):

@Pierre-Marie Pédrot I think we did agree on this right? I propose indeed we do the proper fix; maybe I got lost but I thought we had done a DAP-style protocol in the end

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 14:28):

we're totally using the location as the identifier for the breakpoint, see Tactic_debug

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 14:29):

a bp is a pair (file, offset)

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 14:31):

(we can use line file, line number, relative offset but then I see issues with Windows line ending knocking on the door already)

view this post on Zulip Guillaume Melquiond (Mar 09 2022 at 16:14):

Unless you are envisioning negative offsets, line ending should cause no issue, since by definition they are at the end of a line.

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 16:25):

I was more afraid about files mixing conventions.

view this post on Zulip Jim Fehrle (Mar 09 2022 at 16:45):

@Michael Soegtrop Where can I find Pff.v? Or just use any large file?

view this post on Zulip Guillaume Melquiond (Mar 09 2022 at 16:47):

The link is in the bug report (and in this discussion). https://gitlab.inria.fr/flocq/flocq/-/raw/master/src/Pff/Pff.v
But any large file should do the trick.

view this post on Zulip Jim Fehrle (Mar 09 2022 at 17:24):

First thoughts: If buffer_off_to_byte_off(aka c2b in coqops.ml) causes the problem, then issue is from https://github.com/coq/coq/pull/15467. If the user is not going to use debugger breakpoints at all in the file, then bp, line_nb and bol_poscan be zero and there's no need to do the conversion. But if the user has Set Ltac Debug. in the middle of the file, all code above that point would need to be reprocessed so Coq has the correct offsets. A possible quick fix is to require the user to enable the debugger from the GUI menu--which could reset the buffer to the beginning. Definitely a user visible change. Do you think this may be a reasonable short-term solution?

            let bp = c2b buffer start#offset in
            let line_nb = start#line + 1 in
            let bol_pos = c2b buffer start#line_offset in
            let coq_query = Coq.add ((((phrase,edit_id),(tip,verbose)),bp),(line_nb,bol_pos)) in

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 17:25):

this will break the error locations won't it?

view this post on Zulip Jim Fehrle (Mar 09 2022 at 17:28):

I think so.

view this post on Zulip Michael Soegtrop (Mar 09 2022 at 17:30):

IMHO for a quick fix in 8.15.1 it would be acceptable to require that "Set Ltac Debug" is at the beginning of the file and throw an error if it is not.

view this post on Zulip Jim Fehrle (Mar 09 2022 at 17:31):

I wonder if doing the conversion on the server would help. Haven't thought that through at all.

view this post on Zulip Jim Fehrle (Mar 09 2022 at 17:32):

@Michael Soegtrop I think the error locations would still be broken even if you don't use the debugger.

view this post on Zulip Michael Soegtrop (Mar 09 2022 at 17:33):

I see. I fear I can't help here on the technical side - just state user expectations.

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 17:35):

We can try to precompute the table before the loop to make it O(n) instead of O(n²) but I don't know if we're sure that no signal can mingle with the buffer.

view this post on Zulip Jim Fehrle (Mar 09 2022 at 17:37):

That sounds plausible.

view this post on Zulip Jim Fehrle (Mar 09 2022 at 17:40):

I want to spend some time reviewing how the debugger changes required the fix in #15467, perhaps there are other possibilities.

view this post on Zulip Jim Fehrle (Mar 09 2022 at 17:53):

How about something like this? Remember the converted offset for the last sentence processed. Then computing the converted offset for the next sentence only has to examine the sentence rather than the whole file. No cache, just one offset to remember.

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 17:54):

this is inside a call to coqtop, so the buffer can change under your feet

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 17:54):

I checked this

view this post on Zulip Jim Fehrle (Mar 09 2022 at 17:57):

How about if we put a mark in the buffer? If its offset isn't changed, do it the faster way.

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 17:58):

might work but probably not always correct

view this post on Zulip Gaëtan Gilbert (Mar 09 2022 at 17:59):

Pierre-Marie Pédrot said:

(we can use line file, line number, relative offset but then I see issues with Windows line ending knocking on the door already)

what issues? as long as coqide and coqidetop see the same thing there should be no problem

view this post on Zulip Jim Fehrle (Mar 09 2022 at 18:00):

How would the buffer change under your feet? If the sentence is being processed, it's read-only.

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 18:00):

the parser needs to handle the line endings in the same way

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 18:01):

@Jim Fehrle that's true

view this post on Zulip Pierre-Marie Pédrot (Mar 09 2022 at 18:02):

but can't you modify a line further down that has not been processed ?

view this post on Zulip Jim Fehrle (Mar 09 2022 at 18:02):

@Pierre-Marie Pédrot I recall making a change specifically to handle return characters in the file some time ago, I think @Gaëtan Gilbert is probably right. Would need to check though, it's been a while.

view this post on Zulip Jim Fehrle (Mar 09 2022 at 18:03):

Yes, you can modify unprocessed stuff below, but you won't be getting errors on it.

view this post on Zulip Jim Fehrle (Mar 09 2022 at 18:09):

I think that would work--so far I can't think why it wouldn't. Do you agree?

view this post on Zulip Jim Fehrle (Mar 09 2022 at 18:11):

Indeed, the mark may not be needed. The last conversion offset may have to be in the session object.

view this post on Zulip Jim Fehrle (Mar 09 2022 at 18:16):

I'll try that later today if I don't hear from you. Let's not both write code for this.


Last updated: Feb 05 2023 at 20:03 UTC