Stream: SerAPI

Topic: STM api


view this post on Zulip Alex Sanchez-Stern (Jan 19 2023 at 21:28):

Was told in the general chat this message might be better here:

Hey all,
I have a question about plans for future development of Coq/Serapi (hope this is the right stream for that). In coq-serapi currently, states are numbered. When you add a statement, you have to specify what state you're adding it on top of, and it gives you a state number. But currently, it seems like you can only add statements to the "active" state returned by the last interaction. This seems like an incomplete design; if there's only one valid state number to pass to (Add () "..."), then it doesn't make sense to have the user specify it instead of it being inferred. But if there are plans to in the future allow users to "jump" to previously visited states and add statements on top of them, then it makes sense to have the state numbers in the API now. Are there plans to allow that kind of jumping? If so, is there some sort of website or statement somewhere that outlines those plans, or is it more of a "cite the zulip" kind of thing?
Thanks,
Alex Sanchez-Stern

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 21:52):

Hi @Alex Sanchez-Stern , thanks for asking; indeed there are plans to remedy that problem. In short, SerAPI is being deprecated and we recommend users switch to coq-lsp, which can perform these kinds of tasks, and efficiently too!

There is more information in a quite long issue here https://github.com/ejgallego/coq-serapi/issues/252

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 21:54):

But basically, SerAPI is just a thin layer over Coq's document API, which is a bit dated and has some tricky-to-solve problems, so we opted for a rewrite; coq-lsp uses a new document manager (Flèche) and can do all that SerAPI does and more.

view this post on Zulip Alex Sanchez-Stern (Jan 19 2023 at 22:39):

Thanks Emilio! Hmm okay. I read the issue and discussion, as well as the coq-lsp github, but it's still not clear to me what the plans are on this particular feature.

Basically, I've been exploring different proof strategies for proof search, including ones like best-first and A* that use state-evaluation heuristics. These strategies necessitate jumping around in the tree of explored proof states. So, for instance, I might run tac1. tac2. tac3., then cancel them all and run tac4. tac5. tac6., then want to return to the result of running tac1. tac2. tac3. and add tac7. on top of that, without having to cancel tac4, tac5, and tac6, and rerun tac1, tac2, and tac3. This is because running tac1, tac2, and tac3 again could be computationally expensive, and the results were already computed.

Most of the stuff on the coq-lsp README seems to be centered around making document editing more responsive, where you wouldn't really need this feature. So, is coq-lsp going to support jumping to previously cancelled states and running tactics on them without re-running the tactics that it takes to get there?

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 22:51):

Thanks to you for getting in touch @Alex Sanchez-Stern , I'm sorry the documentation is not better; thanks for the feedback.

For your use case, you can use coq-lsp very easily: simply send the new document to coq-lsp, then some request for information about the document, and you are set. Keep in mind that one of the pitfalls of SerAPI was that it assumed all sentences could be parsed; that's not the case for coq-lsp.

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 22:51):

So in your case, you just send tac1. tac2. tac3. to coq-lsp (as part of the full document)

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 22:52):

then you submit a query for the goals after tac3., for example, and coq-lsp will compute that information incrementally (right now eagerly, but it is designed to be switched to a lazy mode for clients that support it)
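This send-document-then-query flow can be sketched as plain LSP JSON-RPC traffic. The `proof/goals` method name is coq-lsp's goals extension as I understand it; treat the exact method name and parameter shapes as assumptions and check the coq-lsp protocol docs before relying on them:

```python
import json

def frame(payload: dict) -> bytes:
    """Wrap a JSON-RPC payload in the LSP Content-Length framing."""
    body = json.dumps(payload).encode("utf-8")
    return b"Content-Length: " + str(len(body)).encode() + b"\r\n\r\n" + body

# A made-up document for illustration.
doc = "Lemma ex : True.\nProof.\ntac1. tac2. tac3.\n"

# Open the document (version 1).
did_open = frame({
    "jsonrpc": "2.0",
    "method": "textDocument/didOpen",
    "params": {"textDocument": {
        "uri": "file:///search.v", "languageId": "coq",
        "version": 1, "text": doc}},
})

# Ask for goals at the point just after "tac3." (line 2, column 17).
goals_req = frame({
    "jsonrpc": "2.0", "id": 1,
    "method": "proof/goals",   # coq-lsp extension request (assumed name)
    "params": {"textDocument": {"uri": "file:///search.v"},
               "position": {"line": 2, "character": 17}},
})
```

Both byte strings would be written to the server's stdin; the goals come back as the JSON-RPC response to request id 1.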

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 22:53):

for your edit, you don't need to cancel anything; simply submit a new version of the document with tac4. tac5. tac6., and coq-lsp will abort the previous checking and recompute only what's needed in the new one (incrementally)
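With standard LSP full-document sync, that edit is just a `textDocument/didChange` notification carrying the new text and a bumped version number; a minimal sketch (URI and document content are made up for illustration):

```python
def did_change(uri: str, version: int, new_text: str) -> dict:
    """Full-document sync: send the whole new text with a bumped version.
    (Incremental range-based sync is also possible; full sync is simplest.)"""
    return {
        "jsonrpc": "2.0",
        "method": "textDocument/didChange",
        "params": {
            "textDocument": {"uri": uri, "version": version},
            "contentChanges": [{"text": new_text}],
        },
    }

# Version 2 of the document replaces the tactic suffix wholesale.
v2 = "Lemma ex : True.\nProof.\ntac4. tac5. tac6.\n"
msg = did_change("file:///search.v", 2, v2)
```

The server diffs against the previous version itself; the client never issues an explicit cancel.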

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 22:54):

Alex Sanchez-Stern said:

So, is coq-lsp going to support jumping to previously cancelled states and running tactics on them without re-running the tactics that it takes to get there?

Yes, it already supports that.

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 22:55):

And it is transparent to the user, you just submit a new document and that happens.
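From the search tool's side, "jumping" is then nothing more than rendering a different tactic prefix into a fresh document version; a sketch of that bookkeeping (node ids, prelude, and tactic names are hypothetical):

```python
# Each explored node is just the list of tactics from the root; backtracking
# is "render a different prefix as a fresh document version".
PRELUDE = "Lemma ex : True.\nProof.\n"

def render(tactics: list[str]) -> str:
    """Turn a tactic prefix into a full document to submit via didChange."""
    return PRELUDE + "\n".join(tactics) + ("\n" if tactics else "")

explored = {}                       # node id -> tactic prefix
explored[1] = ["tac1.", "tac2.", "tac3."]
explored[2] = ["tac4.", "tac5.", "tac6."]

# "Jump back" to node 1 and extend it with tac7.: no cancel commands,
# just a new document whose shared prefix the server can reuse from cache.
explored[3] = explored[1] + ["tac7."]
doc_v3 = render(explored[3])
```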

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 22:55):

The main difference is that now, when you ask things about the document, instead of a state id (sid) you send a Position object, which is basically a line/column point

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 22:56):

Emilio Jesús Gallego Arias said:

And it is transparent to the user, you just submit a new document and that happens.

Moreover, coq-lsp will give you information about what was cached, so you can quickly spot any trouble in the incremental computation.

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 22:58):

coq-lsp is built on top of a generic incremental document-checking engine called Flèche; coq-lsp is just a thin layer over that. The engine supports more features that you can access via LSP, for example speculative execution, etc., which you may find useful.

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 22:58):

But LSP is a good front-end for many users (plenty of JS and Python tooling for example)

view this post on Zulip Alex Sanchez-Stern (Jan 19 2023 at 23:21):

Great, that sounds like exactly what I need! One more question, just because I'm curious, what are the requirements on the position object? In my search tools, they don't use a text buffer or anything, so they don't natively have any notion of a document position. Does it work like state ID's, where any document position I need will have been returned from some previous command? Or will I have to synthesize document positions sometimes by counting the length of the tactic I'm submitting and deciding where to put newlines?

view this post on Zulip Alex Sanchez-Stern (Jan 19 2023 at 23:25):

Oh and I guess another question: it sounds like from your description, the result of all previously run commands is cached, even if you backtracked over it to run a different set of commands. During search where hundreds or thousands of tactics are tried per-proof, that could potentially be a lot of different proof states stored in the cache. Do the cached states take up enough memory that it could be a problem if I were to say, visit 100,000 different states in a particular file? If so, is there a mechanism to tell the server "I finished search for this proof, you can drop all the states not part of the successful trace now while I search for the next one"?

view this post on Zulip Paolo Giarrusso (Jan 19 2023 at 23:37):

Emilio Jesús Gallego Arias said:

for your edit, you don't need to cancel anything, simply submit a new version of the document with tac4. tac5. tac6. , coq-lsp will abort the previous checking and recompute only what's needed in the new one (incrementally)

I'm sorry to jump in, but isn't "abort" a poor fit for this kind of "proof search over documents"?

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 23:39):

Alex Sanchez-Stern said:

Great, that sounds like exactly what I need! One more question, just because I'm curious, what are the requirements on the position object? In my search tools, they don't use a text buffer or anything, so they don't natively have any notion of a document position. Does it work like state ID's, where any document position I need will have been returned from some previous command? Or will I have to synthesize document positions sometimes by counting the length of the tactic I'm submitting and deciding where to put newlines?

You don't need to use newlines (though it doesn't hurt, since coq-lsp has to handle UTF-8 positions for Coq and the conversion cache is line-based)

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 23:39):

But indeed here the primary object you as a client handle is a text document, so a position means a line/column pair

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 23:40):

So yes, you need to synthesize a document / document position
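Since the tool already knows the text it submitted, synthesizing the position after the last tactic is only a few lines. A sketch; note that LSP character offsets are, strictly speaking, counted in UTF-16 code units by default, which coincides with a plain character count for ASCII tactic scripts:

```python
def end_position(text: str) -> dict:
    """LSP-style 0-based line/character position of the end of `text`.
    For ASCII input this character count matches LSP's default UTF-16
    code-unit counting; non-ASCII text would need a real conversion."""
    lines = text.split("\n")
    return {"line": len(lines) - 1, "character": len(lines[-1])}

# The point just after "tac3." in the document we submitted.
pos = end_position("Proof.\ntac1. tac2. tac3.")
```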

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 23:41):

I'm happy to add some other addressing method, but text is IMHO the best

view this post on Zulip Paolo Giarrusso (Jan 19 2023 at 23:42):

(nevermind, it sounds like Flèche might have enough extra flexibility; that point just caught my attention)

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 23:42):

During search where hundreds or thousands of tactics are tried per-proof, that could potentially be a lot of different proof states stored in the cache. Do the cached states take up enough memory that it could be a problem if I were to say, visit 100,000 different states in a particular file? If so, is there a mechanism to tell the server "I finished search for this proof, you can drop all the states not part of the successful trace now while I search for the next one"?

There is no cache-management mechanism as of yet, but I'm happy to add one. For now the cache grows for the full lifetime of the server, but indeed we can apply some heuristics to drop state.

Note though that caching can be surprising in Coq due to sharing, so it usually performs better than we expect

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 23:43):

Paolo Giarrusso said:

(nevermind, it sounds like Flèche might have enough extra flexibility; that point just caught my attention)

Not sure what you mean, Paolo; as of today (this is not in Flèche, but in the LSP part), when a new version of a document appears, we stop the checking of the previous version and proceed to check the new one.

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 23:43):

But that's a policy decision by the LSP controller, which is very easy to tweak; Flèche has no problem checking several versions of a document simultaneously

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 23:44):

In fact it is lower-level; if you look at the current API:

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 23:44):

https://github.com/ejgallego/coq-lsp/blob/7d3deb027a5fd35b6f37ce5c5adb7888c6101e9a/fleche/doc.mli#L104

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 23:50):

there are 3 basic operations; the key thing is that we have two caches: a document info cache and an execution cache

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 23:52):

they are slightly different but each essential for different purposes; the info cache follows the incrementality that the structure of a document imposes (in this case just prefix-based, until we add structure, which is on the TODO)

view this post on Zulip Emilio Jesús Gallego Arias (Jan 19 2023 at 23:53):

the execution cache handles Coq states and allows us to implement memoized error recovery, for example, which is very useful in some contexts
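The prefix-based behavior of such an execution cache can be modeled with a toy memo table keyed on (previous state, sentence). This is only an illustration of the idea, not Flèche's actual implementation; `run_sentence` is a made-up stand-in for executing a Coq sentence:

```python
calls = 0

def run_sentence(state: int, sentence: str) -> int:
    """Stand-in for actually executing a Coq sentence (expensive)."""
    global calls
    calls += 1
    return hash((state, sentence)) & 0xFFFF

cache: dict[tuple[int, str], int] = {}

def check(doc: list[str], init: int = 0) -> int:
    """Check a document sentence by sentence, reusing cached results.
    Any shared prefix with a previously checked document costs nothing."""
    st = init
    for s in doc:
        key = (st, s)
        if key not in cache:
            cache[key] = run_sentence(st, s)
        st = cache[key]
    return st

check(["tac1.", "tac2.", "tac3."])           # executes 3 sentences
check(["tac1.", "tac2.", "tac3.", "tac7."])  # only tac7. is executed anew
```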

view this post on Zulip Paolo Giarrusso (Jan 20 2023 at 00:45):

Indeed, sounds like Flèche can do all that's desired

view this post on Zulip Emilio Jesús Gallego Arias (Jan 20 2023 at 01:05):

Paolo Giarrusso said:

Indeed, sounds like Flèche can do all that's desired

I'd bet that for the kind of search Alex wants to perform, you'd indeed like to have deeper integration than what the current API allows.

But going back to the original question: Flèche / coq-lsp is designed to do a lot of things that SerAPI couldn't do because it was limited by the STM, so it should now cover much better the use cases of people wanting to do this kind of work.

view this post on Zulip Karl Palmskog (Jan 20 2023 at 12:06):

it's pretty weird to me that there must be an underlying document with locations for code to interact with the document manager. I'd expect there to be tons of clients to any Coq protocol that have no underlying files, but that input data based on some model or other.

view this post on Zulip Emilio Jesús Gallego Arias (Jan 20 2023 at 15:14):

@Karl Palmskog the document doesn't have to be on disk, but indeed the first model the document manager has of the document is (for now) its text content. Indeed that is a bit annoying if your primary setup is one that works with complex ASTs, as in Coq.

One reason the primary interface is based on text is that ultimately, for tools that may propose mutations or linting fixes, they must propose such updates as text so that the user can incorporate them into their development.

I guess it would not be a huge problem to add a "node-based" interface to Flèche for applications that need it, but when the clients are UIs, the node model is hell for them, as they need to be aware of a custom structure, whereas in the text model they can just rely on their native text structure (be it offset-based as in ProseMirror, or line/column as in VS Code)

view this post on Zulip Alex Sanchez-Stern (Jan 20 2023 at 19:41):

It looks like PyCoq (https://github.com/ejgallego/pycoq) hews pretty closely to the STM API; are there plans to move it to the coq-lsp API?

view this post on Zulip Emilio Jesús Gallego Arias (Jan 20 2023 at 20:11):

@Alex Sanchez-Stern not sure what will happen with PyCoq; I haven't gotten any feedback from users, so I'm unsure what they are using it for, but yes, it could be moved to the Flèche API.

Or you can talk to coq-lsp directly using LSP clients in Python.

I know this is confusing, so let me summarize:

view this post on Zulip Emilio Jesús Gallego Arias (Jan 20 2023 at 20:12):

So, PyCoq makes sense when you want to bind directly to Flèche and avoid some overhead, as Coq will run in the same address space.

Or, if you are happy with the LSP + extensions protocol exposed by coq-lsp, you can just use a regular LSP client to talk to it.

view this post on Zulip Alex Sanchez-Stern (Jan 20 2023 at 20:52):

Ah yeah. I imagine a lot of potential PyCoq users are in a similar position to where I am, where they already have relatively mature Coq-interfacing APIs that predate PyCoq, so they're waiting for the project to be a bit more mature (handling errors correctly, not marked as "extremely experimental") before trying to use it. Obviously this isn't very helpful, because without feedback from those very same users, it's hard to know which parts to develop. But alas, university funding and publish-or-perish make it hard to justify investing time in porting large codebases to a new API to provide feedback, when there are already more mature (if much less clean and long-term engineered) solutions.

Interaction overhead seems to be a large part of the runtime of my proof search stuff, so I would love to eventually move to something like PyCoq that avoids more overhead.

You mentioned that LSP doesn't support the displaying of goals, and so that is a Coq-specific extension. Does that mean that goals within coq-lsp are still output in a similar format to the one they had in coq-serapi?

view this post on Zulip Emilio Jesús Gallego Arias (Jan 21 2023 at 22:32):

Yes, goals from coq-lsp are in a similar format.

view this post on Zulip Emilio Jesús Gallego Arias (Jan 21 2023 at 22:33):

In a sense coq-lsp is SerAPI 2, but instead of sexps it uses a more standard interface + extensions.

view this post on Zulip Emilio Jesús Gallego Arias (Jan 21 2023 at 22:34):

Alex Sanchez-Stern said:

Interaction overhead seems to be a large part of the runtime of my proof search stuff

Really curious to hear more about what the overhead in your stuff is (and happy to help reduce it if possible)

view this post on Zulip Alex Sanchez-Stern (Jan 22 2023 at 19:44):

In the past, a significant portion of the overhead has come when, deep in proof search, the proof obligations start getting really big (lots of hypotheses, large goal terms, etc.). Then, parsing the s-expressions corresponding to those large contexts in Python can take a significant portion of the time. Other than that, search time is dominated by the time it takes Coq to run a tactic, but in those states even simple tactics like auto are slow, because they involve parsing large s-expressions many times.


Last updated: Apr 19 2024 at 20:01 UTC