I'm trying to understand the async proof processing of Coq better, and how it works with regard to different classification options of commands. In particular, I'd appreciate any insights into the following questions:
Extraction
and Print
do, but Check
does not. Are some of these commands special-cased in the codebase? Or is there a more fine-grained way to specify which state a command accesses?vernac_sideff_type
are for commands like Definition
and Lemma
to declare which names they register? What exactly is the semantics of VtNow
and VtLater
?VtModifyProof
command as either VtQuery
or VtSideff
? Are such combinations valid? (I know the second one isn't, but I'm not sure about the first.) Similarly, what happens if in a VtCloseProof
command, side-effects other than saving the lemma are performed?(I'm asking these questions in relation to this breakout session we had during the CUDW: https://github.com/coq/coq/wiki/BreakOut-2020-11-30-Lasse)
Thanks!
Regarding Extraction
, Print
, and Check
, it is a matter of whether a command can access an opaque proof. This is hardcoded in Coq.
Regarding VtNow
and VtLater
, it indicates whether the command modifies the parser and thus can change the meaning of subsequent commands.
@Guillaume Melquiond Thanks for your answers and sorry for the late reaction (I cannot figure out how to get Zulip to send me emails only of relevant threads). Do you know whom I might ping for the remaining questions here?
Presumably @Emilio Jesús Gallego Arias and perhaps @Enrico Tassi too.
I can answer all this, but I'm not so sure I understand what you are trying to do, especially for the questions which have no answer yet. The general picture of all this is here https://hal.inria.fr/hal-01135919 . The "names" are there for the implementation of sentences that should not even be allowed in documents, such as Reset <name>
. Lemma terminating commands are not really extensible, even if the classification function lets you think so, and in general misclassifying commands will just not work.
The breakout session is not really helping me, since it is too low level. You want to gather data? In batch mode? In interactive mode? Should it be used/available in batch/interactive? To Coq? To the UI?
I need to understand better the lifetime of the data you are interested in.
High-level, the issue we were trying to solve during the breakout was as follows: Currently, Tactician is implemented such that all tactics are classified as side-effecting. This is a hugely evil hack (because by doing this, tactics are replayed a second time at QED-time and I have to do some crazy things to make that work). We were trying to resolve this. The proposal was to create a patch for Coq that allows (1) at Proof-time to move data from summaries to Evd.Store.field
's and (2) at Qed-time move data back from fields to summaries.
I was trying to create a first patch, but I think there may be an issue with this, because we are now (1) performing queries when starting a proof and (2) making (summary) side-effects during the VtCloseProof
command. I'd like to know if that violates the async model.
An additional issue is that because every proof opening is now classified as a query, they (presumably) would all create async barriers, which basically completely disables async. It occurs to me that I don't actually always need the data from my summaries in proof mode, but rather only when a proof contains the search
tactic. So I'm wondering what would happen if I classify the search
tactic as a query. Would that break the async model? (I'm guessing yes.) Alternatively, could I make a Search
command implemented as a VtModifyProof
command that is classified as VtQuery
. Would that be valid?
Please tell me what you need, not how you thought to code it. Fiddling with classification is not a good idea, similarly evar-specific data is tricky, I hope it is not the only way to go.
What I need concretely, is to read and write to summaries at some points while in proof mode:
search
tactic. Either that needs to happen at the entry-point of the tactic (what I'm currently doing) or alternatively, it could happen at proof start time. Another less desirable option could be to change search
into a command Search
that is both a query and modifies the proof.Note that I'm already storing much information in Evd.Store.field
's for other reasons, and that is working fine. So that should not be a problem.
A tactic can read/write to summaries. Maybe what you don't like is that changes done inside a proof are not kept by Qed.
Now, is the data to be read/write by tactic atoms or just by the whole expressions? (eg tac1 ++ tac2
, do you need the data of tac1, tac2 or both?)
Yes, indeed I need it to persist past Qed. The writing can be at any point during the proof, since I'm already keeping the data in Evd.Store.field
's. At the moment the writing happens at the granularity of whole tactic sentences.
More about Qed: the proof may be run in background, so you can't have Qed need that data.
Is "data gathering" a commutative operation? (if you swap two proofs, what happens?)
Whether or not it is commutative depends on the particular version of Tactician. But for now, let's assume that it is indeed commutative.
The other thing is, do you need your state to be aware on Undo or not?
Yes, it needs to synchronize with the interactive document
Imagine your state is not a summary (hence undoing a sentence has no effect on it). Is addition idempotent?
Can you say a bit more why?
Imagine your state is not a summary (hence undoing a sentence has no effect on it). Is addition idempotent?
Not at the moment, but perhaps it could be in the future
Can you say a bit more why?
My state needs to be the same in batch mode and interactive mode. Otherwise a file that compiled during your interactive explorations might not compile in coqc
.
Hum, only if you don't write down the suggestion from search, right?
Yes, but knowing Coq users, many will be too lazy to do this :-)
Also, even when navigating back and forth in the interactive document the commands might start to fail (unless addition would indeed be idempotent).
Now that I think about it, making addition idempotent would be very difficult.
My fear is that the hard requirements are very incompatible with the way the document can be processed interactively.
So either you don't learn from the current file, or you have to soften them (or turn off the STM).
So you don't think that the original idea of importing/exporting the data into Evd.State.field
's at proof start/end will work? @Emilio Jesús Gallego Arias seemed to think it would.
As long as your accumulation function is associative-commutative you should be fine though.
The STM needs to be modified to handle arbitrary such state rather than just the global monomorphic universes obviously, but that was discussed during the wg already
Yes and no, the universe constraints for the proofs are added to the global state on explicit demand (the wheels in CoqIDE) and until then they are in a lazy (a thunk). In one reads that thunk, then it is blocking.
I don't see how that could just work.
I mean, having Qed take some data from the evar map and add it to a global store is fine, and would work across processes as well, but if the data is immediately needed by the text following the Qed, then it is a synchronization point.
I don't think it's needed, it can be delayed, that's the point of having a partial commutative monoid
That means more non-determinism if you're using the tactician solver, but it's already the case due to universes
If I got it right, it is not commutative and cannot be. Example:
Lemma training : statement
Proof.
tactic
Qed.
Lemma work : statement
Proof.
search (with no hint)
Qed.
How can the call to search work if tactic
did not run?
Having a fully linear document would not be the end of the world. Are you saying that it would work, but just not be async at all?
@Enrico Tassi it's already the case, c'mon
The result of the evaluation of a document depends on the scheduling order
I know, so why are we suggesting it is OK?
Well, In my mind Lemma work : statement
would be a query, and therefore be a sync point right?
@Lasse the problem is the Qed, not the Lemma
Everything inside Proof / Qed can be delayed arbitrarily
Yes, but if Qed is classified as side-effecting, and Lemma (after Qed) is classified as a query, they will be synced right?
"It's complicated."
Haha, okay
I thought that we were embracing non-determinism when discussing the implementation details at the CUDW
Is the "complicated" part about universe unification? Because in that case I don't care. That is extremely unlikely to affect me
Yes, let's clarify the hard requirements and the bonus ones
Maybe I should propose a silly model and see how hard it breaks.
I'd say that the hard requirements are (almost entirely) deterministic compilation, with a fully linear document (completely disabling async if needed). Any amount of async we could achieve would be a bonus.
In that case you can forget about async indeed.
(As far as I know async only works in CoqIDE right? So I suspect that very few people are actually using it.)
Vscoq as well
ah okay
but we hope to move soon to another backend, without the STM, but still async
this is why I want to understand the requirements
I might have missed the answer, but is it important that the current document is taken into account (rather than just the imported libraries)?
I was about to ask the same question. For demos, I guess so, but in real life?
This conversation starts looking way too much like the native compilation scheme issue
Yes, it turns out that the most effective learning happens withing a document (because there the most relevant/related proofs are)
Is tactician monotonic in its learning state?
i.e. if it succeeds on a given state, then it succeeds on a bigger one
(disregarding the generated proof term)
With the exception of the leafs of proofs. Those sometimes move into an entirely different domain, such as set containment. In that case external libs are more important.
@Pierre-Marie Pédrot Depends on the learning agent. With the current released one yes, but only in theory. In practice, the search time might explode into infinity.
Regarding requirements: I think that determinism is much more important than the asyncability of Tactician. So if we can just turn async off that would be a good first approach.
But then even adding a lemma in the middle of a (already compiled) file can break its compilation
Indeed, but that is also true for auto
.
I'm fine turning off async (or not learning from the current file if async is on).
The difference is that auto doesn't use lemmas without an explicit command.
Is vscoq even able to run with async turned off?
Yes, but it does listen do hint databases. So you might also complain that auto
failed just because you added some hint.
I am not sure I follow the argument.
(barring universes) auto doesn't depend on anything inside a proof / qed block, which is an invariant necessary for async processing
(yes, the STM works with async off...)
That the STM works that way doesn't mean that it'll properly behave UX-wise
The analogy:
I don't see much difference.
you can avoid adding a hint
anyway, it's just a default
With tactician you can also avoid learning from a lemma:-)
you could imagine a #[learn=off]
ok then
There is already a tactician ignore tac
tacticical, and a Unset Tactician Learning
option.
If we settle on "learning only happens in sequential mode" then I would just have a global summary which is not rolled back by the STM at Qed time. I don't see the point of threading things in the evar map.
(barring universes) auto doesn't depend on anything inside a proof / qed block, which is an invariant necessary for async processing
I think we are misunderstanding each other. My argument was about wether or not it is problematic that Tactician can fail after inserting a lemma. I was not making any claim about async.
@Enrico Tassi That would be fine with me.
Not having async would be a fine practical compromise to me. But it is not yet clear to me why it is theoretically impossible to support async. It seems to me that all one has to do is create sync points between Qed's (where new state is added), and search
tactics after those Qed's, where that state is actually used. That would indeed assume commutativity, and perhaps the universes would be slightly different. Commutativity might be a problem depending on the agent. But I doubt that the universes matter.
I'm not saying that is easy, but I don't get why it is impossible?
@Enrico Tassi I'm thinking about how to achieve a summary that is not rolled back at Qed time. And I now also remember that the data in my summaries is also added to Libobject
, I hope that doesn't invalidate the discussion above. Would the easiest way be to just add a flag indicating that the data should not be rolled back? I suppose the code would need to check that async is off when accepting such a flag...
It's fine even if it is in libobject I guess.
The whole async thing is under the assumption that text between Proof
and Qed
does not matter. You learn from it, so it does. End of the game. If you accept not to learn from the current file (if the file is processed asynchronously), then it's fine. IMO, this is much better than forcing sequential execution on users of large libraries and it also encourages users to split their development into files which makes sense (and which are learnt from).
Not learning from the current file will be really easy. This is just a question of turning off Tacticians proof mode, and then async will be fully functional again, while the search
tactic is still functioning, and can still learn from Require
d files. It is totally fine to me if users want to use it that way, and I guess I can build in some support for that into the tactician
utility.
However, making that the preferred mode of operation is not feasible, because Tacticians "proof rate" would take a significant dive. The proof rate is everything for these ML tools because otherwise, you might as well be using auto
. So I'm afraid all other concerns are kind of secondary to that. And indeed, it will help if you structure your development with many small files. But for the proof rate that I'm currently measuring this will not matter, because Tacticians benchmarks are run on existing developments that are not optimized in this way.
The whole async thing is under the assumption that text between Proof and Qed does not matter. You learn from it, so it does. End of the game.
Okay, I guess that makes sense. I'll give up on that then :-)
Pierre-Marie Pédrot said:
That the STM works that way doesn't mean that it'll properly behave UX-wise
You can force synchronous processing in VSCoq and it works well, so I guess that's not an issue
Last updated: Oct 13 2024 at 01:02 UTC