A core need coq-lsp has is to interrupt Coq and it likes to run for a long time often.
This is already very tricky to do well in general, but moreover, we are in a kind of worst-case scenario here:
pthread_cancel
that would allow us to defer proof-checking to a thread, and cancel it when we don't care about it anymore (Isabelle and Lean do this)So we will go for a regular Coq checking process that can be signalled, but in general that'll have bad latency and severe limitations. Things not looking good for us :(
Tbh, I think you're making assumptions on what's going to work and what's not going to work too early in the prototyping process.
Why do processes require serialization? The hack folks built a multi process typecheckers with shared memory just fine. They even created a library to make this easy
Why is instrumenting coq error prone? Instrumenting something like your bytecode vm should be fairly easy. Checking if a cancellation flag has been set is quite cheap in a tight loop is quite cheap with branch prediction.
Coq is already instrumented, there is a check_for_interrupt api called here and there.
Rudi Grinberg said:
Tbh, I think you're making assumptions on what's going to work and what's not going to work too early in the prototyping process.
What are these assumptions?
Multi processes are slow, interruption is error prone and expensive
Enrico Tassi said:
Coq is already instrumented, there is a check_for_interrupt api called here and there.
And indeed there are 3 problems with it: a) as last time I measured 2/3% overhead b) it doesn't always work (easy to hit in real life, auto for example skips the check often) c) plugins have to implement it too, not so easy for some
Rudi Grinberg said:
Multi processes are slow, interruption is error prone and expensive
1) is an assumption based on a lot of experiments, I've measured that quite a few times, and there is general consensus in our field (c.f. Isabelle or Lean) that for trying to check individual proofs, the process model just don't work; moreover you can just try the current implementation in Coq, which uses worker processes to check proofs. It is much slower. You can try optimizations, but for the kind of proof search I'm interested in, they don't work. Go and try to implement for example Z3 using processes instead of threads. I'd bet a nice dinner you won't be very happy.
2) I like interruption, my complain is not about interruption being error prone (which actually is), but about OCaml not implementing Thread.cancel
not even in 5.0
@Emilio Jesús Gallego Arias Rudi seems to ask, why can't you just have a multithreaded typechecker using shared memory? I also imagine it'd crash and burn but I haven't tried
Yes, Isabelle and Lean did it one way, but I gave you a counter example that uses multi processes that works well. See hack.
Paolo Giarrusso said:
Emilio Jesús Gallego Arias Rudi seems to ask, why can't you just have a multithreaded typechecker using shared memory? I also imagine it'd crash and burn but I haven't tried
Seems to work well for pyre, flow, hack
https://gkossakowski.medium.com/can-scala-have-a-highly-parallel-typechecker-95cd7c146d20#.ugiezb3pq analyzes what it'd take to use the Hack model and apply it to Scala
Paolo Giarrusso said:
Emilio Jesús Gallego Arias Rudi seems to ask, why can't you just have a multithreaded typechecker using shared memory? I also imagine it'd crash and burn but I haven't tried
I can't because I cannot cancel the threads when a proof that is being checked has been updated by an edit.
We have some jobs that take too long, and we would like to reschedule work or cancel. Multithreading here is not the most important thing IMO. The ability to do other things without cancelling is what we are after.
Or for example, more nicer things that Isabelle does is cancel checking of everything not in view
I think Paolo made a typo. I suggested a multi process type checker with shared memory :)
Rudi Grinberg said:
Yes, Isabelle and Lean did it one way, but I gave you a counter example that uses multi processes that works well. See hack.
You mean flow?
Emilio Jesús Gallego Arias said:
Rudi Grinberg said:
Yes, Isabelle and Lean did it one way, but I gave you a counter example that uses multi processes that works well. See hack.
You mean flow?
they all use the same architecture under the hood. Flow, hack, pyre
maybe a thinko, but yes — @Rudi Grinberg your point is that you can cancel processes?
Ali Caglayan said:
We have some jobs that take too long, and we would like to reschedule work or cancel. Multithreading here is not the most important thing IMO. The ability to do other things without cancelling is what we are after.
The reason we might use processors or threads is because this is exactly the kind of work the operating system knows how to do and is good at. But I don't think it is essential to have a multithreaded application. It might be possible to find a way to reschedule work.
how do you deal with the shared _mutable_ state? if you use locks, how do you cancel processes that have locks? AFAICT Hack's model only needs shared _immutable_ state?
I don't know anything about hack/flow, how do they distribute work?
I mean, when they spawn a process, what the work unit for them?
What's the library hack / flow people use?
I don't know if they ever extracted it into a separate lib (they seem to just copy paste it between all their projects). Somebody did though: https://github.com/rvantonder/hack_parallel
https://github.com/facebook/flow/blob/main/src/heap/sharedMem.mli
Yes I found it also in the flow repos
It seems that they use shared memory, but require serialization of objects still?
Looks like using that would require a rewrite of Coq, and still incur in heavy overhead :( unless I'm reading the API wrong
but nice work!
They don't need serialization, but they do have a shared heap the worker processes access with mmap.
Emilio Jesús Gallego Arias said:
Looks like using that would require a rewrite of Coq, and still incur in heavy overhead :( unless I'm reading the API wrong
To make full use of it yes. But perhaps you can just take the shared heap part of it for the information you want persisted between sessions
Basically, their Memory
module is what you need.
They don't need serialization, but they do have a shared heap the worker processes access with mmap.
I was reading the code in flow today, but in the library you point out, they do indeed seem to serialize, see hh_store_ocaml
, as indeed otherwise how can you transfer OCaml objects to other process? Tho the whole idea is very interesting
So indeed I agree that it'd be cool to experiment with that, but it is not serialization free so at that point you start to have a very delicate balance.
Well, it depends what you mean by serializing. Do you mean for IPC? then no. AFAIK you only need to copy the ocaml objects to the mmap'd memory
In the code of flow today it seems they prefer to go away from the generic serialization implemented in that repos
so that's just a big blit (which is much much cheaper)
Note closures for example are not supported, but indeed it'd be interesting to experiment with that
Emilio Jesús Gallego Arias said:
In the code of flow today it seems they prefer to go away from the generic serialization implemented in that repos
yes, marshal is problematic
Paolo Giarrusso said:
how do you deal with the shared _mutable_ state? if you use locks, how do you cancel processes that have locks? AFAICT Hack's model only needs shared _immutable_ state?
AFAIK they have a shared hash table that they can update transactionally. no locks needed
Rudi Grinberg said:
so that's just a big blit (which is much much cheaper)
Aren't they calling caml_output_value_to_malloc
? That's marshall
If that's the case, that lib is what we have today in Coq, I'd dare to say I understand well the perf problems
Maybe I don't understand what you mean by "big blit"
how do you move the marshalled data currently?
I guess via a socket
Well that would be the source of your slowness then.
but that's not the bad profile
socket is happy, profile show marshall being the culprit
We are in the wrong thread tho
this marshalling still preserves sharing, right?
Those must be really huge terms then.
to discuss about .vo files and marhsall we can use the Caching one
maybe I should define what I mean by "shared memory"
By shared memory I mean when I can pass a reference to an OCaml value at zero cost
that shared memory lib is calling marshal, right?
yeah
so that doesn't meet my def, but I understand it is an acceptable on etoo
So here we have to things IMHO:
So while I'm sure we can develop improvements on both fronts, and by a long margin, there is a standard model for the kind of system Coq is which traditional, shared-memory threading.
And beating that one is not easy.
Rudi Grinberg said:
Those must be really huge terms then.
The context is huge, think of something like an advanced math proof, you may have 200k definitions in scope
At least for now we can just have a single process, and wait to see what we can do with domains
I think we can agree that beating multi threads with shared memory is impossible. However, the goal is really to be "good enough"
or indeed, copy the STM / unnamed document server model with processes, but I'm not sure why it will be better than today
Btw, I'm not sure how lean/isabelle handle cancellation on windows.
Yes I agree, we want to be good enough, however this domain is tricky
No idea how they do on windows, you can look at poly source code
pthread_cancel doesn't exist on windows so they must have a hack of some sorts
TerminateThread has different semantics but works well I think
TerminateThread is a dangerous function that should only be used in the most extreme cases. You should call TerminateThread only if you know exactly what the target thread is doing, and you control all of the code that the target thread could possibly be running at the time of the termination. For example, TerminateThread can result in the following problems:
The above is from MSDN. Doesn't seem very inviting
Yes, interruption in the way a proof search system such as an ITP or SMT needs is very dangerous
the problem is that we can't predict what will run for long in a system like Coq
in other systems you have a good idea of what works
PolyML is very interesting, but I hope we don't have to go that way.
They seem to set the stack limit of the thread to the stack top, so the thread will get interrupted the moment they try to grow the stack
that would work for Coq hehe
but indeed, the refuse to use pthread_cancel and indeed prefer a controlled shutdown
they told me it took them a long time to make that work reliably, so I'm scared
do they have plugins?
No idea
Emilio Jesús Gallego Arias said:
They seem to set the stack limit of the thread to the stack top, so the thread will get interrupted the moment they try to grow the stack
that will not be able to interrupt everything
In OCaml, it could be nice if you could check a cancellation flag on allocation. But that would require a custom runtime...
I dislike check_for_interrupt too, I was just pointing out it is already in place
Moved the discussion upstream https://github.com/ocaml/ocaml/issues/11411 , warning, the issue as is now is stale and we will rewrite it.
I think you can basically move along and rely on the check_for_interrupt
mechanism for now. Eventually, you can just re-implement it with the domain based approach
In fact, you'll need the check_for_interrupt
stuff for 4.x support anyway
Rudi Grinberg said:
In fact, you'll need the
check_for_interrupt
stuff for 4.x support anyway
I'd be great if we could use it, but it doesn't work, lots of more checks need to be added if you want all tactics to be interruptible.
We will use current async support for now, that is to say, sending a signal.
And a bit of care. GMM send to us long time ago a nice guide how to handle that well.
So no windows?
I don't know, I guess not until the OCaml runtime supports the signal-less operation
I mean we can set check_for_interrupt
, but that works very bad
if windows user care they could have their fork of Coq and use it in the build
adding more check_for_interrupt, but overhead can be felt
as Coq tactics that tend to get stuck have a thigh inner loop
I'm out of ideas; in win we could kill the process but making the cache work well will require lots of effort
You could rely on signals on unix and check_for_interrupt on windows with conditional compilation. That will introduce overhead only on windows
Btw, what's this enable_thread_delay
thing? It seems like it's slowing down check_for_interrupt
btw, is check_for_interrupt
being inlined? that could make a difference too
Emilio Jesús Gallego Arias said:
as Coq tactics that tend to get stuck have a thigh inner loop
if the tight loop doesn't allocate, I don't think OCaml will deliver the signal to thread btw
Rudi Grinberg said:
if the tight loop doesn't allocate, I don't think OCaml will deliver the signal to thread btw
Yes, that's the core of the problem, tho I think allocations usually happen. But in some very rare cases we may get stuck.
I guess we'll have to do the best we can.
Last updated: Dec 07 2023 at 09:01 UTC