Stream: coq-lsp

Topic: Interrupting Coq


view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 16:59):

A core need coq-lsp has is to interrupt Coq and it likes to run for a long time often.

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:03):

This is already very tricky to do well in general, but moreover, we are in a kind of worst-case scenario here:

So we will go for a regular Coq checking process that can be signalled, but in general that'll have bad latency and severe limitations. Things not looking good for us :(

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 17:07):

Tbh, I think you're making assumptions on what's going to work and what's not going to work too early in the prototyping process.

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 17:08):

Why do processes require serialization? The hack folks built a multi process typecheckers with shared memory just fine. They even created a library to make this easy

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 17:10):

Why is instrumenting coq error prone? Instrumenting something like your bytecode vm should be fairly easy. Checking if a cancellation flag has been set is quite cheap in a tight loop is quite cheap with branch prediction.

view this post on Zulip Enrico Tassi (Jul 07 2022 at 17:11):

Coq is already instrumented, there is a check_for_interrupt api called here and there.

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:16):

Rudi Grinberg said:

Tbh, I think you're making assumptions on what's going to work and what's not going to work too early in the prototyping process.

What are these assumptions?

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 17:16):

Multi processes are slow, interruption is error prone and expensive

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:17):

Enrico Tassi said:

Coq is already instrumented, there is a check_for_interrupt api called here and there.

And indeed there are 3 problems with it: a) as last time I measured 2/3% overhead b) it doesn't always work (easy to hit in real life, auto for example skips the check often) c) plugins have to implement it too, not so easy for some

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:21):

Rudi Grinberg said:

Multi processes are slow, interruption is error prone and expensive

1) is an assumption based on a lot of experiments, I've measured that quite a few times, and there is general consensus in our field (c.f. Isabelle or Lean) that for trying to check individual proofs, the process model just don't work; moreover you can just try the current implementation in Coq, which uses worker processes to check proofs. It is much slower. You can try optimizations, but for the kind of proof search I'm interested in, they don't work. Go and try to implement for example Z3 using processes instead of threads. I'd bet a nice dinner you won't be very happy.

2) I like interruption, my complain is not about interruption being error prone (which actually is), but about OCaml not implementing Thread.cancel not even in 5.0

view this post on Zulip Paolo Giarrusso (Jul 07 2022 at 17:23):

@Emilio Jesús Gallego Arias Rudi seems to ask, why can't you just have a multithreaded typechecker using shared memory? I also imagine it'd crash and burn but I haven't tried

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 17:23):

Yes, Isabelle and Lean did it one way, but I gave you a counter example that uses multi processes that works well. See hack.

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 17:23):

Paolo Giarrusso said:

Emilio Jesús Gallego Arias Rudi seems to ask, why can't you just have a multithreaded typechecker using shared memory? I also imagine it'd crash and burn but I haven't tried

Seems to work well for pyre, flow, hack

view this post on Zulip Paolo Giarrusso (Jul 07 2022 at 17:24):

https://gkossakowski.medium.com/can-scala-have-a-highly-parallel-typechecker-95cd7c146d20#.ugiezb3pq analyzes what it'd take to use the Hack model and apply it to Scala

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:24):

Paolo Giarrusso said:

Emilio Jesús Gallego Arias Rudi seems to ask, why can't you just have a multithreaded typechecker using shared memory? I also imagine it'd crash and burn but I haven't tried

I can't because I cannot cancel the threads when a proof that is being checked has been updated by an edit.

view this post on Zulip Ali Caglayan (Jul 07 2022 at 17:25):

We have some jobs that take too long, and we would like to reschedule work or cancel. Multithreading here is not the most important thing IMO. The ability to do other things without cancelling is what we are after.

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:25):

Or for example, more nicer things that Isabelle does is cancel checking of everything not in view

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 17:25):

I think Paolo made a typo. I suggested a multi process type checker with shared memory :)

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:25):

Rudi Grinberg said:

Yes, Isabelle and Lean did it one way, but I gave you a counter example that uses multi processes that works well. See hack.

You mean flow?

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 17:26):

Emilio Jesús Gallego Arias said:

Rudi Grinberg said:

Yes, Isabelle and Lean did it one way, but I gave you a counter example that uses multi processes that works well. See hack.

You mean flow?

they all use the same architecture under the hood. Flow, hack, pyre

view this post on Zulip Paolo Giarrusso (Jul 07 2022 at 17:26):

maybe a thinko, but yes — @Rudi Grinberg your point is that you can cancel processes?

view this post on Zulip Ali Caglayan (Jul 07 2022 at 17:28):

Ali Caglayan said:

We have some jobs that take too long, and we would like to reschedule work or cancel. Multithreading here is not the most important thing IMO. The ability to do other things without cancelling is what we are after.

The reason we might use processors or threads is because this is exactly the kind of work the operating system knows how to do and is good at. But I don't think it is essential to have a multithreaded application. It might be possible to find a way to reschedule work.

view this post on Zulip Paolo Giarrusso (Jul 07 2022 at 17:28):

how do you deal with the shared _mutable_ state? if you use locks, how do you cancel processes that have locks? AFAICT Hack's model only needs shared _immutable_ state?

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:29):

I don't know anything about hack/flow, how do they distribute work?

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:30):

I mean, when they spawn a process, what the work unit for them?

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:31):

What's the library hack / flow people use?

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 17:31):

I don't know if they ever extracted it into a separate lib (they seem to just copy paste it between all their projects). Somebody did though: https://github.com/rvantonder/hack_parallel

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:32):

https://github.com/facebook/flow/blob/main/src/heap/sharedMem.mli

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:33):

Yes I found it also in the flow repos

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:36):

It seems that they use shared memory, but require serialization of objects still?

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:36):

Looks like using that would require a rewrite of Coq, and still incur in heavy overhead :( unless I'm reading the API wrong

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:36):

but nice work!

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 17:41):

They don't need serialization, but they do have a shared heap the worker processes access with mmap.

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 17:43):

Emilio Jesús Gallego Arias said:

Looks like using that would require a rewrite of Coq, and still incur in heavy overhead :( unless I'm reading the API wrong

To make full use of it yes. But perhaps you can just take the shared heap part of it for the information you want persisted between sessions

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 17:46):

Basically, their Memory module is what you need.

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:51):

They don't need serialization, but they do have a shared heap the worker processes access with mmap.

I was reading the code in flow today, but in the library you point out, they do indeed seem to serialize, see hh_store_ocaml, as indeed otherwise how can you transfer OCaml objects to other process? Tho the whole idea is very interesting

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:52):

So indeed I agree that it'd be cool to experiment with that, but it is not serialization free so at that point you start to have a very delicate balance.

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 17:52):

Well, it depends what you mean by serializing. Do you mean for IPC? then no. AFAIK you only need to copy the ocaml objects to the mmap'd memory

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:53):

In the code of flow today it seems they prefer to go away from the generic serialization implemented in that repos

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 17:53):

so that's just a big blit (which is much much cheaper)

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:53):

Note closures for example are not supported, but indeed it'd be interesting to experiment with that

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 17:53):

Emilio Jesús Gallego Arias said:

In the code of flow today it seems they prefer to go away from the generic serialization implemented in that repos

yes, marshal is problematic

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 17:54):

Paolo Giarrusso said:

how do you deal with the shared _mutable_ state? if you use locks, how do you cancel processes that have locks? AFAICT Hack's model only needs shared _immutable_ state?

AFAIK they have a shared hash table that they can update transactionally. no locks needed

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:54):

Rudi Grinberg said:

so that's just a big blit (which is much much cheaper)

Aren't they calling caml_output_value_to_malloc ? That's marshall

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:55):

If that's the case, that lib is what we have today in Coq, I'd dare to say I understand well the perf problems

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:55):

Maybe I don't understand what you mean by "big blit"

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 17:55):

how do you move the marshalled data currently?

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:56):

I guess via a socket

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 17:56):

Well that would be the source of your slowness then.

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:56):

but that's not the bad profile

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:56):

socket is happy, profile show marshall being the culprit

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:57):

We are in the wrong thread tho

view this post on Zulip Paolo Giarrusso (Jul 07 2022 at 17:57):

this marshalling still preserves sharing, right?

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 17:57):

Those must be really huge terms then.

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:57):

to discuss about .vo files and marhsall we can use the Caching one

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:57):

maybe I should define what I mean by "shared memory"

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:57):

By shared memory I mean when I can pass a reference to an OCaml value at zero cost

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:57):

that shared memory lib is calling marshal, right?

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 17:57):

yeah

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:58):

so that doesn't meet my def, but I understand it is an acceptable on etoo

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 17:58):

So here we have to things IMHO:

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 18:00):

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 18:00):

So while I'm sure we can develop improvements on both fronts, and by a long margin, there is a standard model for the kind of system Coq is which traditional, shared-memory threading.

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 18:00):

And beating that one is not easy.

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 18:01):

Rudi Grinberg said:

Those must be really huge terms then.

The context is huge, think of something like an advanced math proof, you may have 200k definitions in scope

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 18:01):

At least for now we can just have a single process, and wait to see what we can do with domains

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 18:02):

I think we can agree that beating multi threads with shared memory is impossible. However, the goal is really to be "good enough"

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 18:02):

or indeed, copy the STM / unnamed document server model with processes, but I'm not sure why it will be better than today

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 18:02):

Btw, I'm not sure how lean/isabelle handle cancellation on windows.

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 18:02):

Yes I agree, we want to be good enough, however this domain is tricky

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 18:02):

No idea how they do on windows, you can look at poly source code

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 18:02):

pthread_cancel doesn't exist on windows so they must have a hack of some sorts

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 18:03):

TerminateThread has different semantics but works well I think

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 18:04):

TerminateThread is a dangerous function that should only be used in the most extreme cases. You should call TerminateThread only if you know exactly what the target thread is doing, and you control all of the code that the target thread could possibly be running at the time of the termination. For example, TerminateThread can result in the following problems:

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 18:04):

The above is from MSDN. Doesn't seem very inviting

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 18:05):

Yes, interruption in the way a proof search system such as an ITP or SMT needs is very dangerous

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 18:06):

the problem is that we can't predict what will run for long in a system like Coq

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 18:06):

in other systems you have a good idea of what works

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 18:16):

PolyML is very interesting, but I hope we don't have to go that way.

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 18:16):

They seem to set the stack limit of the thread to the stack top, so the thread will get interrupted the moment they try to grow the stack

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 18:16):

that would work for Coq hehe

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 18:16):

but indeed, the refuse to use pthread_cancel and indeed prefer a controlled shutdown

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 18:17):

they told me it took them a long time to make that work reliably, so I'm scared

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 18:20):

do they have plugins?

view this post on Zulip Emilio Jesús Gallego Arias (Jul 07 2022 at 18:22):

No idea

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 18:22):

Emilio Jesús Gallego Arias said:

They seem to set the stack limit of the thread to the stack top, so the thread will get interrupted the moment they try to grow the stack

that will not be able to interrupt everything

view this post on Zulip Rudi Grinberg (Jul 07 2022 at 18:24):

In OCaml, it could be nice if you could check a cancellation flag on allocation. But that would require a custom runtime...

view this post on Zulip Enrico Tassi (Jul 07 2022 at 20:17):

I dislike check_for_interrupt too, I was just pointing out it is already in place

view this post on Zulip Emilio Jesús Gallego Arias (Jul 08 2022 at 16:16):

Moved the discussion upstream https://github.com/ocaml/ocaml/issues/11411 , warning, the issue as is now is stale and we will rewrite it.

view this post on Zulip Rudi Grinberg (Jul 08 2022 at 16:30):

I think you can basically move along and rely on the check_for_interrupt mechanism for now. Eventually, you can just re-implement it with the domain based approach

view this post on Zulip Rudi Grinberg (Jul 08 2022 at 16:31):

In fact, you'll need the check_for_interrupt stuff for 4.x support anyway

view this post on Zulip Emilio Jesús Gallego Arias (Jul 08 2022 at 16:35):

Rudi Grinberg said:

In fact, you'll need the check_for_interrupt stuff for 4.x support anyway

I'd be great if we could use it, but it doesn't work, lots of more checks need to be added if you want all tactics to be interruptible.

view this post on Zulip Emilio Jesús Gallego Arias (Jul 08 2022 at 16:35):

We will use current async support for now, that is to say, sending a signal.

view this post on Zulip Emilio Jesús Gallego Arias (Jul 08 2022 at 16:36):

And a bit of care. GMM send to us long time ago a nice guide how to handle that well.

view this post on Zulip Rudi Grinberg (Jul 08 2022 at 17:32):

So no windows?

view this post on Zulip Emilio Jesús Gallego Arias (Jul 08 2022 at 18:13):

I don't know, I guess not until the OCaml runtime supports the signal-less operation

view this post on Zulip Emilio Jesús Gallego Arias (Jul 08 2022 at 18:13):

I mean we can set check_for_interrupt, but that works very bad

view this post on Zulip Emilio Jesús Gallego Arias (Jul 08 2022 at 18:14):

if windows user care they could have their fork of Coq and use it in the build

view this post on Zulip Emilio Jesús Gallego Arias (Jul 08 2022 at 18:14):

adding more check_for_interrupt, but overhead can be felt

view this post on Zulip Emilio Jesús Gallego Arias (Jul 08 2022 at 18:15):

as Coq tactics that tend to get stuck have a thigh inner loop

view this post on Zulip Emilio Jesús Gallego Arias (Jul 08 2022 at 18:16):

I'm out of ideas; in win we could kill the process but making the cache work well will require lots of effort

view this post on Zulip Rudi Grinberg (Jul 08 2022 at 18:26):

You could rely on signals on unix and check_for_interrupt on windows with conditional compilation. That will introduce overhead only on windows

view this post on Zulip Rudi Grinberg (Jul 08 2022 at 18:26):

Btw, what's this enable_thread_delay thing? It seems like it's slowing down check_for_interrupt

view this post on Zulip Rudi Grinberg (Jul 08 2022 at 18:28):

btw, is check_for_interrupt being inlined? that could make a difference too

view this post on Zulip Rudi Grinberg (Jul 08 2022 at 18:31):

Emilio Jesús Gallego Arias said:

as Coq tactics that tend to get stuck have a thigh inner loop

if the tight loop doesn't allocate, I don't think OCaml will deliver the signal to thread btw

view this post on Zulip Emilio Jesús Gallego Arias (Jul 08 2022 at 18:54):

Rudi Grinberg said:

if the tight loop doesn't allocate, I don't think OCaml will deliver the signal to thread btw

Yes, that's the core of the problem, tho I think allocations usually happen. But in some very rare cases we may get stuck.

I guess we'll have to do the best we can.


Last updated: Feb 06 2023 at 05:03 UTC