Stream: Miscellaneous

Topic: Ph.D. on formalizing complexity classes


view this post on Zulip Martin Dvořák (Aug 07 2021 at 06:34):

Hello everyone!

As my Ph.D. project (dissertation), I want to formalize complexity classes and some relationships between them (I would like to reprove stuff such as PSPACE = NPSPACE and I would like to formalize some chapters of fixed-language (V)CSP complexity, too). Do you think it is better to use Lean or Coq for that?

[crossposting to Zulip chat for Lean]

view this post on Zulip Pierre-Marie Pédrot (Aug 07 2021 at 12:08):

There has been some recent work on Turing machines in Coq, e.g. https://github.com/uds-psl/CoqTM, which you may base your work on.

view this post on Zulip Pierre-Marie Pédrot (Aug 07 2021 at 12:09):

@Yannick Forster is probably the expert here.

view this post on Zulip Pierre-Marie Pédrot (Aug 07 2021 at 12:10):

if you're planning to reimplement everything from scrath, I don't think that there will be enough differences between Lean and Coq to make the difference observable.

view this post on Zulip Pierre-Marie Pédrot (Aug 07 2021 at 12:11):

So it'd be a matter of taste and community.

view this post on Zulip Yannick Forster (Aug 07 2021 at 12:12):

@fakusb is the real expert, but thanks for the ping Pierre-Marie

view this post on Zulip Yannick Forster (Aug 07 2021 at 12:13):

https://drops.dagstuhl.de/opus/volltexte/2021/13915/

view this post on Zulip Yannick Forster (Aug 07 2021 at 12:13):

This is the only paper I know formalizing complexity classes

view this post on Zulip Yannick Forster (Aug 07 2021 at 12:13):

And they use the only feasible way to do it: By extracting coq functions automatically to a model of Computation, because manually manipulating programs is always way too time consuming

view this post on Zulip Yannick Forster (Aug 07 2021 at 12:14):

I agree that it doesn't matter whether you use Lean or Coq in principle, but given all the previous work in Coq, Coq is likely the better choice

view this post on Zulip Théo Zimmermann (Aug 07 2021 at 12:24):

You probably meant to cc @Fabian Kunze

view this post on Zulip Fabian Kunze (Aug 07 2021 at 13:02):

We mechanised P&NP and the NP-completeness of SAT, as Yannick linked.
I have not tried Lean myself, what I appreciated with Coq was the ability to define my own tactics/frameworks to ease with the program verification.

view this post on Zulip Fabian Kunze (Aug 07 2021 at 13:05):

Concerning space complexity: Which model of computation to you intend to define space for? A good first step is to look on analysing a concrete program, as this is the main difficulty that is often glossed over on paper proofs, but blowing up significantly in a mechanisation.

view this post on Zulip Fabian Kunze (Aug 07 2021 at 13:07):

I would agree with Yannick that without the "automatic extraction" framework, doing program verification by hand is somewhere on the scale between tedious to not feasible for even small-to-medium sized standard textbook constructions.
framework: https://drops.dagstuhl.de/opus/volltexte/2019/11072/pdf/LIPIcs-ITP-2019-17.pdf

view this post on Zulip Yannick Forster (Aug 07 2021 at 14:07):

One more addendum: The work by Fabian and Lennard is using the call by value lambda calculus as model of computation, not Turing machines, which is possible due to the following result: http://doi.org/10.1145/3371095

view this post on Zulip Yannick Forster (Aug 07 2021 at 14:08):

The culprit is that sub-polynomial space complexity is not faithful in this setting, but everything else is

view this post on Zulip Yannick Forster (Aug 07 2021 at 14:10):

And to spare you some reading: the relevant sentence of the paper linked by Pierre-Marie is that Turing machines are infeasible to formalise or mechanise any involved computability or complexity-theoretic result

view this post on Zulip Pierre-Marie Pédrot (Aug 07 2021 at 14:12):

Somewhat on topic, out of a completely irrational thought process, I believe that the best model of computation is no model at all, and I've been daydreaming about formalizing descriptive complexity for a while but never took enough time and energy to actually start doing it.

view this post on Zulip Pierre-Marie Pédrot (Aug 07 2021 at 14:13):

the good thing with descriptive complexity is that you only have to pay once the price to show that the semantics is adequate w.r.t. your favourite pair computation model / complexity class

view this post on Zulip Martin Dvořák (Aug 07 2021 at 17:14):

Fabian Kunze said:

We mechanised P&NP and the NP-completeness of SAT, as Yannick linked.

Wow, it is great that you personally replied! I see that you did a whole lot of formalization in this area! I have a huge respect for you!

Before I elaborate on my research intentions and my naïve ideas about what I think I will be able to formalize, let me ask a topical question:

Is your research currently continuing? What other stuff are you going to do with your library?

view this post on Zulip Fabian Kunze (Aug 07 2021 at 18:26):

I'm flattered, thanks :)

I'm currently writing up the mechanisation of the time hierarchy theorem. I then plan on adding space complexity analysis to the extraction framework and I plan to do a simple result involving space, for example the space hierarchy theorem.

view this post on Zulip Martin Dvořák (Aug 07 2021 at 18:35):

Fabian Kunze said:

I'm flattered, thanks :)

I'm currently writing up the mechanisation of the time hierarchy theorem. I then plan on adding space complexity analysis to the extraction framework and I plan to do a simple result involving space, for example the space hierarchy theorem.

Do you think I might be able to contribute to that? Let's say, if I started studying Coq from the next week on, full time, could I be useful for your project earlier than you get close to finishing it? I don't have an estimate how hard these things are to get into.

Or would you rather see me developing my own library, independent of yours?

view this post on Zulip Martin Dvořák (Aug 07 2021 at 18:36):

I should have said in the beginning: I have no experience with Coq. I only know a little bit of Lean.

view this post on Zulip Martin Dvořák (Aug 07 2021 at 18:42):

Pierre-Marie Pédrot said:

if you're planning to reimplement everything from scrath, I don't think that there will be enough differences between Lean and Coq to make the difference observable.

My dream is to formally verify all of these inclusions:
http://pub.ist.ac.at/~mdvorak/stuff/Computational_classes.pdf

However, this is not a project for a Ph.D.. In fact, one lifetime might not be enough for that.

I was a bit astonished by how difficult it was to formalize P, NP, NP-hard and NP-complete in the Kunze's and Forster's paper. Nevertheless, I might avoid the most problematic parts because my dream does not include -hard and -complete problems (which are possible the hardest thing to prove among what is currently known).

view this post on Zulip Fabian Kunze (Aug 08 2021 at 11:52):

The things I listed as my work-in-progress in our setting/framework are most likely faster done by myself than by training someone external, and I also plan on finishing writing up my dissertation somewhat soon.

I also see some interesting formalisation (even on paper) questions arising before the inclusions you mentioned can be tackled, at least if one uses the call-by-value lambda calculus, as we do: (And if one uses a different model, one would have to investigate on how to make verify program in a scalable way.)

Note that for fine-grained complexity theory (e.g. classes that are not closed under poly-time reductions), the "naive" definition with call-by-value lambda calculus might differ from the textbook-definition with TMs/RAM machines, which also needs some investigation.

view this post on Zulip Paolo Giarrusso (Aug 08 2021 at 13:09):

Naive question: Looking at the linked thesis (against TMs), would URMs or mu-recursive functions be easier models to handle, without the torny questions that arise with lambda-calculus? At the very least, control-flow operators become expressible directly.

view this post on Zulip Paolo Giarrusso (Aug 08 2021 at 13:12):

OTOH, I imagine having unbounded integers in your machine model complicates defining space consumption, and while logarithms suggest a possible answer I haven't seen a proper treatment.

view this post on Zulip Fabian Kunze (Aug 09 2021 at 09:11):

Paolo Giarrusso said:

Naive question: Looking at the linked thesis (against TMs), would URMs or mu-recursive functions be easier models to handle, without the torny questions that arise with lambda-calculus? At the very least, control-flow operators become expressible directly.

URMs would be easier for space on the theoretical side, as the definition of space for them is well known, although the "all pointers must be in (smaller than) the order of magnitude to address the input" restriction already gives me a headache even thinking about (both on the "handlable definition" and "composable programs" side ).
On the "mechanisation" side, lambda calculus is such a sweet spot for program verification in coq, and a compositional verification framework for ram machines with resources seems an interesting engineering challenge. I also quite like that encodings of all "constructor type" style data types is natively possible with lambda calculus.

I'm not aware on any time- or space complexity definition for mu-recursive functions, and I think that lambda calculus + native integers would be strictly "better" suited than mu-rec funs.

view this post on Zulip Yannick Forster (Aug 09 2021 at 14:50):

Here are my 2 cents on Paolo's question: The model of computation doesn't really matter. It has to be a reasonable one (polynomial time, constant factor space overhead wrt Turing machines) and it has to be easy to turn total Coq functions into programs in the model. The first narrows down the choice, but the second is more relevant in any case. The question boils down to "is it easier to extract Coq code to lambda calculus or to URMs?". Our answer is that extraction to lambda calculus is a lot easier, so we went down that path. But if somebody comes up with a good extraction framework from Coq to URMs or even to Turing machines, from the perspective of formalisations of complexity theory this will work comparably well - but we estimate that such a framework will be considerably more complicated than the already not so trivial extraction to lambda calculus

view this post on Zulip Martin Dvořák (Aug 09 2021 at 16:38):

Why do we require constant-factor space overhead wrt Turing machines?

Is it because we want to have fine-grained space complexity classes (since, for example, NSPACE(n) is equivalent to Context-sensitive languages but NSPACE(n*log(n)) isn't) or is the reason more fundamental (something that would in turn destroy reasoning about P, PSPACE, EXPTIME, etc as well)?

view this post on Zulip Fabian Kunze (Aug 11 2021 at 11:17):

I'm not sure about more fundamental reasons/consequences.

For me, it is just that space seems to be very canonical: In the end, it should count the "information storage" needed when carrying out the computation, e.g. on paper.

view this post on Zulip Fabian Kunze (Aug 11 2021 at 11:24):

The "extracton" approach to a lambda calculus also has the benefit that there is an easy intuition on how the extracted lambda term looks like, namely that nit is the coq term with call-by-value semantics, which means the complexity can be easily guesses. When extracting to other target languages, this intuitive understanding of the computational structure of the extraction result might be harder to achieve.

Although, I can imagine easily understandable translations from certain sub-fragments, for example when only tail-recursive recursion is allowed, to many other models of computation.

view this post on Zulip Yannick Forster (Aug 11 2021 at 18:25):

The main reason linear space overhead is interesting is the general interest in the class LOGSPACE I think. I would recommend reading the seminal paper by Slot and van Emde Boas introducing the notion of reasonable models and the invariance thesis. It's very accessible and explains well the subtleties wrt RAM machines

view this post on Zulip Martin Dvořák (Aug 12 2021 at 21:06):

Is it worth reading the complicated parts of the paper (the hashing stuff)? Or is reasonable to jump from "Introduction" right into "Simulation of RAM on a Turingmachine" when reading it?

view this post on Zulip Yannick Forster (Aug 13 2021 at 16:39):

The high-level stuff is enough I think :)

view this post on Zulip Martin Dvořák (Aug 13 2021 at 17:01):

Thanks!

view this post on Zulip Martin Dvořák (Aug 18 2021 at 07:23):

I was reading and thinking about the formalization of complexity classes. I've come to the conclusion that aiming directly for formalizing complexity classes in a computational model (no matter which I would choose) would not be a good Ph.D. idea. First of all, it seems to be very very very complicated! Second, Fabian Kunze and Lennard Gäher seem to be going to publish more about complexity classes in Coq quite soon, and so, there would be a risk that my work would end up being just a subset of what these two great guys will have done.

I would like to base my research proposal around the following notes. Do you have any comments on that, please? Did I miss an important paper from this area that already formalized something relevant that I am not aware of having been done?

[crossposting to Zulip chat for Lean]

Already in Lean/mathlib:

Already in Coq:

I would like to add to Lean/mathlib:

Plan B:

view this post on Zulip Martin Dvořák (Aug 20 2021 at 07:23):

Yannick Forster said:

Here are my 2 cents on Paolo's question: The model of computation doesn't really matter. It has to be a reasonable one (polynomial time, constant factor space overhead wrt Turing machines) and (...)

How large is the overhead of call-by-value 𝜆-calculus wrt Turing Machines? I read that it is polynomial; however, can you provide some concrete bounds?
@Fabian Kunze

view this post on Zulip Fabian Kunze (Aug 20 2021 at 11:26):

We give the exact run times/polynomials for the simulation in the coq code of our paper (the paper itself only mentions that it is a polynomial (simulating lambda-calculus in TMs, or a constant factor (simulating TMs in lambda-calculus))):
https://github.com/uds-psl/time-invariance-thesis-for-L/blob/main/theories/summary.v
in the 4th theorem, the polynomial is only explicitly constructed in the proof.


Last updated: Aug 14 2022 at 13:02 UTC