Stream: Miscellaneous

Topic: 4CT and type checkers


view this post on Zulip Kevin Buzzard (Sep 30 2021 at 15:50):

I am writing an article about formal proof verification in mathematics, and the first case study I'm considering is the 2005 formally verified proof by Gonthier of the four colour theorem. I imagine my readers will mostly be mathematicians so will contain people who will not be able to tell the difference between the Appel--Haken proof (which "used a computer") and the Gonthier proof (which also "used a computer"). At some point I want to talk about the concept that a "complicated" program such as Coq can be used to turn a human-written tactic proof into a term in a type theory, and then a "simpler" typechecker such as Coq Coq Correct can be used to independently verify that the term has the specified type, thus giving more evidence that the proof is OK.

A natural place to talk about this sort of situation would be when discussing 4CT, but before I write it I was wondering whether it was actually possible to use an existing Coq typechecker to check this particular 4CT proof, and whether it has actually been done. If there are some practical reasons why this cannot be done then of course it doesn't matter, I can just move the explanation of type checkers to a "general" paragraph -- I am only asking about this particular proof because this first (in the article) concrete example of a formally verified proof felt like a natural place to deal with such questions.

view this post on Zulip Karl Palmskog (Sep 30 2021 at 15:59):

@Kevin Buzzard if I understand you correctly, the closest I can think of that is currently practical/feasible is to use the coqchk standalone proof term checker on the proof terms compiled by coqc from the current incarnation of the 4color proof: https://github.com/math-comp/fourcolor

For example, coqchk does not use the virtual machine for computations (vm_compute), so my guess is it would take a long time to check the fourcolor proof terms.

view this post on Zulip Kevin Buzzard (Sep 30 2021 at 16:01):

Yes, the fact that the proof involved a gigantic computation was scaring me. Maybe it's best to put a discussion of type checkers elsewhere.

view this post on Zulip Karl Palmskog (Sep 30 2021 at 16:02):

I would be interested in a comment from @Enrico Tassi though on the feasibility of checking the current 4CT in some more assuring way.

view this post on Zulip Kevin Buzzard (Sep 30 2021 at 16:02):

I will remark that according to Wikipedia it took Appel and Haken over 1000 hours to run their unverified code :-)

view this post on Zulip Enrico Tassi (Sep 30 2021 at 16:02):

Actually, vm_compute was added to Coq to make reflexive proofs work nicely, and the 4CT was the thing back then. But I think the proof did type check even before that. Benjamin Gregoir just left, I can ask him for confirmation on Monday I guess.

view this post on Zulip Kevin Buzzard (Sep 30 2021 at 16:02):

Thanks -- no hurry.

view this post on Zulip Karl Palmskog (Sep 30 2021 at 16:06):

if I remember a conversation with Matthieu Sozeau correctly, the future way to get verified fast computations inside proofs is via MetaCoq+CertiCoq+CompCert, i.e., one does a verified implementation in MetaCoq of something equivalent to vm_compute and native_compute, which is compiled by CertiCoq+CompCert down to machine code on the machine that does the checking. But I think there is quite a bit of work to do to get there.

view this post on Zulip Enrico Tassi (Sep 30 2021 at 16:10):

I think there are 3 parts in the 4CT: the math part of the proof, the algorithm and its correctness proof. To check the entire proof Coq has to run the algorithm (which was proved correct). IMO all other parts are lightweight in terms of "type checking", they should be easy to check with simpler (less optimized) checkers than the one of Coq. The only part which is a bit long is running the algorithm which generates/explores the 700 cases. Today it takes 30 minutes with vm_compute, without it I would expect a factor of 10 or so, not really a "gigantic computation".

view this post on Zulip Kevin Buzzard (Sep 30 2021 at 16:12):

In order to un-xy things here, my main point regarding this proof was simply to explain to mathematicians why the situation is better than "instead of relying on one computer program, we rely on another one". So far my points are that (1) the chances that both programs have "the same bug" in the sense that one error gets past both of them is of course much smaller than the chances that either program has a bug; (2) Coq has been around for a long time and has a lot of users, the Appel-Haken code is one bespoke piece of code with few users; and then I wanted to go on to say (3) type checkers but as I say I can move the discussion of type checkers elsewhere.

view this post on Zulip Kevin Buzzard (Sep 30 2021 at 16:13):

I don't want to overload the poor mathematicians' brains too early :-)

view this post on Zulip Karl Palmskog (Sep 30 2021 at 16:16):

yeah, I think this is a reasonable standard argument - coqc (and coqchk) have the advantage of not having anything to do with the domain of the four coloring argument. For computer scientists, I guess it always looks better if you have someone else set up the hoop you should jump through.

view this post on Zulip Emilio Jesús Gallego Arias (Sep 30 2021 at 16:29):

was wondering whether it was actually possible to use an existing Coq typechecker to check this particular 4CT proof, and whether it has actually been done.

Not sure I fully understand, but as others have said, the proof is maintained, that is to say, it is fully checked by Coq's master branch. You can also of course use the first code, installing an old Coq version. The proof was checked with Coq, otherwise what it would be the point?

I will remark that according to Wikipedia it took Appel and Haken over 1000 hours to run their unverified code :-)

I guess that the original proof by them had to check significantly more cases than the proof implemented in Coq, which likely relies on further optimizations of the original one.

view this post on Zulip Kevin Buzzard (Sep 30 2021 at 16:33):

The original proof was running on a computer from 1976 :-) I understand that the proof is verified with Coq, I would like to argue that "even if Coq has a bug in its kernel, one can look at the output term created from the proof script and independently verify using something other than Coq's kernel that it is a term of the correct type, so then who even cares if the kernel has a bug".

view this post on Zulip Kevin Buzzard (Sep 30 2021 at 16:36):

The other issue was that the original proof had to verify that a collection of graphs of size nearly 2000 had two properties but later developments cut this down to 633.

view this post on Zulip Karl Palmskog (Sep 30 2021 at 16:36):

there is a clear difference between the "regular" checking by coqc (which is done after each proof term has been built), and the "after-the-fact" checking of proof terms by a tool like coqchk (which is not affected by plugins and other confounders). In principle, there could be a third-party checker of serialized proof terms output by a tool like SerAPI, but to my knowledge there are no tools like this currently (MetaCoq / Coq Coq Correct might be the closest)

view this post on Zulip Kevin Buzzard (Sep 30 2021 at 16:38):

The situation is simply that I would like to say "we relied on program X (Coq kernel) to check 4CT but actually once program X has done its work, we can use program Y to check it, where Y != X, so this gives us more confidence", but I don't really want to say this if in practice Y does not exist. But I know nothing about the inner workings of Coq. Does coqc rely on the correctness of Coq's kernel?

view this post on Zulip Kevin Buzzard (Sep 30 2021 at 16:39):

The analogue which mathematicians will understand is "we use possibly buggy program X to factor this large number, but once it has finished running and produced factors we can easily check that the product of the factors is the large number without caring about whether X has bugs because we use program Y to multiply them together again"

view this post on Zulip Karl Palmskog (Sep 30 2021 at 16:39):

I am not most qualified person to talk about this - but coqc invokes the Coq kernel as part of its regular duties, which also includes things like tactic execution and plugin loading. coqchk is perhaps the smallest practical widely used tool to invoke Coq's kernel, and it does very little else.

view this post on Zulip Emilio Jesús Gallego Arias (Sep 30 2021 at 16:40):

Indeed, Coq has basically a single proof checker if you want to understand it that way; there is a tool called coqchk which removes some features, so it is more minimal, but not different enough as to be considered a fully independent checker.

As said previously, it wouldn't be hard to write an independent checker, it has not been done because it takes a lot of resources and the demand is low; moreover it seems more interesting to write a verified checker which is what metacoq's team is doing.

view this post on Zulip Kevin Buzzard (Sep 30 2021 at 16:41):

OK I don't want to say something false so I think it's probably simplest if I just move the discussion of typecheckers away from this specific example and talk about their existence in some more general setting. Thanks a lot for your prompt responses!

view this post on Zulip Emilio Jesús Gallego Arias (Sep 30 2021 at 16:42):

Also you can export to other tools, as @Gaëtan Gilbert did with Lean

view this post on Zulip Gaëtan Gilbert (Sep 30 2021 at 16:44):

that was in the other direction though

view this post on Zulip Emilio Jesús Gallego Arias (Sep 30 2021 at 16:44):

Kevin Buzzard said:

OK I don't want to say something false so I think it's probably simplest if I just move the discussion of typecheckers away from this specific example and talk about their existence in some more general setting. Thanks a lot for your prompt responses!

Indeed, IMHO if your target audience are mathematicians, you gotta ask them to trust that we have the technology to check proofs in type theory in a reliable manner. How that's done is a subject on its own, and non experts get easily confused (like the famous questioning of the consistency of PA years ago by VV]

view this post on Zulip Emilio Jesús Gallego Arias (Sep 30 2021 at 16:46):

Most common complain I heard is "how can you workaround Gödel when checking proofs"

view this post on Zulip Emilio Jesús Gallego Arias (Sep 30 2021 at 16:47):

and indeed the best answer has been given by Harvey Friedman in his reverse mathematics project

view this post on Zulip Emilio Jesús Gallego Arias (Sep 30 2021 at 16:47):

Tho the Hydra game usually suffices if you wanna use a more catchy example

view this post on Zulip Emilio Jesús Gallego Arias (Sep 30 2021 at 16:48):

Tho if details are not needed, a shorter informal answer such as "Coq/Lean are equiconsistent with Set Theory" may suffice too

view this post on Zulip Karl Palmskog (Sep 30 2021 at 16:57):

the project that is currently closest to establishing the full formal link between a practical low-level proof checker and set theory based consistency is (in my view) Candle when compiled by the verified CakeML compiler: https://github.com/CakeML/cakeml/tree/master/candle

But this is simple types and not dependent types (which, to their defense, were enough to do Schemes)

view this post on Zulip Emilio Jesús Gallego Arias (Sep 30 2021 at 17:01):

@Karl Palmskog IMHO that's debatable, the TCB seems more complex to me that what we have with Bruno's work + MetaCoq

view this post on Zulip Emilio Jesús Gallego Arias (Sep 30 2021 at 17:01):

If what you are interested is in consistency

view this post on Zulip Emilio Jesús Gallego Arias (Sep 30 2021 at 17:01):

What kind of (relative) consistency theorem does candle intend to prove?

view this post on Zulip Karl Palmskog (Sep 30 2021 at 17:04):

they follow John Harrison's approach of encoding ZF axioms in HOL. The compiler is bootstrapped inside HOL4 itself, which is then used to compile the verified proof checker. I would argue it is currently a measure more trustworthy than CompCert (no need to rely on extraction+OCaml)

view this post on Zulip Karl Palmskog (Sep 30 2021 at 17:07):

soundness of semantics of HOL: http://www.sigplan.org/Awards/Dissertation/2017_kumar.pdf#page=91

view this post on Zulip Andrés Goens (Oct 03 2021 at 10:55):

@Kevin Buzzard is that a research article (or more a some general-interest kind of article)? Will you upload a preprint to ArXiv or similar when you finish writing it, and if so, could you link to it here?

view this post on Zulip Kevin Buzzard (Oct 03 2021 at 23:26):

It's my ICM talk and I'm sure I'll upload it to ArXiv


Last updated: Aug 19 2022 at 19:03 UTC