I am writing an article about formal proof verification in mathematics, and the first case study I'm considering is the 2005 formally verified proof by Gonthier of the four colour theorem. I imagine my readers will mostly be mathematicians so will contain people who will not be able to tell the difference between the Appel--Haken proof (which "used a computer") and the Gonthier proof (which also "used a computer"). At some point I want to talk about the concept that a "complicated" program such as Coq can be used to turn a human-written tactic proof into a term in a type theory, and then a "simpler" typechecker such as Coq Coq Correct can be used to independently verify that the term has the specified type, thus giving more evidence that the proof is OK.

A natural place to talk about this sort of situation would be when discussing 4CT, but before I write it I was wondering whether it was actually possible to use an existing Coq typechecker to check this particular 4CT proof, and whether it has actually been done. If there are some practical reasons why this cannot be done then of course it doesn't matter, I can just move the explanation of type checkers to a "general" paragraph -- I am only asking about this particular proof because this first (in the article) concrete example of a formally verified proof felt like a natural place to deal with such questions.

@Kevin Buzzard if I understand you correctly, the closest I can think of that is currently practical/feasible is to use the `coqchk`

standalone proof term checker on the proof terms compiled by `coqc`

from the current incarnation of the 4color proof: https://github.com/math-comp/fourcolor

For example, `coqchk`

does not use the virtual machine for computations (`vm_compute`

), so my guess is it would take a *long* time to check the fourcolor proof terms.

Yes, the fact that the proof involved a gigantic computation was scaring me. Maybe it's best to put a discussion of type checkers elsewhere.

I would be interested in a comment from @Enrico Tassi though on the feasibility of checking the current 4CT in some more assuring way.

I will remark that according to Wikipedia it took Appel and Haken over 1000 hours to run their unverified code :-)

Actually, vm_compute was added to Coq to make reflexive proofs work nicely, and the 4CT was *the* thing back then. But I think the proof did type check even before that. Benjamin Gregoir just left, I can ask him for confirmation on Monday I guess.

Thanks -- no hurry.

if I remember a conversation with Matthieu Sozeau correctly, the future way to get verified fast computations inside proofs is via MetaCoq+CertiCoq+CompCert, i.e., one does a verified implementation in MetaCoq of something equivalent to `vm_compute`

and `native_compute`

, which is compiled by CertiCoq+CompCert down to machine code on the machine that does the checking. But I think there is quite a bit of work to do to get there.

I think there are 3 parts in the 4CT: the math part of the proof, the algorithm and its correctness proof. To check the entire proof Coq has to run the algorithm (which was proved correct). IMO all other parts are lightweight in terms of "type checking", they should be easy to check with simpler (less optimized) checkers than the one of Coq. The only part which is a bit long is running the algorithm which generates/explores the 700 cases. Today it takes 30 minutes with vm_compute, without it I would expect a factor of 10 or so, not really a "gigantic computation".

In order to un-xy things here, my main point regarding this proof was simply to explain to mathematicians why the situation is better than "instead of relying on one computer program, we rely on another one". So far my points are that (1) the chances that both programs have "the same bug" in the sense that one error gets past both of them is of course much smaller than the chances that either program has a bug; (2) Coq has been around for a long time and has a lot of users, the Appel-Haken code is one bespoke piece of code with few users; and then I wanted to go on to say (3) type checkers but as I say I can move the discussion of type checkers elsewhere.

I don't want to overload the poor mathematicians' brains too early :-)

yeah, I think this is a reasonable standard argument - `coqc`

(and `coqchk`

) have the advantage of not having anything to do with the domain of the four coloring argument. For computer scientists, I guess it always looks better if you have someone else set up the hoop you should jump through.

was wondering whether it was actually possible to use an existing Coq typechecker to check this particular 4CT proof, and whether it has actually been done.

Not sure I fully understand, but as others have said, the proof is maintained, that is to say, it is fully checked by Coq's master branch. You can also of course use the first code, installing an old Coq version. The proof was checked with Coq, otherwise what it would be the point?

I will remark that according to Wikipedia it took Appel and Haken over 1000 hours to run their unverified code :-)

I guess that the original proof by them had to check significantly more cases than the proof implemented in Coq, which likely relies on further optimizations of the original one.

The original proof was running on a computer from 1976 :-) I understand that the proof is verified with Coq, I would like to argue that "even if Coq has a bug in its kernel, one can look at the output term created from the proof script and independently verify using something other than Coq's kernel that it is a term of the correct type, so then who even cares if the kernel has a bug".

The other issue was that the original proof had to verify that a collection of graphs of size nearly 2000 had two properties but later developments cut this down to 633.

there is a clear difference between the "regular" checking by `coqc`

(which is done after each proof term has been built), and the "after-the-fact" checking of proof terms by a tool like `coqchk`

(which is not affected by plugins and other confounders). In principle, there could be a third-party checker of serialized proof terms output by a tool like SerAPI, but to my knowledge there are no tools like this currently (MetaCoq / Coq Coq Correct might be the closest)

The situation is simply that I would like to say "we relied on program X (Coq kernel) to check 4CT but actually once program X has done its work, we can use program Y to check it, where Y != X, so this gives us more confidence", but I don't really want to say this if in practice Y does not exist. But I know nothing about the inner workings of Coq. Does `coqc`

rely on the correctness of Coq's kernel?

The analogue which mathematicians will understand is "we use possibly buggy program X to factor this large number, but once it has finished running and produced factors we can easily check that the product of the factors is the large number without caring about whether X has bugs because we use program Y to multiply them together again"

I am not most qualified person to talk about this - but `coqc`

invokes the Coq kernel as part of its regular duties, which also includes things like tactic execution and plugin loading. `coqchk`

is perhaps the smallest practical widely used tool to invoke Coq's kernel, and it does very little else.

Indeed, Coq has basically a single proof checker if you want to understand it that way; there is a tool called `coqchk`

which removes some features, so it is more minimal, but not different enough as to be considered a fully independent checker.

As said previously, it wouldn't be hard to write an independent checker, it has not been done because it takes a lot of resources and the demand is low; moreover it seems more interesting to write a verified checker which is what metacoq's team is doing.

OK I don't want to say something false so I think it's probably simplest if I just move the discussion of typecheckers away from this specific example and talk about their existence in some more general setting. Thanks a lot for your prompt responses!

Also you can export to other tools, as @Gaëtan Gilbert did with Lean

that was in the other direction though

Kevin Buzzard said:

OK I don't want to say something false so I think it's probably simplest if I just move the discussion of typecheckers away from this specific example and talk about their existence in some more general setting. Thanks a lot for your prompt responses!

Indeed, IMHO if your target audience are mathematicians, you gotta ask them to trust that we have the technology to check proofs in type theory in a reliable manner. How that's done is a subject on its own, and non experts get easily confused (like the famous questioning of the consistency of PA years ago by VV]

Most common complain I heard is "how can you workaround Gödel when checking proofs"

and indeed the best answer has been given by Harvey Friedman in his reverse mathematics project

Tho the Hydra game usually suffices if you wanna use a more catchy example

Tho if details are not needed, a shorter informal answer such as "Coq/Lean are equiconsistent with Set Theory" may suffice too

the project that is currently closest to establishing the full formal link between a practical low-level proof checker and set theory based consistency is (in my view) Candle when compiled by the verified CakeML compiler: https://github.com/CakeML/cakeml/tree/master/candle

But this is simple types and not dependent types (which, to their defense, were enough to do Schemes)

@Karl Palmskog IMHO that's debatable, the TCB seems more complex to me that what we have with Bruno's work + MetaCoq

If what you are interested is in consistency

What kind of (relative) consistency theorem does candle intend to prove?

they follow John Harrison's approach of encoding ZF axioms in HOL. The compiler is bootstrapped inside HOL4 itself, which is then used to compile the verified proof checker. I would argue it is currently a measure more trustworthy than CompCert (no need to rely on extraction+OCaml)

soundness of semantics of HOL: http://www.sigplan.org/Awards/Dissertation/2017_kumar.pdf#page=91

@Kevin Buzzard is that a research article (or more a some general-interest kind of article)? Will you upload a preprint to ArXiv or similar when you finish writing it, and if so, could you link to it here?

It's my ICM talk and I'm sure I'll upload it to ArXiv

Last updated: Dec 07 2023 at 09:01 UTC