Formal proofs are of course supposed to be safe and proven correct. But are there any famous or interesting examples of failures?
Any formal proof that people accepted but that later turned out to miss some subtle premise?
This should probably be a separate topic
since this is not specifically about using Coq (although Coq might be an example), moving to Misc
This topic was moved here from #Coq users > Formal proof failures by Karl Palmskog.
Alexander Gryzlov said:
This should probably be a separate topic
Did I miss something? The question has a topic as far as I can see?
@Anders Larsson The topic that you used initially was #Coq users > Dependent vectors. People have fixed this by moving it to a different topic, then to a different stream (#Miscellaneous).
Théo Zimmermann said:
Anders Larsson The topic that you used initially was #Coq users > Dependent vectors. People have fixed this by moving it to a different topic, then to a different stream (#Miscellaneous).
Ah, then I know what to look for next time. Ironic to fail to post properly when I ask for failures. :-)
A similar question turned up multiple times on the Proof Assistants SE: here, here and here.
There were some examples given there, but no "high-profile" one. So although absence of proof is not proof of absence, I think it is safe to assume that no reasonably-sized, accepted formal proof has ever turned out to be false.
Meven Lennon-Bertrand said:
A similar question turned up multiple times on the Proof Assistants SE: here, here and here.
Interesting reading, all 3 of them!
it's seemingly more common that someone formalizes the wrong property or definition (not what they thought they formalized). But this is usually not published, since the mistake may be detected by colleagues or reviewers.
if you prove something about the wrong thing, it's (with high certainty) still a proof
vacuous properties/axioms and mistakes in definitions can sometimes be detected by techniques like mutation analysis
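For illustration, a minimal Coq sketch (made-up names) of the kind of thing this catches: a buggy `sorted` predicate that holds for every list makes the correctness theorem vacuous, and since the proof never inspects `sort`, any mutation of `sort` leaves it intact, which is exactly the signal mutation analysis looks for.

```coq
(* Minimal sketch with made-up names: the intended meaning of [sorted] is
   "every element is <= the next one", but the buggy version below compares
   each element with itself, so it holds for every list and the correctness
   theorem is vacuous. *)
Require Import List.

Definition sorted (l : list nat) : Prop :=
  forall i, nth i l 0 <= nth i l 0.   (* bug: should relate nth i and nth (S i) *)

Definition sort (l : list nat) : list nat := l.   (* stand-in "implementation" *)

Theorem sort_sorted : forall l, sorted (sort l).
Proof. intros l i. apply le_n. Qed.
(* The proof never looks at [sort]: mutating or even removing the sorting code
   leaves it intact, which is the signal mutation analysis would flag. *)
```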
My google-fu is failing me, but I think I've seen examples in software correctness or security, though I think all the failures fall under limitations of the original proofs (bugs in unverified pieces of code, attacks not covered by the previous formalization, ...)
ah, if we are playing that game (limitations of proofs and unverified code), then this paper has a list: https://www.cs.purdue.edu/homes/pfonseca/papers/eurosys2017-dsbugs.pdf
but I think it's awkward that "proof failures" can get massive attention and scrutiny, while formal specification validation, which should arguably be a subfield of its own, is typically done only in passing, if at all
An important aspect of Coq is that it covers such a wide area of math and technology. This allows one to write the specifications at a chosen level where they are especially simple to understand, which reduces the risk that the wrong properties are specified. The specifications at lower levels then factor out and can't be wrong: if both sides of a module interface can be proven correct, the specification of the interface is good. E.g. if one uses VST and CompCert, it wouldn't really matter if the specifications of CompCert didn't exactly match the C standard. As long as VST and CompCert use the same specification, it is fine (for this application). In some existing proofs this is done for more than 10 levels of modules. So if one formally verifies larger parts of a system, the risk of specification errors actually decreases, because larger parts of the specifications "factor out" and/or are shared (as they would be, say, if the official specification of C were written in Coq).
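To make the "factoring out" point concrete, here is a minimal Coq sketch (made-up names, standing in for a VST/CompCert-style setup): the shared specification lives in a module type, the lower level proves it, the upper level only assumes it, and the composition never depends on how the shared spec relates to any external standard.

```coq
(* Minimal sketch with made-up names: the shared specification sits in the
   interface; the lower level proves it, the upper level only assumes it. *)
Require Import Arith.

Module Type INTERFACE.
  Parameter add : nat -> nat -> nat.
  Axiom add_comm : forall a b, add a b = add b a.   (* the shared spec *)
End INTERFACE.

(* Lower level: an implementation proving the shared spec. *)
Module Impl <: INTERFACE.
  Definition add := Nat.add.
  Lemma add_comm : forall a b, add a b = add b a.
  Proof. intros; unfold add; apply Nat.add_comm. Qed.
End Impl.

(* Upper level: a client verified only against the interface. *)
Module Client (M : INTERFACE).
  Theorem client_ok : forall a b, M.add (M.add a b) 0 = M.add 0 (M.add b a).
  Proof. intros; rewrite (M.add_comm a b); apply M.add_comm. Qed.
End Client.

(* Composition: whether [add_comm] matches some external standard never enters
   the end-to-end proof; the interface specification "factors out". *)
Module VerifiedSystem := Client Impl.
```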
An interesting aspect here is the safety of automatic translation of propositions from one proof system to another. The conversion of proofs is generally trustworthy, since the translated proof can be checked in the target system, but I wonder how one would ensure that the propositions are converted correctly.
I agree that Coq gives the possibility to validate specifications by connecting formalizations at different abstraction levels. But how common is this really in practice? And we still might have only a formal connection between two wrong things
A classic example from Princeton, which goes to quite some depth, is https://www.cs.princeton.edu/~eberinge/verified-hmac.pdf
This connects a reasonably readable formalization of the FIPS spec to CompCert's definition of C or assembly.
@Karl Palmskog as you probably know, the pitch of the DeepSpec project is about connecting multiple abstraction levels — which probably supports your point
sure, you might get some smaller surface area to validate from doing that. But for example in the HMAC verification, FCF (crypto lib) seems to me to be the weak link
@Michael Soegtrop CompCert certainly doesn't match the C standard... I'm not sure it's even a strict extension, in _part_ because "the C standard" isn't a formal concept.
(Also the semantics of C/C++/Rust pointers is a research problem, even if it's probably more obscure than e.g. the semantics of weak memory)
right, there are arguably three competing commercial formal C "standards": CompCert C, seL4's C, and the K C semantics. They're probably largely incomparable, and the relation to C99 or C11 is a mystery.
See also Krebbers's CH2O and Sewell's work on Cerberus
I wouldn't say the relation is a mystery, but C itself is :-)
but somebody please stop me from going more off-topic
the connection to the topic is arguably: even if you prove that your program has, modulo the formal lang spec, some property, the lang spec itself could be a failure
although if your property is interesting, I guess the lang spec could still be a useful failure
the "off-topic" part was the specific research questions on C semantics
I agree that if you prove correctness of a C program relying on "standard C" semantics, you might be in trouble; I'm skeptical "standard C semantics" is well-defined enough for the exercise, maybe unless you exclude a large set of features
@Paolo Giarrusso : well, but that was my point - it doesn't really matter how the language spec of CompCert is related to the C standard. As long as VST and CompCert are using the same specification, it factors out.
agreed
if your whole world consists of VST-verified programs, sure. But really, don't we expect VST-verified C code to be linked to unverified code or code verified in a different C semantics? And then relations among language specs might matter.
in particular, I think you can compile seL4 using CompCert. But then the authors say "mind the gap" to the C semantics they verified it against.
In the embedded area (which is where I live), long term, probably not.
I expect that everything will be done with a matched suite of compilers and verification tools.
linking with a huge unverified codebase was in my view the recipe for Fiat Crypto's success
"unverified code" makes sense, but the code verification then is only sound in some limited sense
But you don't expect fiat-crypto to compute the right results if some other C program has invalidated runtime state I guess?
it can go wrong then of course. But language specs can play a role in whether the interaction [between Fiat Crypto and its clients] goes wrong.
Well today people do what can be done. But long term I expect unverified code to go away in embedded and high security areas - it is simply more expensive to test such stuff than to verify it.
the usual argument I see is for verifying "security kernels" (in a broad sense) that ensure what you want even when combined with unverified code
The cost of verification of course also depends on the tools - I see huge progress here compared to 2-3 years back.
there "combined" does not IME allow linking, but my experience is limited
(I imagine HW capabilities _might_ change the equation here)
I imagine such a concept e.g. for running apps in JS/WebAsm in a browser and the like - the browser has to ensure that the apps can only do what they should. But my world is a bit different.
my day job uses an L4-family microkernel, and IIRC Apple uses another one of those
hardware can also pull out the verified rug from under you (compare Spectre, Meltdown, etc.)
(reference re Apple: https://trustworthy.systems/publications/nicta_full_text/8988.pdf)
Paolo Giarrusso said:
I agree that if you prove correctness of a C program relying on "standard C" semantics, you might be in trouble; I'm skeptical "standard C semantics" is well-defined enough for the exercise, maybe unless you exclude a large set of features
You can probably stay within the C standard with ease. But things like the number of bits in a byte, or whether to use one's complement or two's complement, are left implementation-dependent (so as not to sacrifice speed, I think). It's a different world.
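Just to illustrate one way a formalization can handle this (a minimal Coq sketch, made-up names): keep the implementation-defined quantity abstract and assume only the bound the standard guarantees, so that whatever is proved holds for every conforming implementation. (CompCert, I believe, instead just fixes such parameters for its target architectures.)

```coq
(* Minimal sketch with made-up names: the byte width is kept abstract, and only
   the lower bound guaranteed by the C standard is assumed, so the lemma below
   holds for every conforming implementation. *)
Require Import Arith Lia.

Section ImplementationDefined.
  Variable CHAR_BIT : nat.
  Hypothesis CHAR_BIT_lower : 8 <= CHAR_BIT.   (* all the standard promises *)

  Definition byte_max : nat := 2 ^ CHAR_BIT - 1.

  Lemma byte_max_lower_bound : 255 <= byte_max.
  Proof.
    unfold byte_max.
    assert (H : 2 ^ 8 <= 2 ^ CHAR_BIT)
      by (apply Nat.pow_le_mono_r; [discriminate | exact CHAR_BIT_lower]).
    cbn in H. lia.
  Qed.
End ImplementationDefined.
```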
Even the semantics of comparing two pointers is contentious, and bootstrapping something like malloc() in standard C isn't possible (since malloc() is the source of "allocated storage"). Ralf Jung's blog posts are relevant: https://www.ralfj.de/blog/2022/04/11/provenance-exposed.html.
If you want your program to be portable, @Anders Larsson's concerns become relevant, but at least from my POV (system software), it's not a problem to define those details by assuming e.g. the platform ABI standards. The bigger concern is whether some future compiler is going to decide my program has UB and can be compiled creatively.