## Stream: Miscellaneous

### Topic: Formal proof failures

#### Anders Larsson (Sep 06 2022 at 14:15):

Formal proofs are of course supposed to be safe and proven correct. But is there any famous or interesting examples of failures?
Any formal proof that people accepted but turned up to miss some subtle premiss?

#### Alexander Gryzlov (Sep 06 2022 at 14:29):

This should probably be a separate topic

#### Karl Palmskog (Sep 06 2022 at 14:30):

since this is not specifically about using Coq (although Coq might be an example), moving to Misc

#### Notification Bot (Sep 06 2022 at 14:30):

This topic was moved here from #Coq users > Formal proof failures by Karl Palmskog.

#### Anders Larsson (Sep 06 2022 at 15:19):

Alexander Gryzlov said:

This should probably be a separate topic

Did I miss something? The question has a topic as far as I can see?

#### Théo Zimmermann (Sep 06 2022 at 15:22):

@Anders Larsson The topic that you used initially was #Coq users > Dependent vectors. People have fixed this by moving it to a different topic, then to a different stream (#Miscellaneous).

#### Anders Larsson (Sep 06 2022 at 15:28):

Théo Zimmermann said:

Anders Larsson The topic that you used initially was #Coq users > Dependent vectors. People have fixed this by moving it to a different topic, then to a different stream (#Miscellaneous).

Ah, then I know what to look for next time. Ironic to fail to post propely when I ask for failures. :-)

#### Meven Lennon-Bertrand (Sep 06 2022 at 16:52):

A similar question turned up multiple times on the Proof Assistants SE: here, here and here.

#### Meven Lennon-Bertrand (Sep 06 2022 at 16:57):

There were some examples given there, but no "high-profile" one. So although the absence of proof is not the proof of the absence, I think it is safe to assume no reasonably-sized, accepted formal proof has ever turned out to be false.

#### Anders Larsson (Sep 06 2022 at 17:33):

Meven Lennon-Bertrand said:

A similar question turned up multiple times on the Proof Assistants SE: here, here and here.

Interesting reading, all 3 of them!

#### Karl Palmskog (Sep 06 2022 at 17:47):

it's seemingly more common that someone formalizes the wrong property or definition (not what they thought they formalized). But this is usually not published, since the mistake may be detected by colleagues or reviewers.

#### Karl Palmskog (Sep 06 2022 at 17:48):

if you prove something about the wrong thing, it's (with high certainty) still a proof

#### Karl Palmskog (Sep 06 2022 at 17:57):

vacuous properties/axioms and mistakes in definitions can sometimes be detected by techniques like mutation analysis

#### Paolo Giarrusso (Sep 07 2022 at 08:27):

My google-fu is failing me, but I think I've seen examples in software correctness or security, tho I think all failures fall under limitations of the original proofs (bugs in unverified pieces of code, attacks not covered by the previous formalization, ...)

#### Karl Palmskog (Sep 07 2022 at 08:30):

ah, if we are playing that game (limitations of proofs and unverified code), then this paper has a list: https://www.cs.purdue.edu/homes/pfonseca/papers/eurosys2017-dsbugs.pdf

#### Karl Palmskog (Sep 07 2022 at 08:32):

but I think it's awkward that "proof failures" can get massive attention and scrutiny, while formal specification validation should arguably be a subfield of its own, but is something typically done in passing, if at all

#### Michael Soegtrop (Sep 07 2022 at 09:05):

An important aspect of Coq is that it covers such a wide area of math and technology. This allows it to write the specifications at a chosen level where they are especially simple to understand - which reduces the risk that the wrong properties are specified. The specifications at lower levels then factor out and can't be wrong - if both sides of a module interface can be proven correct, the specification of the interface is good. E.g. if one uses VST and CompCert, it wouldn't really matter if the specifications of CompCert wouldn't exactly match the C standard. As long as VST and CompCert use the same specification, it is fine (for this application). In some existing proofs this is done for > 10 levels of modules. So if one formally verifies larger parts of a system the risk of specification errors actually decreases, because larger parts of the specifications "factor out" and/or are shared (say in case the official specification of C would be written in Coq).

An interesting aspect here is the safety of automatic translation of propositions from one proof system to another. The conversion of proofs is generally trust worthy since the translated proof can be checked in the target system, but I wonder how one would ensure that the propositions are converted correctly.

#### Karl Palmskog (Sep 07 2022 at 13:36):

I agree that Coq gives the possibility to validate specifications by connecting formalizations at different abstraction levels. But how common is this really in practice? And we still might have only a formal connection between two wrong things

#### Michael Soegtrop (Sep 07 2022 at 14:29):

A classic example from Princeton which goes to quite some depth is (https://www.cs.princeton.edu/~eberinge/verified-hmac.pdf).

This connects a reasonably readable formalization of the FIPS spec to ComCert's definition of C or assembly.

#### Paolo Giarrusso (Sep 07 2022 at 17:27):

@Karl Palmskog as you probably know, the pitch of the DeepSpec project is about connecting multiple abstraction levels — which probably supports your point

#### Karl Palmskog (Sep 07 2022 at 17:29):

sure, you might get some smaller surface area to validate from doing that. But for example in the HMAC verification, FCF (crypto lib) seems to me to be the weak link

#### Paolo Giarrusso (Sep 07 2022 at 17:29):

@Michael Soegtrop CompCert certainly doesn't match the C standard... I'm not sure it's even a strict extension, in _part_ because "the C standard" isn't a formal concept.

#### Paolo Giarrusso (Sep 07 2022 at 17:30):

(Also the semantics of C/C++/Rust pointers is a research problem, even if it's probably more obscure than e.g. the semantics of weak memory)

#### Karl Palmskog (Sep 07 2022 at 17:31):

right, there are arguably three competing commercial formal C "standards": CompCert C, seL4's C, and the K C semantics. They're probably largely incomparable, and the relation to C99 or C11 is a mystery.

#### Paolo Giarrusso (Sep 07 2022 at 17:33):

I wouldn't say the relation is a mystery, but C itself is :-)

#### Paolo Giarrusso (Sep 07 2022 at 17:34):

but sb please stop me from going more off-topic

#### Karl Palmskog (Sep 07 2022 at 17:35):

the connection to topic is arguably: even if you prove that your program has, modulo the formal lang spec, some property, the lang spec could be a failure

#### Karl Palmskog (Sep 07 2022 at 17:37):

although if your property is interesting, I guess the lang spec could still be a useful failure

#### Paolo Giarrusso (Sep 07 2022 at 17:51):

the "off-topic" part was the specific research questions on C semantics

#### Paolo Giarrusso (Sep 07 2022 at 17:53):

I agree that if you prove correctness of a C program relying on "standard C" semantics, you might be in trouble; I'm skeptical "standard C semantics" is well-defined enough for the exercise, maybe unless you exclude a large set of features

#### Michael Soegtrop (Sep 07 2022 at 18:03):

@Paolo Giarrusso : well but that was my point - it doesn't really matter how the language spec of CompCert are related to the C standard. As long as VST and CompCert are using the same specification, it factors out.

agreed

#### Karl Palmskog (Sep 07 2022 at 18:05):

if your whole world consists of VST-verified programs, sure. But really, don't we expect VST-verified C code to be linked to unverified code or code verified in a different C semantics? And then relations among language specs might matter.

#### Karl Palmskog (Sep 07 2022 at 18:07):

in particular, I think you can compile seL4 using CompCert. But then the authors say "mind the gap" to the C semantics they verified it against.

#### Michael Soegtrop (Sep 07 2022 at 18:07):

In an embedded area (as I live in) long term probably not.

#### Michael Soegtrop (Sep 07 2022 at 18:08):

I expect that everything will be done with a matched suite of compilers and verification tools.

#### Karl Palmskog (Sep 07 2022 at 18:12):

linking with a huge unverified codebase was in my view the recipe for Fiat Crypto's success

#### Paolo Giarrusso (Sep 07 2022 at 18:13):

"unverified code" makes sense, but the code verification then is only sound in some limited sense

#### Paolo Giarrusso (Sep 07 2022 at 18:16):

But you don't expect fiat-crypto to compute the right results if some other C program has invalidated runtime state I guess?

#### Karl Palmskog (Sep 07 2022 at 18:17):

it can go wrong then of course. But language specs can play a role in whether the interaction [between Fiat Crypto and its clients] goes wrong.

#### Michael Soegtrop (Sep 07 2022 at 18:17):

Well today people do what can be done. But long term I expect unverified code to go away in embedded and high security areas - it is simply more expensive to test such stuff than to verify it.

#### Paolo Giarrusso (Sep 07 2022 at 18:18):

the usual argument I see is for verifying "security kernels" (in a broad sense) that ensure what you want even when combined with unverified code

#### Michael Soegtrop (Sep 07 2022 at 18:18):

The cost of verification of course also depends on the tools - I see huge progress here compared to 2..3 years back.

#### Paolo Giarrusso (Sep 07 2022 at 18:19):

there "combined" does not IME allow linking, but my experience is limited

#### Paolo Giarrusso (Sep 07 2022 at 18:20):

(I imagine HW capabilities _might_ change the equation here)

#### Michael Soegtrop (Sep 07 2022 at 18:21):

I imagine such a concept e.g. for running apps in JS/WebAsm in a browser and the like - the browser has to ensure that the apps can only do what they should. But my world is a bit different.

#### Paolo Giarrusso (Sep 07 2022 at 18:22):

my day job uses an L4-family microkernel, and IIRc Apple uses another one of those

#### Karl Palmskog (Sep 07 2022 at 18:23):

hardware can also pull out the verified rug from under you (compare Spectre, Meltdown, etc.)

#### Paolo Giarrusso (Sep 07 2022 at 18:26):

(reference re Apple: https://trustworthy.systems/publications/nicta_full_text/8988.pdf)

#### Anders Larsson (Sep 07 2022 at 20:22):

Paolo Giarrusso said:

I agree that if you prove correctness of a C program relying on "standard C" semantics, you might be in trouble; I'm skeptical "standard C semantics" is well-defined enough for the exercise, maybe unless you exclude a large set of features

You can probably get within the C-standard it with ease. But things like the number of bits in a byte or whether to use one-complement or two-complement is left implementation dependent (to not sacrifice speed I think). It's a different world.

#### Paolo Giarrusso (Sep 07 2022 at 20:39):

Even the semantics of comparing two pointers is contentious, and bootstrapping something like malloc() in standard C isn't possible (since malloc() is the source of "allocated storage"). Ralf Jung's blog posts are relevant: https://www.ralfj.de/blog/2022/04/11/provenance-exposed.html.

#### Paolo Giarrusso (Sep 07 2022 at 20:42):

If you want your program to be portable, @Anders Larsson 's concerns become relevant, but at least from my POV (system software), it's not a problem to define those details by assuming e.g. the platform ABI standards. The bigger concern is whether some future compiler is going to decide my program has UB and can be compiled creatively.

Last updated: May 31 2023 at 03:30 UTC