Stream: Miscellaneous

Topic: TCB snobbery


view this post on Zulip Karl Palmskog (Jun 11 2020 at 13:21):

I recently refused a review request for a "mechanized verification" paper using an (in my opinion very) unreliable verification tool, mostly since I didn't feel I could be sufficiently constructive. This got me thinking about the pros and cons of being a TCB snob, and how far should we push it. e.g., should the ITP community eventually only accept Coq developments that check with MetaCoq or HOL developments checked by CakeML's Candle?

@Bas Spitters would be very interested in your opinion here.

view this post on Zulip Bas Spitters (Jun 11 2020 at 14:10):

I understand what you are saying. I would be happy with any established tool in CPP/ITP/...

Two observations, there's been an initiative by ANSSI to get some clarity around this for the same reason. See the presentation at CPP.
https://popl20.sigplan.org/details/CoqPL-2020-papers/2/The-use-of-Coq-for-Common-Criteria-Evaluations
I think we should do more along those lines.

I recently tried to open a discussion with the K-framework about the same topic on the coq-list, but they stopped responding.
https://sympa.inria.fr/sympa/arc/coq-club/2020-02/msg00067.html

view this post on Zulip Karl Palmskog (Jun 11 2020 at 14:15):

Bas Spitters said:

I would be happy with any established tool in CPP/ITP/...

Consider the list from the CPP website: ACL2, Agda, Coq, Dafny, F*, HOL4, HOL Light, Idris, Isabelle, Lean, Mizar, Nuprl, PVS

There are some I have quite a lot of doubts about, e.g., Idris (seems like regular PL w.r.t. development) and Mizar (closed source). I think one may have to look at the application to say something for sure, e.g., crypto code in Mizar would probably be a no for me.

view this post on Zulip Bas Spitters (Jun 11 2020 at 14:25):

Order of quality/ TCB would be???
HOL Light
Isabelle
Lean,
Coq,
Agda,
HOL4,
ACL2,
Idris,
Dafny / why3
F*,
PVS,
Mizar (Although, I believe they recently made an independent checker)
Nuprl (not available)

view this post on Zulip Karl Palmskog (Jun 11 2020 at 14:30):

yes, I largely agree with that list (there are obviously some unmentioned specialized ones like HOL Zero which rank high)

view this post on Zulip Karl Palmskog (Jun 11 2020 at 14:33):

a precise technical comparison is probably not possible, or would just be endlessly debated on metrics. But maybe there is something one could do along the lines of what they do for university rankings: define 4-6 "factors" and give a "score" on each factor.

Obvious factors: complexity of foundations, soundness bugs susceptibility, machine-checked foundations status

view this post on Zulip Kenji Maillard (Jun 11 2020 at 14:38):

Isn't there a a typechecker for Nuprl formalized in Coq ?

view this post on Zulip Karl Palmskog (Jun 11 2020 at 14:42):

some aspects of Nuprl have been formalized in Coq, which I believe include fragments of a typechecker: https://github.com/vrahli/NuprlInCoq

I've tried to suggest to the main author to move this project to coq-community, since it's interesting for its extremely long proof-checking time - but no luck so far, see here: https://github.com/coq-community/manifesto/issues/60

For example, if we get verified extraction or compilation via CertiCoq, this would a very nice application.

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2020 at 14:52):

There doesn't seem to be anything serious preventing the move to coq-community, right?

view this post on Zulip Bas Spitters (Jun 11 2020 at 14:52):

Yes, and there is redprl for example, which is very clean.
Someone should write a semantics of F* in Coq ... :-) @Kenji Maillard
HOL4 is being ported to isabelle
Mizar to Isabelle
( https://link.springer.com/article/10.1007/s10817-018-9479-z
https://link.springer.com/chapter/10.1007/978-3-319-96812-4_13 )
Perhaps smt-coq can help to make the other tools more rigorous

view this post on Zulip Bas Spitters (Jun 11 2020 at 14:57):

Freek always enjoyed comparing foundations based on size. We never really agreed :-)
https://www.cs.ru.nl/F.Wiedijk/zfc-etc/zfc-etc.pdf

view this post on Zulip Karl Palmskog (Jun 11 2020 at 15:08):

Pierre-Marie Pédrot said:

There doesn't seem to be anything serious preventing the move to coq-community, right?

Indeed it would only take Vincent about 3-4 clicks for the move to happen. But obviously it's up to him if/when to do it.

view this post on Zulip Bas Spitters (Jun 11 2020 at 15:36):

About verifying Coq, would either coq-of-ocaml or cfml be able to lift the coq implementation in ocaml to coq?

view this post on Zulip Karl Palmskog (Jun 11 2020 at 15:41):

I think the consensus is that post-hoc verification of the Coq implementation is super hard, probably easier to just build a new Coq on top of MetaCoq?

view this post on Zulip Karl Palmskog (Jun 11 2020 at 15:42):

this is sort of what they do in Candle on top of HOL4 and CakeML

view this post on Zulip Karl Palmskog (Jun 11 2020 at 15:45):

hehe, I wonder what a "TCB snob" logo could be? A huge cartoon corn figure ("kernel") with a red crossout?

view this post on Zulip Bas Spitters (Jun 11 2020 at 15:48):

but meta-coq would still depend on the ocaml compiler...

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2020 at 15:50):

... and on your processor not being broken

view this post on Zulip Karl Palmskog (Jun 11 2020 at 15:51):

in HOL4, they bootstrap CakeML inside HOL4, i.e., the inside-HOL4-defined CakeML compiles a copy of itself to machine code, then one uses the bootstrapped CakeML to compile Candle

view this post on Zulip Karl Palmskog (Jun 11 2020 at 15:51):

so you don't need to trust PolyML beyond that it compiled the HOL4 kernel correctly

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2020 at 15:53):

at some point "Reflections on Trusting Trust" applies

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2020 at 15:53):

could well be that eons ago somebody introduced a hidden backdoor in PolyML

view this post on Zulip Karl Palmskog (Jun 11 2020 at 15:53):

ah, but then we are in adversarial territory, not just human fallibility territory

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2020 at 15:54):

Right.

view this post on Zulip Pierre-Marie Pédrot (Jun 11 2020 at 15:54):

(Your logo thing is hauting me. Now I have visions of a trusted kennel)

view this post on Zulip Karl Palmskog (Jun 11 2020 at 15:57):

Honest Bob's used kernels(?) None of your bases are belong to us(?)

view this post on Zulip Enrico Tassi (Jun 12 2020 at 09:56):

I think ranking systems wrt their trustworthiness is very delicate. I know a bunch of system and I find the ranking up there mostly off.

For example "Isabelle" alone means nothing and the TCB of Isabelle/Pure and Isabelle/HOL are different (the latter includes definitional mechanisms that made some noise recently). Last week I learnt that some notion of type classes is in Isabelle/Pure, is it widely known?
ACL2 uses all sort of black-box automation, what is the difference with Dafny? And why HOL4 is so far from HOL-Light? I'm pretty sure nobody found a proof of false Nuprl in the last decade, why is it so low?

There is also a matter of age and number of users. A tiny piece of software with no users is going to have very little known bugs while a large one with many users will have more, but this says nothing about their trustworthiness. Also the response time when a severe bug is found has something to say about trustworthiness to me.

At ITP in Nanjing Morris did classify some systems along more axis. I did not agree 100% with the info he could gather on systems I know, but I found the result way more interesting than a linear order like the one above.

view this post on Zulip Karl Palmskog (Jun 12 2020 at 10:00):

Enrico Tassi said:

I'm pretty sure nobody found a proof of false Nuprl in the last decade, why is it so low?

Thanks for weighing in. I think the main issue with Nuprl is that the system is not publicly accessible at all. At most, researchers can get access to a remote shell which allows you to interact with it. Hence, there are very few external users, and they might not report their findings (no public issue tracker to my knowledge).

view this post on Zulip Karl Palmskog (Jun 12 2020 at 10:02):

one axis/component that could be relevant is if the system developers have the ambition to prevent soundness bugs (and document them publicly when they occur in publicly released versions). For example, to my knowledge, Rustan's goal with Dafny has so far been mostly educational, which is a valuable one, but means priorities are very different from, say, HOL Zero's

view this post on Zulip Karl Palmskog (Jun 12 2020 at 10:18):

Enrico Tassi said:

There is also a matter of age and number of users. A tiny piece of software with no users is going to have very little known bugs while a large one with many users will have more, but this says nothing about their trustworthiness. Also the response time when a severe bug is found has something to say about trustworthiness to me.

@Enrico Tassi so do you think relevant publication venues, e.g., CPP, should curate current ITPs officially in some way, e.g., designate them along some axes? In my mind at least, if paper uses a new or tiny system, it would be fair to apply higher reviewing standards than for an "established" system

view this post on Zulip Paolo Giarrusso (Jun 12 2020 at 11:15):

I think Nuprl has recently released VM images, tho the same problems still apply. I’m surprised to see Agda ranked so high without a trusted kernel and with all of their soundness issues, does HOL4 have no kernel?

view this post on Zulip Karl Palmskog (Jun 12 2020 at 11:17):

HOL4 is based on LCF approach, so it indeed has a kernel. However, despite same foundations and being close to HOL Light, the kernel is (I believe) significantly larger than HOL Light's for historical reasons (it inherits code from Cambridge LCF) going back 30+ years)

view this post on Zulip Paolo Giarrusso (Jun 12 2020 at 11:17):

(the question was “isn’t HOL Light kernel smaller than 100 kLoC of Haskell” — or whatever the Agda implementation is)

view this post on Zulip Karl Palmskog (Jun 12 2020 at 11:18):

from memory, HOL4 kernel is a couple of thousand lines of Standard ML, while HOL Light kernel is 600-700 lines of OCaml

view this post on Zulip Karl Palmskog (Jun 12 2020 at 11:18):

but doesn't Agda have a standalone checker?

view this post on Zulip Paolo Giarrusso (Jun 12 2020 at 11:19):

whoops sorry

view this post on Zulip Paolo Giarrusso (Jun 12 2020 at 11:21):

my message editing screwed up a bit

view this post on Zulip Paolo Giarrusso (Jun 12 2020 at 11:22):

the _idea_ is that Agda should be explainable as a Martin–Löf type theory, but we don’t know that formally

view this post on Zulip Paolo Giarrusso (Jun 12 2020 at 11:23):

(also, an ML type theory with several extensions)

view this post on Zulip Paolo Giarrusso (Jun 12 2020 at 11:23):

in any case, at least according to Cedille papers, even kernels with inductive types are necessarily bigger (LoC-wise) than HOL-style kernels

view this post on Zulip Paolo Giarrusso (Jun 12 2020 at 11:24):

last — I don’t know today, but I remember an Agda soundness issue whose fix wasn’t released for like an year?

view this post on Zulip Paolo Giarrusso (Jun 12 2020 at 11:30):

probably things have changed, but that wasn’t seen as a big deal back then: https://github.com/agda/agda/issues/1127 https://lists.chalmers.se/pipermail/agda/2014/006712.html. Modern agda —safe is probably safer, but there’s no independent checker yet.

view this post on Zulip Enrico Tassi (Jun 12 2020 at 12:05):

Karl Palmskog said:

In my mind at least, if paper uses a new or tiny system, it would be fair to apply higher reviewing standards than for an "established" system

AFAIK system diversity has been a value so far. Having different reviewing standard depending on some ranking is not something I would do. When I read a formalization paper I don't care much if the system is 99.99% bullet proof or not, I just want to learn something.

view this post on Zulip Enrico Tassi (Jun 12 2020 at 12:06):

Picking a system for doing some tasks is a different story, especially in the industry. Soundness would probably the last of my concerns.

view this post on Zulip Bas Spitters (Jun 12 2020 at 12:46):

But something we have to deal with in the context of eg FMBC. There people use very different tools. And one person claiming to have verified something, may mean something quite different from when someone else says it.

view this post on Zulip Karl Palmskog (Jun 12 2020 at 12:47):

Bas Spitters said:

But something we have to deal with in the context of eg FMBC. There people use very different tools. And one person claiming to have verified something, may mean something quite different from when someone else says it.

Right, and typically people are not very modest in their claims, e.g., "this is the most reliable formalization ever". I think there is a professional obligation to call out far-reaching claims when underlying implementations are doubtful.

view this post on Zulip Bas Spitters (Jul 09 2020 at 07:15):

The K-framework is now claiming that their kernel is much simpler than Coq's 200lines vs "8000lines of OCaml"
Doesn't seem like a very fair comparison... @Karl Palmskog ?
https://twitter.com/RosuGrigore/status/1279252137705439235

The idea is that the programming language semantics should be vetted by a standards committee and the specs to prove by the "customer". The rest, i.e., the program together with its correctness proof object, will be guaranteed by a 200 LOC simple proof checker. https://twitter.com/pparadzinski/status/1279217054441234432

- Grigore Rosu (@RosuGrigore)

view this post on Zulip Enrico Tassi (Jul 09 2020 at 07:40):

Hum, I like the idea of considering the PL as granted. Then I guess the kernel of Coq written in Gallina should be 0 lines :upside_down:

view this post on Zulip Karl Palmskog (Jul 09 2020 at 08:07):

for Lean at least, I know they had a standalone checker in about 2k lines of Haskell at one point. HOL Light's minimal kernel is about 500-600 lines. Probably it depends on the logic and design of the "kernel format". I'm not sure what would be the most fair comparison here, but indeed the one with the integrated Coq checker is not the most straightforward.

view this post on Zulip Bas Spitters (Jul 09 2020 at 10:24):

I understand they are trying to run a company, but it would be nice if they would be fair to other systems (like Coq). It's not the first time this happens, and this was for a big audience.

view this post on Zulip Enrico Tassi (Jul 09 2020 at 16:57):

Comparison is hard and many aspects are usually not even taken into account. For example having a very very tiny kernel does not seem that nice if then you need to have a non trivial encoding for common notions like lists (hello HOL :-). Another aspect is efficiency: if you write a naive reduction, you are done with a screen of OCaml/Haskell code. A completely different story is to have an evaluator that can crunch on the Four Color theorem reflexive proof: being small but naive does not necessarily make your system usable (ditto for your checker, if it is 100x slower it is just hypothetical). Other aspects are more social, like what happens when there is a serious bug: is it logged, studied, fixed, deployed? Can I see the sources and the fix? Another aspect I like to think about is the bus factor: it is not specific to the kernel, but before adopting a system for anything serious it may be worth knowing who/how many maintain it or can do it if needed (eg if a serious bug is found). This also applies to the technology it stands on, which may be an inconvenient lock in (hello Nuprl).

To me just counting the lines is better than nothing, but also very very superficial. It's like evaluating research based on how many papers you have! (oops...)


Last updated: Aug 19 2022 at 19:03 UTC