Hi. An elementary doubt.

What is the difference between model checking and proofs made with proof assistants like Coq?

Are they the same? Or are they different?

I was thinking In the context of hardware design models as done by Koika and Kami, which seem to model hardware designs within Coq and then proof properties about them.

Julin S said:

What is the difference between model checking and proofs made with proof assistants like Coq?

They are not really comparable. Model checking is the field of temporal logic, while Coq is a proof assistant.

If you had asked "what is the difference between TLA+ and Coq", I would have answered that one is dedicated to temporal logic while the other one is dedicated to intuitionistic logic.

in my view, model checking is not necessarily about temporal logic, lots of different automata checking algorithms fit as well. Several model checking algorithms have been verified in proof assistants, for example: https://www.isa-afp.org/entries/CAVA_LTL_Modelchecker.html

I wouldn't see the term "model checking" that strict - many people in the industry call a wide variety of static assertion checking in HW and SW "model checking". I think this typically includes everything e.g. Dafny or Jasper can do.

I would say the main difference is that a proof assistant has a substantially richer logic and allows to state many orders of magnitude more advanced specifications than used in what is typically called model checking. The assertions one checks in model checkers are frequently one liners, while the specification one verifies with a proof assistant can be rather elaborate and have 100 pages. This then frequently requires manual proofs while model checkers typically rely on automated SMT solvers. But I would say that Coq offers a lot of efficient automation in areas which are far beyond what and SMT solver can do.

Think e.g. of the formally verified C compiler CompCert - in the logic of most (all?) model checkers you can't even write down what it means that that a C compiler is correct - in Coq you can write it down and you can prove that an implementation fulfills such a specification.

I would say another difference is that model checkers are usually fairly special purpose for specific tasks, say verification of timing in HW pr verification of arithmetic in HW. It is common to use different model checkers with incompatible specification languages for such tasks.

In Coq you can specify and verify whatever you want - all in one tool. Say you can verify number theory for cryptography, an implementation of cryptographic C code, a compiler which compiles the code to assembly and a CPU which runs the code. All this is actually done in Coq. With a single model checker you can at best do a tiny piece of this chain.

Karl Palmskog said:

in my view, model checking is not necessarily about temporal logic, lots of different automata checking algorithms fit as well. Several model checking algorithms have been verified in proof assistants, for example: https://www.isa-afp.org/entries/CAVA_LTL_Modelchecker.html

The example you cite has "LTL" in its name, so this is all about temporal logic. I don't know of a single model checking framework that does not use a modal logic akin to temporal logic.

Spin *supports* writing temporal logic formulas, but under the hood it's all Buchi automata: https://spinroot.com/spin/what.html

in fact, Buchi automata are strictly more expressive than LTL

people find LTL nice to write, so they do this:

the automaton of the negation of the LTL property is first constructed, then the verification process is reduced to the emptiness problem of the product automaton

But nothing prevents you from writing a "spec automaton" instead of LTL, and in fact this might make verification faster

are SMT solvers or Dafny really included under “model checking”? Those are automated theorem provers but they still do proofs… the core idea in model checking is closer to QuickCheck/SmallCheck — make your theory have finite models, so that you can generate many or all of them and check your property on those.

if you can check all models the result is as strong as a proof, otherwise it’s only very systematic testing.

SMT can obviously be used to implement model checking, but to call "unqualified SMT" a model checker is pretty weird

Dafny might be described as a "program verification toolchain", you write some program and program logic stuff, and it gets reduced to verification conditions possibly discharged by SMT. To me, Dafny falls under the deductive/logic umbrella rather than the model checking one (which typically aims at complete push-button verification from spec + model)

@Paolo Giarrusso @Karl Palmskog : I would say if Dafny is "model checking" or "program verification" depends on what you actually do with it - which depends a lot on what you actually **can** do with it. If the things you want to prove are non trivial, Dafny gets quickly infeasible. I remember a paper on verification of the Shorr-Waite algorithm in Dafny where I still don't get how people come up with the invariants they write down there. In Coq it looked much simpler.

So I would think that most people are verifying one liner asserts with Dafny rather than elaborate specifications, and this is - at least in the industry - typically called model checking (don't ask me why - there is typically hardly a "model").

Btw.: I tend to interpret the existence of Lean such that program verification with Dafny is a failed experiment.

I wrote one paper with a Dafny artifact, and to me there is no built-in scaling problem in the verification approach itself (you can add axioms and hints and other stuff to get through nearly every problem). The two main problems I saw are:

- lack of domain-specific libraries, like in the Coq Platform
- first-order logic with some cherry on top is still quite weak and awkward

the main highlight of Dafny is arguably the interface and feedback loop with programming + SMT + squigglies. It can likely be emulated with Coq + Hammer + VsCode, but lots of work to be as productive as a skilled Dafny programmer can be.

Well I tried Dafny a while back and my conclusion was that it is much harder to understand why certain goals can't be handled by the SMT solver than doing proofs semi-automated in Coq. Typically this were issues that I did something wrong with invariants, but it is not helpful if the proof goals or error messages in case one does something wrong are not understandable.

So my take on it is that it is nice as far as it goes - but it doesn't go very far. Say like a convertible in summer, but with a 10l gas tank you can't refill.

the applications they have done at MSR using Dafny are pretty significant:

@Michael Soegtrop I’ll happily record that industry people extend “model checking” to “automated theorem proving” if they don’t care for the difference — even if I don’t know how a model checker would support hints/axioms? — but at least some academics will expect one to care about the distinction.

and regardless of the words, the distinction between the two kinds of tools seem important at least sometimes

on Dafny-vs-Coq, how much of that is just the general “automated-vs-interactive proving” tradeoff?

it's more of a "specialized-tool-vs.-general-tool" tradeoff I think

right, your complaints probably fit there — I was thinking @Michael Soegtrop ‘s complaints.

Honestly, I haven’t seen good error messages even from `lia`

— its saving grace is that it has a relatively clear specification.

conversely, I’ve seen at least _one_ talk about tactics for F* — and the argument was “it’s easier/more maintainable to use tactics than to write hints by reverse-engineering the failure”.

@Paolo Giarrusso : In industry what counts is what you can safely do in a given time frame. And when you rely on SMT solvers you have to limit complexity quite a bit to be sure you won't run against walls. What remains then is what is typically called "model checking" in industry.

For sure one can do substantially more fancy things with SMT based tools, but I have doubts that such projects are manageable.

So I would say in industry there is little room between "rather simple applications of SMT" and "interactive theorem proving".

Karl Palmskog said:

it's more of a "specialized-tool-vs.-general-tool" tradeoff I think

I would think the specialization of tools is required to keep SMT applications manageable in larger projects.

probably few will agree, but my view of model checking from an ITP standpoint is that it's a class of algorithms for soundly and completely ("push button") deciding certain properties of state transition systems, and providing counterexamples when properties don't hold.

As such, model checking algorithms *could* be implemented and run inside Coq efficiently, with trust reduced to Coq's kernel. If one can reduce a general verification problem to a model checking problem, that's generally a good idea, due to all researcher time invested into model checking tooling/theory.

(I actually totally agree with the fact it would be a worthwhile endeavour to implement model checking algorithm and prove them in Coq. Why not the best of both worlds?)

the main issue I see right now in implementing state-of-the-art algorithms is that they do optimizations down to ISA level to run efficiently, and without a mature CertiCoq-based `native_compute`

this would be hard to mirror inside Coq. A second problem is that the specs of many model checking algorithms and in particular their optimizations are not formalized at ITP level.

The question is of course if one needs all the latest optimization bells and whistles. I would think that things can be better factorized in Coq.

if you specify the model algorithm in a good way, you can probably derive specialized versions that rely on properties you have proved at the Coq level, sure

but if you *can* make a general implementation inside Coq with bells and whistles, this is probably very useful, since not everyone wants to factor and fiddle in Coq if the model checking function running time is reasonable

@Karl Palmskog : I would say my experience on how people use the work "model checking" matches pretty much your description. As far as I can tell the key point is "checking properties with reasonably reliable push button machinery". But I would think that these days this also extends to simple arithmetic goal, not only state machine systems - essentially everything what todays system can prove "push button".

well, the difference in the arithmetic problems is that you usually don't have completeness...

Regarding optimization: sure - I just think that something simpler would also be very useful already. In practice people also do analysis over "up to X transition" where X is a large number, and that is where you really need performance. IMHO in Coq one would do such things with modular inductive proofs and use automatic checkers mostly for getting the cases sorted out during induction or for sequences of blocks.

but we are arguably in a terrible state with automation of inductive proofs in Coq. Dafny actually has a some interesting stuff in that direction.

for example, CoqHammer explicitly avoids all induction, we have no standard techniques like rippling for current Coq, and machine learning of inductive tactics is not looking great either

Hmm I somehow like it in Coq - I just tend to proof special induction schemes to make things simpler sometimes. Can you elaborate on what Dafny does better here? My experience with Dafny (2 years back) is more that it tries to be smart with the effect that I don't understand it any more.

for example, Rustan did stuff like this and improved on it a couple of times: https://link.springer.com/chapter/10.1007/978-3-642-27940-9_21 ("Automating Induction with an SMT Solver")

we didn't ask about it in the survey, but I think very few Coq users are skilled/motivated enough to write their own sophisticated induction schemes

I guess I meanwhile just know how to do induction - what to generalize, when to use special induction schemes or induction scheme like lemmas, ... so that I have little issues with the kind of things mentioned in the paper. But I got the impression that such tricks obscure things in case things get really complicated. I guess this is all very subjective.

But I honestly find it astonishing how few lemmas the standard library e.g. has for lists (or even useful function for lists for which one could state such lemmas). I always wonder if I should add my stuff, or if it is too specific.

I think Hahn (https://github.com/vafeiadis/hahn) has one of the most comprehensive list lemma collections, but always room for more in Stdlib and in Platform libraries.

@Karl Palmskog : thanks for the pointer! Not really what I am doing, but still quite nice.

But I think that the standard library should have much more of this. Either it provides lists with pretty much everything one needs, or not at all. As is the lists in the standard library mostly seem to have what is needed elsewhere in he standard library, but not what users might need.

the history of Coq use is to a regrettable degree a history of people in different locations at different times writing list lemma collections

I guess I am one of them ... although interestingly there is rather little overlap between Hahn and what I have.

@Maxime Dénès : what became out of the standard library 2 project? Do you think it makes sense to round up lists in standard library 1?

IINM, the goal of stdlib2 was to be some minimalistic stdlib enabling to compile MathComp. So the list lemmas would be the one already there in MathComp (which I personally prefer to the stdlib, if only because it's much more homogeneous).

I wouldn't invest time in those things in stdlib1 but that may be a very personal opinion.

Yes, I should switch my work to MathComp . I guess it is going up-hill a few 100m but then downhill for a very long time ...

We had some discussion about the stdlib in the last hackathon. I did keep a lot of notes for that. I wanted to wait a little longer for the results of the coq survey however before taking any action. I think one of the first actions we might want to take is to make a stand-alone lists/nat library that everybody can use, but that has not been decided at all yet.

Michael Soegtrop said:

Yes, I should switch my work to MathComp .

Well, I understand you are using Flocq and Interval quite a bit so you couldn't completely get rid of the stdlib reals (although mathcomp analysis contains results that make both worlds reasonnably interoperable). If you are looking for a starting point, I highly recommend reading the mathcomp book: https://math-comp.github.io/mcb/

Michael Soegtrop said:

I guess it is going up-hill a few 100m but then downhill for a very long time ...

Indeed

Yes the reals are a bit of a problem and one of the reasons why I didn't switch as yet, but as you said this should be solvable. I anyway have interval and gappa packed into larger tactics, which also could handle the mathcomp reals equivalence.

Another psychological issue is that I can't decide if I should use just the library without the tactic language or both. I am mostly using Ltac2 these days as default tactic language. I guess I should be pragmatic about this and switch the tactic language every other proof, depending on what is more convenient.

I just realized that this is getting increasingly off topic - sorry.

Probably, I find ssreflect nice to write proofs by hand but Ltac2 is probably better to write automatic tactics.

I guess so. But I find it rather inconvenient to switch the tactic language between writing automation and manual proofs - mostly because this is a continuum. And I am not so sure if the benefits of the ssreflect language compensate this. Well I will never find out without trying.

Is it not possible to use ssreflect stuff and not use the proof langauge?

Yes, sure. The big question is if one wants to do this.

@Ali Caglayan it’s certainly possible by writing manual proof terms, but one cannot in general replace ssreflect rewrite or ssreflect apply with the non-ssreflect tactics — they use different unification algorithms.

and tactic unification does not seem too happy with canonical structures sometimes? hard to tell.

For most things, what language your wrote the proof in shouldn't matter. So I am not asking if you can salvage the proof scripts for ssreflect, but rather can you use the terms from ssreflect. How you write your proofs should be up to the user no?

And my answer is: some lemma statements work better with ssreflect tactics. Or more precisely, with evarconv unification.

(and viceversa, with stdpp I've seen cases where I had to use non-ssreflect tactics for certain lemmas)

re “should be up to the user?”, of course, but I’m talking about Coq as it exists today.

Last updated: Apr 14 2024 at 10:39 UTC