Stream: Miscellaneous

Topic: Proof ground


view this post on Zulip Bas Spitters (Jun 29 2020 at 20:15):

The Isabelle community is organizing a competition. Any experiences from last year? Is it fair? Why did Isabelle win?
https://www21.in.tum.de/~wimmers/proofground/

view this post on Zulip Cody Roux (Jun 30 2020 at 14:47):

In my experience, it was reasonably fair, though it would be nice to see some challenge problems offered by e.g. the Coq community. If I recall correctly, the big edge was formalization experience, and standard library maturity, albeit for relatively unsophisticated mathematics, with a single exception: a certain theorem about arbitrary sets and functions which was significantly easier to prove classically, and so advantaged provers based on classical logic. Even adding EM as an axiom only marginally helps, since then one needs to prove lemmas based on it, losing precious time.

Time was the obvious bottleneck, and Hammer style proof dispatch was pretty intimidating from the Isabelle side. I do wish there had been more ssreflect hackers, since I wonder if they have tricks to make proofs faster.

view this post on Zulip Karl Palmskog (Jul 01 2020 at 10:13):

seems obvious that a competition organized by the Isabelle community will be won by that same community. Do they have any rules about challenge problems? One could easily construct problems that are orders of magnitude more easily solved in CIC/cubical, but how would those problems be considered by HOL users (probably as unfair)?

view this post on Zulip Karl Palmskog (Jul 01 2020 at 10:19):

was (the whole of) MathComp even available to Coq users? (Since Lean users apparently had the whole of Mathlib)

view this post on Zulip maximilian p.l. haslbeck (Jul 02 2020 at 16:45):

Hi there, I'm one of the organizers of Proof ground. We had a Call for Problems which we posted on the Coq mailinglist. Also we had two submissions of problems in Coq that made it into the competition. We tried hard to make it fair between the provers and people from the coq and Lean community helped us translating the tasks.

view this post on Zulip maximilian p.l. haslbeck (Jul 02 2020 at 16:45):

We are always happy for ideas for new tasks, and there will be a edition with next year's ITP hopefully with an onsite contest in Rome.

view this post on Zulip maximilian p.l. haslbeck (Jul 02 2020 at 16:47):

When finding fair tasks it is always the problem that we have to take the intersection of the material that is conveniently available in all (three) provers. That makes it even harder to come up with interesting, fair and solvable within under 2 hours.

view this post on Zulip maximilian p.l. haslbeck (Jul 02 2020 at 16:49):

Karl Palmskog said:

was (the whole of) MathComp even available to Coq users? (Since Lean users apparently had the whole of Mathlib)

what libraries would you like to have available in such a competition? We definitely have to add more there to be fair against Lean's mathlib and Isabelle's material in the distribution.

view this post on Zulip Karl Palmskog (Jul 02 2020 at 16:50):

@maximilian p.l. haslbeck the "standard sets of libraries" for Coq is huge, probably running about a million lines of code.

view this post on Zulip Karl Palmskog (Jul 02 2020 at 16:51):

we are trying to define the "Coq Platform" which includes nearly all most-used libraries/projects, but this will not be generally available for a while

view this post on Zulip maximilian p.l. haslbeck (Jul 02 2020 at 16:51):

so, include everything?

view this post on Zulip maximilian p.l. haslbeck (Jul 02 2020 at 16:52):

ah, yeah, at least, people that want to participate could tell us which libraries they want, then we can add them

view this post on Zulip Karl Palmskog (Jul 02 2020 at 16:52):

right, but this is going to be a big problem as Coq evolves

view this post on Zulip Karl Palmskog (Jul 02 2020 at 16:53):

so if you want Proof Ground-like events to be recurring, probably the only viable solution for Coq is to tie it to the Coq Platform (which will have regular releases after every major Coq version)

view this post on Zulip Karl Palmskog (Jul 02 2020 at 16:55):

@maximilian p.l. haslbeck but generally, how can one have an even playing field for such different logics? As I wrote earlier in the thread, depending on the problem, either Isabelle/HOL+Sledgehammer may have an incredible advantage or a definition may be impossible to express at all in HOL

view this post on Zulip maximilian p.l. haslbeck (Jul 02 2020 at 16:57):

yeah, of course Isabelle does not have dependent types. So lots of interesting problems can't be posed in all provers, that's for sure. I quess we have to use the common denominator.

view this post on Zulip maximilian p.l. haslbeck (Jul 02 2020 at 16:57):

I think I know too little about Coq Platform. but if the community agrees we can make it standard for Proof Ground.

view this post on Zulip Karl Palmskog (Jul 02 2020 at 16:58):

but if we go lowest common denominator for problems, CIC-based systems (Lean, Coq) would be disadvantaged compared to Isabelle automation, so what would be the incentive to compete directly?

view this post on Zulip Karl Palmskog (Jul 02 2020 at 17:00):

To me, the only obvious way to attract CIC-based users would be with different "weight class" which includes problems where dependent types actually pay off

view this post on Zulip maximilian p.l. haslbeck (Jul 02 2020 at 17:00):

One goal of proof ground is to find out about the strengths and weaknesses of certain provers (Isabelle has no dependent types, :) what a learning), and also learn from each others.

view this post on Zulip Karl Palmskog (Jul 02 2020 at 17:00):

see here for more about Coq platform: https://github.com/MSoegtropIMC/coq-platform/blob/master/charter.md

view this post on Zulip Théo Zimmermann (Jul 02 2020 at 17:01):

Important clarification about the platform is: it doesn't exist yet. But it likely will in time for the next Proof Ground.

view this post on Zulip maximilian p.l. haslbeck (Jul 02 2020 at 17:01):

I like the idea of weight classes, but proof ground rather tries to bring people together.

view this post on Zulip maximilian p.l. haslbeck (Jul 02 2020 at 17:02):

okay, I will come back to Coq platform then in a while. We will definitely announce in advance information about the next edition.

view this post on Zulip maximilian p.l. haslbeck (Jul 02 2020 at 17:03):

But in the mean time, our platform is free to use by any community. If you want to host a contest (involving problems tailored to CIC) feel free to. We will be happy to host it.

view this post on Zulip Karl Palmskog (Jul 02 2020 at 17:03):

I think the strengths and weaknesses are quite well known in general. For example, let's say you have some problems with DFAs/NFAs. It is well known that this will force you to carry around a finiteness condition in Isabelle, and that it can be done easily with finite types in Coq without much extra work.

view this post on Zulip maximilian p.l. haslbeck (Jul 02 2020 at 17:04):

We can very much add a different domain than http://competition.isabelle.systems/ , if you prefer :)

view this post on Zulip Simon Hudon (Jul 02 2020 at 17:04):

It might also be interesting to have larger sizes of formalization. A scale where HOL and CIC will have significant differences in the definitions. You could reward people who solve it in two different provers too

view this post on Zulip maximilian p.l. haslbeck (Jul 02 2020 at 17:05):

I also find it interesting to think about how to measure the "value" of a proof, so not only that it goes through, but also its beauty, maybe in length, or whatever?

view this post on Zulip maximilian p.l. haslbeck (Jul 02 2020 at 17:08):

In the Isabelle zulip we started a stream about Proof Ground. I invite you to join the discussion there. It would be nice to share streams across zulip servers, is that possible?

view this post on Zulip Karl Palmskog (Jul 02 2020 at 17:09):

I don't disagree with the competition concept in general, but having a competition means that winning is large part of the motivation. If problems are not suited to your logic, it will become a point of contention. As an analogy, intuitionistic and classical logic have completely different benchmark sets (ILTP vs. TPTP), because experts know you can't do well (win the competition) in both.

view this post on Zulip Théo Zimmermann (Jul 02 2020 at 17:10):

maximilian p.l. haslbeck said:

In the Isabelle zulip we started a stream about Proof Ground. I invite you to join the discussion there. It would be nice to share streams across zulip servers, is that possible?

There might be a way to create a bridge where a bot would cross-post messages to the other Zulip.

view this post on Zulip Karl Palmskog (Jul 02 2020 at 17:16):

another analogy: some of my earlier colleagues regularly participated in programming competitions. In these competitions, one is not only judged for correctness (on a hidden test dataset) but also execution speed. Since compilation time doesn't count, everybody uses C++ and precomputes as much as possible with template metaprogramming. Anyone who doesn't use C++ will always lose, so nobody uses Java, Python, etc.

view this post on Zulip Paolo Giarrusso (Jul 02 2020 at 17:21):

If it’s true that an Isabelle-organized competition is necessarily biased, even if unintentionally, some combo of <joining the organization, or organizing another one> might be a good fix. @maximilian p.l. haslbeck is even offering the platform they built for this, so kudos I guess :-)

view this post on Zulip Paolo Giarrusso (Jul 02 2020 at 17:29):

in practice, I guess you want some form of peer-review for the competition criteria, probably by co-organizing (or maybe in an actual paper, but I am not sure that’s realistic)

view this post on Zulip Cody Roux (Jul 02 2020 at 21:08):

Karl Palmskog said:

maximilian p.l. haslbeck but generally, how can one have an even playing field for such different logics? As I wrote earlier in the thread, depending on the problem, either Isabelle/HOL+Sledgehammer may have an incredible advantage or a definition may be impossible to express at all in HOL

Certainly if HOL is crushing Coq on these problems, it should be considered as an open challenge for Coq tooling to improve. Which it has, of course, dramatically, but it's good to have the side-by-side. I don't think the expressiveness gap is that huge.

view this post on Zulip Karl Palmskog (Jul 02 2020 at 21:28):

Cody Roux said:

Certainly if HOL is crushing Coq on these problems, it should be considered as an open challenge for Coq tooling to improve. Which it has, of course, dramatically, but it's good to have the side-by-side. I don't think the expressiveness gap is that huge.

Are you familiar with how Sledgehammer/Metis works and the HOL foundations? Functions in HOL don't have to be computable or fully defined, and classical logic and classical choice is built into Isabelle automation. CoqHammer, in contrast, is designed around intuitionistic logic from the start. Would you ask a classical and intuitionistic mathematician to compete to prove the same theorems in real analysis?

view this post on Zulip Paolo Giarrusso (Jul 02 2020 at 21:42):

with a single exception: a certain theorem about arbitrary sets and functions which was significantly easier to prove classically, and so advantaged provers based on classical logic. Even adding EM as an axiom only marginally helps, since then one needs to prove lemmas based on it, losing precious time.

view this post on Zulip Paolo Giarrusso (Jul 02 2020 at 21:45):

It's entirely fair to have a separate category for intuitionistic theorems; those are stronger. I'm not sure whether Coq _needs_ to be worse for classical proofs...

view this post on Zulip Karl Palmskog (Jul 02 2020 at 21:47):

the eval for the new sauto tactic for CoqHammer suggests it was around twice as good on intuitionistic problems as classical ones

view this post on Zulip Simon Hudon (Jul 02 2020 at 21:48):

Shameless plug ω+1: in Lean, the in mathlib library, the axiom of choice and excluded middle is ubiquitous. It does make for a very different experience to using Coq. It's just a matter of having the relevant theorems and the automation being able to use them

view this post on Zulip Paolo Giarrusso (Jul 02 2020 at 21:49):

Is that a matter of focus/design choices of sauto, or does it have to do with Coq itself? I expect both.

view this post on Zulip Paolo Giarrusso (Jul 02 2020 at 21:51):

Hm, how did Lean do the first time? I don't see results under https://www21.in.tum.de/~wimmers/proofground2019/

view this post on Zulip Simon Hudon (Jul 02 2020 at 21:51):

Floris and I came in second

view this post on Zulip Simon Hudon (Jul 02 2020 at 21:52):

But still, Manuel and Peter left us in the dust

view this post on Zulip Simon Hudon (Jul 02 2020 at 21:54):

There's a section Results where we're listed:

Results
Proof Ground 2019 is over! The winning teams are:

  1. Manuel Eberl and Peter Lammich (Isabelle/HOL)
  2. Floris van Doorn and Simon Hudon (Lean)
  3. Fabian Kunze (Coq)

view this post on Zulip Paolo Giarrusso (Jul 02 2020 at 21:54):

doh, I'm blind

view this post on Zulip Paolo Giarrusso (Jul 02 2020 at 21:54):

Isabelle and Coq are extremely different, but there are fewer technical reasons why Coq couldn't compete with Lean on classical logic. It's also fine if Coq doesn't care for it.

view this post on Zulip Paolo Giarrusso (Jul 02 2020 at 21:56):

(OTOH, I thought suggesting that Lean could be better than Coq at classical logic was very unwelcome).

view this post on Zulip Simon Hudon (Jul 02 2020 at 21:57):

I think as interesting as the ranking is, it feels like it's more about people's mastery of their tool than how good the tool is. Floris and I are mathlib maintainers, Peter and Manuel are big Isabelle contributors as well and I feel like a lot of the people there might have suffered from the handicap of "not using the tool every waking moment of every day"

view this post on Zulip Simon Hudon (Jul 02 2020 at 21:59):

Paolo G. Giarrusso said:

(OTOH, I thought suggesting that Lean could be better than Coq at classical logic was very unwelcome).

Do you feel like that's what I was saying? My point was that the Lean culture and the Coq culture are very different. People in Coq are very reluctant to rely on axioms whereas we barely notice in Lean when the axiom of choice or excluded middle are involved in proof

view this post on Zulip Paolo Giarrusso (Jul 02 2020 at 22:00):

No no, sorry, that wasn't about you!

view this post on Zulip Simon Hudon (Jul 02 2020 at 22:02):

Oh, sorry. I'd like to know what you meant

view this post on Zulip Paolo Giarrusso (Jul 02 2020 at 22:19):

[I clarified my subtweet separately]

On "mastery of the tool", the full ranking seems indeed suggestive — there are Isabelle users both at the top and at the bottom, so it's not just the tool indeed
https://www21.in.tum.de/~wimmers/proofground/slides/results.html#/final-leaderboard

view this post on Zulip Cody Roux (Jul 04 2020 at 03:20):

Paolo G. Giarrusso said:

It's entirely fair to have a separate category for intuitionistic theorems; those are stronger. I'm not sure whether Coq _needs_ to be worse for classical proofs...

Not to belabor the point, or extend this discussion beyond its natural ending, but I feel that if you want to be classical, you need to embrace it like Isabelle or Lean, in the kernel, but mostly the library and the "hammer" tactics. Otherwise you're starting with a significant handicap. E.g. TAUT is Co-NP complete, whereas intuitionistic propositional logic is PSPACE complete. Also in practice, we're just better at ATP in classical settings.

One caveat would be that often, things are decidable, so you can act as if you had the excluded middle all along. ssreflect seems to adopt this philosophy, and that might be enough.

Honestly I hope some SSR hackers are going to participate in the next proof grounds.

view this post on Zulip Gaëtan Gilbert (Jul 04 2020 at 07:24):

What does Lean have for classical stuff in the kernel?


Last updated: Aug 14 2022 at 12:03 UTC