Automatically generated theorems · Machine learning and automation

Stream: Machine learning and automation

Topic: Automatically generated theorems

Patrick Nicodemus (Feb 05 2023 at 19:54):

Has anyone written a program to generate a stream of garbage Coq theorems as synthetic training data for machine learning, as well as Ltac scripts which prove them? It seems nontrivial to me because it's not obvious how to generate appropriate Ltac scripts to prove an arbitrary Gallina term.

Jason Rute (Feb 14 2023 at 14:15):

@Patrick Nicodemus My guess would be it is very hard to generate meaningful Coq theorems in a random way. The issue is that you are quickly going to start doing theorem proving. For example, if you want to apply a theorem foo which takes inputs n m : nat and h : m < n, then you have to choose n, m and h. Based on your selections for n and m, it may not be true that there is such an h since it doesn't hold that m < n, or maybe it is just difficult to prove m < n. Also, you likely need a theorem to begin with. You can't generate a lot of interesting proofs without theorems. For example, the proof eq_refl bool true is a proof for true = true. But it is also a proof for true = or true true. If you only generate the proof without the theorem, you are just going to get the first one I think. See discussion on exactly this topic (generating random Lean proofs) in the Lean Zulip: https://leanprover.zulipchat.com/#narrow/stream/113489-new-members/topic/Inferring.20Theorem.20from.20a.20Proof.20Term.3F

Jason Rute (Feb 14 2023 at 14:15):

What people have had success with is training a machine learning model to generate synthetic theorems and proofs. Here are some papers:

(Metamath) Learning to Prove Theorems by Learning to Generate Theorems
(FOL) Learning to prove from synthetic theorems
(Lean) Synthetic Proof Term Data Augmentation for Theorem Proving with Language Models:
- paper
- slides

The last paper is most relevant to Coq I think. Of course the challenge is getting the data to train such a model.

Jason Rute (Feb 14 2023 at 14:15):

There also is a lot of work on just augmenting a dataset with synthetic theorems of a particular form, like inequality problems which are easy to generate. For this, you don't need a trained model. Just a generator which starts from axioms and generates theorems. These three papers do this with inequality theorems:

Last updated: Apr 20 2024 at 00:02 UTC