Stream: Miscellaneous

Topic: Automatic formalization


view this post on Zulip Bas Spitters (Nov 24 2022 at 10:13):

Interesting paper on automatic (Lean) formalization of mathematics based on natural language processing.
I imagine this would be slightly harder for Coq, as the libraries (math-comp, corn, ...) have not been optimized to be human-readable.
https://mathai2022.github.io/papers/20.pdf

view this post on Zulip Jason Rute (Nov 25 2022 at 10:28):

For those unfamiliar with this new trend, let me quickly summarize some of the recent work. Last May, Google Research, along with researchers at Cambridge working on AI for Isabelle, figured out that one can use a large language model to translate theorem statements from informal to formal math.

view this post on Zulip Jason Rute (Nov 25 2022 at 10:28):

Unlike a lot of machine learning work, this doesn't require training your own models. All one needs is (1) API access to OpenAI's Codex model (specifically the version code-davinci-002), which is what also powers GitHub Copilot, and (2) about four or so examples of the sort of translation you are trying to do. One then feeds those examples to Codex along with the theorem statement you are trying to translate. The authors also showed that while the translation is perfectly correct only about 25% of the time, it is good enough to serve as a source of extra training data for neural theorem-proving models trained with reinforcement learning.
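To make the setup concrete, here is a minimal sketch of the few-shot prompting it describes. The example translation pairs and the prompt wording are illustrative (not the authors' actual prompt); the assembled prompt would then be sent to a completion model such as code-davinci-002.

```python
# Sketch of few-shot prompting for statement autoformalization.
# The example pairs below are illustrative, not the authors' actual prompt.

EXAMPLES = [
    ("If a and b are odd integers, then a + b is even.",
     "theorem odd_add_odd (a b : ℤ) (ha : odd a) (hb : odd b) : even (a + b)"),
    ("The square of a real number is nonnegative.",
     "theorem sq_nonneg (x : ℝ) : 0 ≤ x ^ 2"),
]

def build_prompt(statement: str) -> str:
    """Concatenate a task instruction, the example pairs, and the
    statement to translate; the model is asked to continue the pattern."""
    parts = ["Translate the natural language statement into a formal Lean statement."]
    for informal, formal in EXAMPLES:
        parts.append(f"Natural language: {informal}\nLean: {formal}")
    parts.append(f"Natural language: {statement}\nLean:")
    return "\n\n".join(parts)

prompt = build_prompt("Every natural number is less than its successor.")
print(prompt)

# The prompt is then submitted to the model (API key required), roughly:
#   openai.Completion.create(model="code-davinci-002", prompt=prompt, ...)
```

The model's completion after the final "Lean:" is taken as the candidate formal statement; swapping the examples is all it takes to retarget another prover or library.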

view this post on Zulip Jason Rute (Nov 25 2022 at 10:29):

While that work was in Isabelle, it is quite clear that it could extend to any other ITP with at least a little bit of code on GitHub. A few Lean folks at CMU quickly picked up this work, replicated it for Lean (which is easy), and turned it into a chatbot, where not only can you ask it to translate a theorem statement, but you can also have a dialog with it on how to fix the translation. They turned this into a VS Code plugin called LeanChat that anyone can use and play with.

Autoformalization with LLMs in Lean... for everyone! The chat interface for autoformalizing theorem statements in Lean built by myself and @ewayers is now publicly available as a vs-code extension. https://marketplace.visualstudio.com/items?itemName=hoskinson-ml.lean-chat-vscode https://twitter.com/zhangir_azerbay/status/1554511702091350018/photo/1

- Zhangir Azerbayev (@zhangir_azerbay)

Autoformalization with LLMs in Lean! @zhangir_azerbay and Edward Ayers built a chat interface to formalize natural language mathematics in Lean: http://github.com/zhangir-azerbayev/lean-chat Very impressive work! https://twitter.com/Yuhu_ai_/status/1537514011427672064/photo/1

- Yuhuai (Tony) Wu (@Yuhu_ai_)

view this post on Zulip Jason Rute (Nov 25 2022 at 10:29):

While the plugin is specific to Lean, if you have API access to Codex, it would be easy to switch to Coq (or even to a particular Coq library like math-comp). The model's output is heavily shaped by the examples and instructions you give it, called a "prompt", and those examples and instructions can be found in prompting.lean.

view this post on Zulip Jason Rute (Nov 25 2022 at 10:29):

After this work, the next natural question was "what about proofs?" I know of at least three very recent papers which address this:

view this post on Zulip Jason Rute (Nov 25 2022 at 10:29):

All of them are interesting and exciting work on using Codex (again, out of the box) to translate into Isabelle and Lean. I find the Draft, Sketch, and Prove paper particularly exciting. It gives a method not only for autoformalization, but for full theorem proving. The idea is as follows.

  1. Use a math-centric large language model to give a natural language proof of a theorem. They experiment with Minerva (a private model from Google Research) and Codex.
  2. Use Codex to translate that natural language proof to a proof sketch in Isabelle. Here a proof sketch is a proof where some of the low level details are not filled in. It also includes lots of comments.
  3. Use Isabelle's Sledgehammer to fill in the details of the proof sketch.
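The three steps above can be sketched as a simple pipeline. Everything below is a hypothetical skeleton: the stage functions are stubs standing in for calls to a language model (Minerva or Codex) and to Isabelle's Sledgehammer; only the control flow reflects the paper's method.

```python
# Skeleton of the Draft, Sketch, and Prove pipeline. The three stage
# functions are stubs; in the real system they call an LLM and Sledgehammer.
from typing import Optional

def draft(statement: str) -> str:
    """Step 1 (stub): ask an LLM for an informal natural language proof."""
    return f"Informal proof of: {statement}"

def sketch(statement: str, informal_proof: str) -> str:
    """Step 2 (stub): ask Codex to turn the informal proof into a formal
    proof sketch, with low-level gaps left open (marked 'sorry' here) and
    the informal reasoning kept as comments."""
    return f"proof - (* {informal_proof} *) sorry qed"

def fill_gaps(formal_sketch: str) -> Optional[str]:
    """Step 3 (stub): close each open gap with Sledgehammer; in the real
    system this can fail, returning None."""
    return formal_sketch.replace("sorry", "by auto")

def draft_sketch_prove(statement: str) -> Optional[str]:
    informal = draft(statement)
    formal = sketch(statement, informal)
    return fill_gaps(formal)

result = draft_sketch_prove("The sum of two even numbers is even.")
print(result)
```

The key design point is that the language model only has to produce a high-level sketch; the symbolic tool (Sledgehammer) handles the low-level details it is good at.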

view this post on Zulip Jason Rute (Nov 25 2022 at 10:29):

While I think their success rate of 43% on the mini-F2F benchmark might be helped by the fact that these problems are often found on the internet (in natural language form), I also think this is an interesting and compelling line of research.

view this post on Zulip Jason Rute (Nov 25 2022 at 10:29):

Returning to Coq, I don't see a particular reason any of this can't be done in Coq. I do think the diversity of Coq libraries would present a challenge, but at the same time I think this could be overcome with good prompting. Also, as I said, since this doesn't require actually training models, all one needs is a Codex API key, some knowledge of "prompt-engineering", and a desire to play with this.

view this post on Zulip Jason Rute (Nov 25 2022 at 10:31):

(While there is no perfect spot on the internet to discuss this type of research, a lot of discussion goes on in the "Machine Learning for Theorem Proving" stream of the Lean Zulip.)

view this post on Zulip Karl Palmskog (Nov 25 2022 at 10:40):

I think most practitioners would agree that the formalization of structures and theorem statements is what is really interesting and useful. But since the model here will not be able to see the difference between notations and other definitions in different Coq libraries, it most likely can't target MathComp specifically (or other libraries like Stdpp).

view this post on Zulip Karl Palmskog (Nov 25 2022 at 10:41):

(which means that code is not likely to be accepted by Coq out of the box)

view this post on Zulip Karl Palmskog (Nov 25 2022 at 10:43):

on the other hand, last I checked, Coq has by far the most code on GitHub of any proof assistant (millions and millions of lines)

view this post on Zulip Jason Rute (Nov 25 2022 at 10:50):

I do agree this type of autoformalization is a bit different from the formalization mostly done in practice. I think it is mostly a toy so far. But it is getting constantly better, and half of that is better language models, while the other half is creative use of the current language models.

view this post on Zulip Jason Rute (Nov 25 2022 at 10:50):

But since the model here will not be able to see the difference between notations and other definitions in different Coq libraries, it most likely can't target MathComp specifically (or other libraries like Stdpp)

You could be right, but at the same time, I wouldn't discount the power of prompt engineering. If it is an established library like MathComp, you could add "translate to Coq MathComp" to the prompt and put in MathComp-specific examples. But yes, it seems that one of the main problems with these models is that they don't know the proper translation of a math term into the target language. The more esoteric the library, the more likely this is.
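As a concrete illustration of that kind of prompt engineering, here is a hypothetical sketch of a MathComp-targeted prompt: the instruction names the library and the few-shot examples use MathComp idioms, so the model is nudged to imitate that style rather than plain Coq. The example lemmas are illustrative restatements, not necessarily verbatim MathComp code.

```python
# Hypothetical MathComp-targeted prompt; the example pairs are
# illustrative, chosen to show MathComp naming and notation.

MATHCOMP_PROMPT = """Translate the statement into Coq using the MathComp library.

Natural language: For natural numbers m and n, m + n = n + m.
Coq (MathComp): Lemma addnC (m n : nat) : m + n = n + m.

Natural language: Every finite group of prime order is cyclic.
Coq (MathComp): Lemma prime_cyclic (gT : finGroupType) (G : {group gT}) :
  prime #|G| -> cyclic G."""

def mathcomp_prompt(statement: str) -> str:
    """Append the statement to translate after the MathComp examples."""
    return MATHCOMP_PROMPT + f"\n\nNatural language: {statement}\nCoq (MathComp):"

print(mathcomp_prompt("The product of two odd natural numbers is odd."))
```

The same scaffold would work for any library with a recognizable style (Stdpp, CoRN, ...), which is exactly where the choice of few-shot examples does the heavy lifting.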


Last updated: Mar 28 2024 at 14:01 UTC