Stream: Coq users

Topic: Questions on Roosterize IJCAR talk


view this post on Zulip Cyril Cohen (Jul 03 2020 at 13:38):

I watched the talk with some delay, so I could not comment or ask questions live; I'll do it here instead.

First, I am impressed by the results, and I wish we could use it in the near future, maybe as an informative CI step.

@Karl Palmskog you seem to be the only co-author with a Zulip account, would you mind answering or bringing your co-authors here?

view this post on Zulip Karl Palmskog (Jul 03 2020 at 13:40):

I can answer questions here, and I can tell the main machine learning co-author to get an account as well

view this post on Zulip Karl Palmskog (Jul 03 2020 at 13:40):

he is also speaking at the Coq workshop, so he might as well

view this post on Zulip Cyril Cohen (Jul 03 2020 at 13:44):

Karl Palmskog said:

I can answer questions here, and I can tell the main machine learning co-author to get an account as well

Sorry, I posted the message by accident; I still had one sentence to complete in the first bullet, which I have now fixed.

view this post on Zulip Karl Palmskog (Jul 03 2020 at 13:49):

Cyril Cohen said:

First, I am impressed by the results, and I wish we could use it in the near future, maybe as an informative CI step.

Thank you for the kind words.

On the one hand, yes, an overly generic suggestion may indicate that the lemma itself is very generic. On the other hand, generic suggestions are a pervasive problem in transduction/translation systems in NLP. It is hard to make the system use "background knowledge" in the right context. So it could also be an artifact of how we designed our systems that some suggestions are generic, in particular when the tool is applied to a lemma that is completely "new"/"different" compared with the training data. In our appendix, we show that performance on suggesting names for Infotheo improves a lot when we do some extra training on Infotheo lemmas (we started with only "MathComp core" training).

view this post on Zulip Cyril Cohen (Jul 03 2020 at 13:57):

Karl Palmskog said:

Cyril Cohen said:

First, I am impressed by the results, and I wish we could use it in the near future, maybe as an informative CI step.

Thank you for the kind words.

On the one hand, yes, an overly generic suggestion may indicate that the lemma itself is very generic. On the other hand, generic suggestions are a pervasive problem in transduction/translation systems in NLP. It is hard to make the system use "background knowledge" in the right context. So it could also be an artifact of how we designed our systems that some suggestions are generic, in particular when the tool is applied to a lemma that is completely "new"/"different" compared with the training data. In our appendix, we show that performance on suggesting names for Infotheo improves a lot when we do some extra training on Infotheo lemmas (we started with only "MathComp core" training).

Sorry, what I meant is that name conflicts may indicate that the guessed name is unlikely to be correct... (this is also the point of my second bullet)

view this post on Zulip Karl Palmskog (Jul 03 2020 at 14:02):

Cyril Cohen said:

Chopping heuristics are currently implemented as simple Python scripts operating on kernel term sexps, so implementing another chopping heuristic like the one you suggest should not be hard in itself (I would expect at most a working day). The time-consuming part (in both human time and machine time) is setting up the evaluation to see whether it actually improves performance for some dataset; unfortunately, machine learning is to a large extent a black box about which it is hard to predict things. I think integrating information about name collisions in general is an interesting direction to explore, and I could see us looking at this in a potential journal version of the paper. However, it could turn out to be very time-consuming (in terms of machine time) to "iterate" the chopping "online"; I would have to check the specifics with Pengyu.
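[Editor's note: the Roosterize scripts themselves are not shown in this thread. As a purely hypothetical illustration of what a "chopping" heuristic over kernel-term s-expressions might look like (here using nested Python lists to stand in for sexps, with a made-up depth cutoff):]

```python
# Hypothetical sketch of a depth-based chopping heuristic over kernel-term
# s-expressions, represented as nested Python lists. This is NOT the actual
# Roosterize implementation; names and the placeholder atom are assumptions.

def chop(sexp, max_depth):
    """Truncate a nested s-expression below max_depth, replacing
    the chopped subterms with a placeholder atom."""
    if not isinstance(sexp, list):
        return sexp  # atom: keep as-is
    if max_depth == 0:
        return "<chopped>"  # cut the whole subterm here
    return [chop(child, max_depth - 1) for child in sexp]

# A toy stand-in for a serialized Coq kernel term
term = ["Prod", "n", ["Ind", "nat"], ["App", ["Const", "le"], ["Rel", "1"]]]
print(chop(term, 2))
# → ['Prod', 'n', ['Ind', 'nat'], ['App', '<chopped>', '<chopped>']]
```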

view this post on Zulip Karl Palmskog (Jul 03 2020 at 14:09):

Cyril Cohen said:

My understanding is that confidences/probabilities are already present in the "backend" of the tool, and could be displayed for any individual suggestion or for each entry in a list of top-X suggestions. My intuition is the same as yours, i.e., a name suggestion for a "new" theorem close to an existing theorem will have high confidence, but again, the machine learning black box could obscure this. For example, the so-called copy mechanism we use takes data directly from the input rather than "querying" background knowledge.

view this post on Zulip Karl Palmskog (Jul 03 2020 at 14:25):

Cyril Cohen said:

Rejecting (or simply not displaying) name suggestions whose confidence level is below a certain threshold should be very easy to set up, and we could definitely add a command-line option to our tool to either completely suppress these low-confidence suggestions or have them come with a disclaimer. One caveat is that a good threshold is likely to be dataset-specific. So perhaps for models trained only on MathComp 1.10.0, rejecting suggestions with confidence <= 0.6 may be most practical, etc.
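[Editor's note: the filtering described here is simple enough to sketch. The function below is a hypothetical illustration, not Roosterize's actual API; the 0.6 threshold echoes the MathComp 1.10.0 example in the message above, and the suggestion format is assumed.]

```python
# Hypothetical sketch: suppress or flag name suggestions whose confidence
# falls below a dataset-specific threshold. Not Roosterize's actual API.

def filter_suggestions(suggestions, threshold=0.6, suppress=True):
    """suggestions: list of (name, confidence) pairs.
    Returns (name, note) pairs; low-confidence entries are either
    dropped (suppress=True) or kept with a disclaimer note."""
    result = []
    for name, conf in suggestions:
        if conf >= threshold:
            result.append((name, ""))
        elif not suppress:
            result.append((name, "(low confidence)"))
    return result

sugg = [("addnC", 0.91), ("lemma1", 0.32)]
print(filter_suggestions(sugg, suppress=False))
# → [('addnC', ''), ('lemma1', '(low confidence)')]
```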

view this post on Zulip Karl Palmskog (Jul 03 2020 at 14:36):

Cyril Cohen said:

It's straightforward to curate the training set in this way by excluding certain lemma names that the model will not learn much from (because they are "special" in some way, rather than "structural"). If you would be willing to curate the inclusion/exclusion of lemmas in a certain MathComp release for use in training Roosterize, this is definitely something we could collaborate on. However, it's always hard to say beforehand how much of an effect this kind of curation will have (there are so many probabilities/vector operations involved). At best, it could bring a noteworthy BLEU increase; at worst, it could make no difference at all.

view this post on Zulip Cyril Cohen (Jul 03 2020 at 14:46):

Karl Palmskog said:

Cyril Cohen said:

Rejecting (or simply not displaying) name suggestions whose confidence level is below a certain threshold should be very easy to set up, and we could definitely add a command-line option to our tool to either completely suppress these low-confidence suggestions or have them come with a disclaimer. One caveat is that a good threshold is likely to be dataset-specific. So perhaps for models trained only on MathComp 1.10.0, rejecting suggestions with confidence <= 0.6 may be most practical, etc.

Actually, I was thinking of it for CI purposes, to evaluate PRs, and thus rejecting the names chosen by the user according to their distance to high-confidence automatically generated names.
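[Editor's note: such a CI check could be sketched roughly as below. Everything here is a hypothetical illustration of the idea in the message above; the edit-distance metric, the confidence cutoff, and the distance threshold are all assumptions, not anything the authors proposed concretely.]

```python
# Hypothetical sketch of a CI check: flag a user-chosen lemma name when it
# is far (in edit distance) from every high-confidence suggestion.
# Thresholds and data format are assumptions for illustration only.

def levenshtein(a, b):
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def flag_name(user_name, suggestions, conf_min=0.6, max_dist=3):
    """suggestions: list of (name, confidence). True = flag for review."""
    close = [s for s, c in suggestions
             if c >= conf_min and levenshtein(user_name, s) <= max_dist]
    return len(close) == 0

print(flag_name("addnC", [("addnC", 0.9)]))     # → False (matches a suggestion)
print(flag_name("my_lemma", [("addnC", 0.9)]))  # → True (flagged)
```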

Actually, I was also thinking of annotations in the code, something like:

#[roosterize(reference)]
Lemma naming_that_serves_as_a_reference : ...

#[roosterize(disruptive)]
Lemma custom_naming_that_does_not_follow_conventions_voluntarily : ...
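[Editor's note: a training script could pick up such annotations with a simple pass over the source. The sketch below is hypothetical; the attribute syntax simply mirrors the example above, and the parsing approach is an assumption, not an existing Roosterize feature.]

```python
import re

# Hypothetical sketch: extract #[roosterize(...)] attributes from Coq source
# so a training script could include/exclude the lemma that follows each one.
# The attribute syntax mirrors the example above; this parser is an assumption.

ATTR = re.compile(r"#\[roosterize\((\w+)\)\]\s*\n\s*Lemma\s+(\w+)")

src = """\
#[roosterize(reference)]
Lemma naming_that_serves_as_a_reference : True.
#[roosterize(disruptive)]
Lemma custom_name : True.
"""

# Map each annotated lemma name to its roosterize tag
annotations = {name: tag for tag, name in ATTR.findall(src)}
print(annotations)
# → {'naming_that_serves_as_a_reference': 'reference', 'custom_name': 'disruptive'}
```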

view this post on Zulip Karl Palmskog (Jul 03 2020 at 14:51):

I see, so you're considering annotating lemmas in MathComp itself with naming style/quality. This is definitely something that we could use (the annotations would show up in our serialized data and we could use them to automatically exclude/include theorems in the training phase). However, maybe there should be a general "naming annotation" language for Coq that could also be used by mechanical linters? One might also consider using (some variant of) this for regular definitions.

view this post on Zulip Cyril Cohen (Jul 03 2020 at 14:55):

Karl Palmskog said:

I see, so you're considering annotating lemmas in MathComp itself with naming style/quality. This is definitely something that we could use (the annotations would show up in our serialized data and we could use them to automatically exclude/include theorems in the training phase). However, maybe there should be a general "naming annotation" language for Coq that could also be used by mechanical linters? One might also consider using (some variant of) this for regular definitions.

Yes, I think naming is so important that annotating names could constitute a big improvement!

view this post on Zulip Karl Palmskog (Jul 03 2020 at 18:22):

for others who are seeing/following this topic, here is the full talk for the Roosterize paper on YouTube for background: https://youtu.be/Ysd-dcizw1A?list=PLl1dj5prwUJz6KV4Q1EPG9Gx0bjiKa_ey&t=3462


Last updated: Jan 31 2023 at 14:03 UTC