Hi ! Is there any plan to update / fix `coqc -beautify`

at the moment ?

I'm still trying to find a way to process input commands (not outputs) to strip them from user notation. The goal is to be able to process coq user development without depending on the notation people might use, only on the original definitions.

At the moment, the only way I know seems to be `-beautify`

which seems a bit broken (missing new lines, errors, ...)

Just wondering if there are other interested by this, and if anything is currently on-going ?

we have experimental work in this direction that uses a completely different approach, which we will present at the Coq Workshop next month: https://cozy.ece.utexas.edu/~palmskog/coqws20draft.pdf

You can see an example here of how it currently performs after only observing raw token streams: https://gist.github.com/palmskog/1384a0bcceb6c55d3f1126422d6fbb7f

`-beautify`

has been relatively unmaintained lately but if you have specific needs you can bug Hugo about it.

currently, for the experimental work, we do actually process notations (preserve tokens), but there is nothing inherent in the approach that makes it limited to this.

if the only goal is to get out input without notation (i.e., no beautification/formatting), that is probably best handled through something like SerAPI

I made an attempt at implementing that in SerAPI, but it required much more knowledge of Coq's internal than I have :) I'll have a look at your approach, thanks ! The main idea comes from the heterogeneous state of Coq development in the wild: everyone has different notations for similar concept, so training learning models in this context is quite tricky, if we need to take each notations into account. Getting rid of them would allow us to process multiple sources without their specific notations.

@Karl Palmskog are these workshop recorded somehow ?

@Vincent Siles I certainly hope they will be recorded since it's all online, but hopefully @Théo Zimmermann can confirm

I certainly hope so too but I haven't had any clear information confirming this yet.

Note that the workshop will be free to attend and registration is already open on the IJCAR-FSCD 2020 website.

@Vincent Siles so basically you want a sort of normal form at the Gallina code level? I don't think you want to process elaborated terms, right? For the record, we released our Coq dataset we used for machine learning recently, which has tokens+AST(with notations)+terms(for lemma statements only), all as sexps as generated by SerAPI: https://github.com/EngineeringSoftware/math-comp-corpus

Yes something like that: not sure what elaborated terms is but we want to process user level language (so the ML model learn thing we can reinject into Coq) without the complexity of users notations. Simple example would be `plus`

which is an alias to `Nat.add`

. It might be called `addition`

somewhere else, and this brings noise.

elaborated terms are what Emilio means when he talks about terms being in "kernel form" here: https://github.com/ejgallego/coq-serapi/blob/v8.11/FAQ.md#interpretation-and-terms -- what is processed by Coq's type-checker/kernel

Thank you for pointing this out. Then it looks like we are looking for a glob_constr, or an intermediate between constr_expr and glob_constr (like Notation interpretation, maybe implicits / desugar, not sure)

Last updated: Feb 01 2023 at 12:30 UTC