Hi. I'm currently developing a Coq formatter and facing challenges in properly formatting TacAlias. For instance, the id for symmetry in A
is symmetry_in_#_52192774
, where a space is needed between symmetry
and in
. On the other hand, the id for info_auto
is info_auto_#_#_#_521927D7
, where an underscore is needed between info
and auto
. I'm unsure how to distinguish between these cases.
The coq-core documentation suggests the existence of printing rules, but it seems there is no apparent way to retrieve pp_tactic
. I'm wondering if there's an alternative method to differentiate between the mentioned cases or if pp_tactic
is indeed necessary. Any guidance on this would be appreciated.
Hi @Hiroki Tokunaga , that's indeed a very bad problem, so far people have been doing some workarounds, by usually just using some hack. I cc @Pierre-Marie Pédrot who may be able to help with this.
These names are internal unique identifiers, they shouldn't be used for printing purposes. What are you trying to achieve exactly?
@Pierre-Marie Pédrot these are the names in the Ast
@Hiroki Tokunaga is trying to write an Ast-based formatter tool
But what's the expected output? A machine-readable representation of the AST or a human-readable output?
A formatter can work using both possibilities, so I'm unsure what's the approach Hiroki-san is using here
If it's the latter, the Pptactic.pr_alias function should be used.
(or some yet-to-be exposed lower-level formatter)
if I remember correctly, in our small prototype neuroformatter, the input was a stream of Coq tokens (as given by Coq's lexer at each point in the source), and the output was the predicted white space characters between each pair of adjacent tokens
Yup @Karl Palmskog , tokens from the lexer doesn't have this problem, however using the Ast in Coq is quite a challenge still!
my intuition is that one could probably get a near-perfect AI formatter if it's given access to proof state - it's not obvious the AST would help. At least the deficiencies we saw in predictions were seemingly entirely due to lack of proof state observation (e.g., token indent based on goal structure)
but if it's going to be done deterministically, sure, some formatting rules could likely benefit from AST
Why you didn't dump the proof state in your experiments?
I guess the formatter here is more in the lines of ocamlformat
, which indeed is Ast based, but the Ast of Coq remains likely the higher barrier to entry for anyone wanting to do stuff, now symmetry
results in
((control ()) (attrs ())
(expr
(VernacSynterp
(VernacExtend (VernacSolve 0)
((GenArg (Rawwit (OptArg (ExtraArg ltac_selector))) ())
(GenArg (Rawwit (OptArg (ExtraArg ltac_info))) ())
(GenArg (Rawwit (ExtraArg tactic))
(TacAlias
((KerName (MPfile (DirPath ((Id Tactics) (Id Init) (Id Coq))))
(Id now_#_62200C10))
((Tacexp
(TacArg (TacCall ((Ser_Qualid (DirPath ()) (Id symmetry)) ()))))))))
(GenArg (Rawwit (ExtraArg ltac_use_default)) false))))))))))
which I can't parse
the reason that work didn't go further was pretty simple, we ran out of funding
yes, I think dumping + incorporating proof states into the model would not have been a lot of engineering
right, the Coq AST is so far removed from the actual concrete syntax
Actually Coq's Ast is kinda fine, the extension mechanism and stuff done for Ltac is where things start to go hellish.
Something as basic as Ltac, which virtually every Coq development uses, you need to write a few thousand lines of code to be able to handle it in a principled way.
I think Ltac2 is almost the same in this respect, right? And also SSReflect
Ltac2 may be better, ssreflect hooks on ltac1 so it shares the same problems
Tho LTAC2 introduces yet another syntax extension method, so if you want to handle Ltac2 programs, you need to write all the plugin infra to handle their custom ad-hoc method.
So indeed, seems like a big PITA to handle.
seems like LLMs can also do Gallina formatting quite well with some simple prompt engineering ("only change whitespace"). So indeed Ltac/Ltac2/SSR proof formatting is the big problem in a Coq formatter.
Maybe they would work fine? There is no problem in letting the LLM interact with the proof info (in fact coq-lsp / Flèche is designed for that)
Last updated: Oct 13 2024 at 01:02 UTC