Stream: Coq users

Topic: Distinguishing `TacAlias` ids


view this post on Zulip Hiroki Tokunaga (Dec 05 2023 at 05:25):

Hi. I'm currently developing a Coq formatter and facing challenges in properly formatting TacAlias. For instance, the id for symmetry in A is symmetry_in_#_52192774, where a space is needed between symmetry and in. On the other hand, the id for info_auto is info_auto_#_#_#_521927D7, where an underscore is needed between info and auto. I'm unsure how to distinguish between these cases.

The coq-core documentation suggests the existence of printing rules, but it seems there is no apparent way to retrieve pp_tactic. I'm wondering if there's an alternative method to differentiate between the mentioned cases or if pp_tactic is indeed necessary. Any guidance on this would be appreciated.

view this post on Zulip Emilio Jesús Gallego Arias (Dec 05 2023 at 17:54):

Hi @Hiroki Tokunaga , that's indeed a very bad problem, so far people have been doing some workarounds, by usually just using some hack. I cc @Pierre-Marie Pédrot who may be able to help with this.

view this post on Zulip Pierre-Marie Pédrot (Dec 05 2023 at 19:43):

These names are internal unique identifiers, they shouldn't be used for printing purposes. What are you trying to achieve exactly?

view this post on Zulip Emilio Jesús Gallego Arias (Dec 05 2023 at 19:44):

@Pierre-Marie Pédrot these are the names in the Ast

view this post on Zulip Emilio Jesús Gallego Arias (Dec 05 2023 at 19:45):

@Hiroki Tokunaga is trying to write an Ast-based formatter tool

view this post on Zulip Pierre-Marie Pédrot (Dec 05 2023 at 19:46):

But what's the expected output? A machine-readable representation of the AST or a human-readable output?

view this post on Zulip Emilio Jesús Gallego Arias (Dec 05 2023 at 19:47):

A formatter can work using both possibilities, so I'm unsure what's the approach Hiroki-san is using here

view this post on Zulip Pierre-Marie Pédrot (Dec 05 2023 at 19:48):

If it's the latter, the Pptactic.pr_alias function should be used.

view this post on Zulip Pierre-Marie Pédrot (Dec 05 2023 at 19:48):

(or some yet-to-be exposed lower-level formatter)

view this post on Zulip Karl Palmskog (Dec 05 2023 at 20:47):

if I remember correctly, in our small prototype neuroformatter, the input was a stream of Coq tokens (as given by Coq's lexer at each point in the source), and the output was the predicted white space characters between each pair of adjacent tokens

view this post on Zulip Emilio Jesús Gallego Arias (Dec 05 2023 at 20:50):

Yup @Karl Palmskog , tokens from the lexer doesn't have this problem, however using the Ast in Coq is quite a challenge still!

view this post on Zulip Karl Palmskog (Dec 05 2023 at 20:51):

my intuition is that one could probably get a near-perfect AI formatter if it's given access to proof state - it's not obvious the AST would help. At least the deficiencies we saw in predictions were seemingly entirely due to lack of proof state observation (e.g., token indent based on goal structure)

view this post on Zulip Karl Palmskog (Dec 05 2023 at 20:53):

but if it's going to be done deterministically, sure, some formatting rules could likely benefit from AST

view this post on Zulip Emilio Jesús Gallego Arias (Dec 05 2023 at 20:54):

Why you didn't dump the proof state in your experiments?

view this post on Zulip Emilio Jesús Gallego Arias (Dec 05 2023 at 20:55):

I guess the formatter here is more in the lines of ocamlformat, which indeed is Ast based, but the Ast of Coq remains likely the higher barrier to entry for anyone wanting to do stuff, now symmetry results in

((control ()) (attrs ())
     (expr
      (VernacSynterp
       (VernacExtend (VernacSolve 0)
        ((GenArg (Rawwit (OptArg (ExtraArg ltac_selector))) ())
         (GenArg (Rawwit (OptArg (ExtraArg ltac_info))) ())
         (GenArg (Rawwit (ExtraArg tactic))
          (TacAlias
           ((KerName (MPfile (DirPath ((Id Tactics) (Id Init) (Id Coq))))
             (Id now_#_62200C10))
            ((Tacexp
              (TacArg (TacCall ((Ser_Qualid (DirPath ()) (Id symmetry)) ()))))))))
         (GenArg (Rawwit (ExtraArg ltac_use_default)) false))))))))))

which I can't parse

view this post on Zulip Karl Palmskog (Dec 05 2023 at 20:56):

the reason that work didn't go further was pretty simple, we ran out of funding

view this post on Zulip Karl Palmskog (Dec 05 2023 at 20:57):

yes, I think dumping + incorporating proof states into the model would not have been a lot of engineering

view this post on Zulip Karl Palmskog (Dec 05 2023 at 20:58):

right, the Coq AST is so far removed from the actual concrete syntax

view this post on Zulip Emilio Jesús Gallego Arias (Dec 05 2023 at 21:00):

Actually Coq's Ast is kinda fine, the extension mechanism and stuff done for Ltac is where things start to go hellish.

Something as basic as Ltac, which virtually every Coq development uses, you need to write a few thousand lines of code to be able to handle it in a principled way.

view this post on Zulip Karl Palmskog (Dec 05 2023 at 21:03):

I think Ltac2 is almost the same in this respect, right? And also SSReflect

view this post on Zulip Emilio Jesús Gallego Arias (Dec 05 2023 at 21:09):

Ltac2 may be better, ssreflect hooks on ltac1 so it shares the same problems

view this post on Zulip Emilio Jesús Gallego Arias (Dec 05 2023 at 21:11):

Tho LTAC2 introduces yet another syntax extension method, so if you want to handle Ltac2 programs, you need to write all the plugin infra to handle their custom ad-hoc method.

view this post on Zulip Emilio Jesús Gallego Arias (Dec 05 2023 at 21:11):

So indeed, seems like a big PITA to handle.

view this post on Zulip Karl Palmskog (Dec 05 2023 at 21:21):

seems like LLMs can also do Gallina formatting quite well with some simple prompt engineering ("only change whitespace"). So indeed Ltac/Ltac2/SSR proof formatting is the big problem in a Coq formatter.

view this post on Zulip Emilio Jesús Gallego Arias (Dec 05 2023 at 23:24):

Maybe they would work fine? There is no problem in letting the LLM interact with the proof info (in fact coq-lsp / Flèche is designed for that)


Last updated: Jun 23 2024 at 05:02 UTC