Parsing problem · Coq users · Zulip Chat Archive

Simon Friis Vindum (Nov 30 2020 at 12:26):

Hi all :smile:. I have a problem related to parsing/notation. A minimal example reproducing the issue is below. I need to implement four forms of notation: l ↦ v, l ↦□ v, l ↦{q} v and l ↦{# q} v, where the latter three are implemented using a custom entry. The last three work in isolation, but they break when I add the first notation as well (see the comments in the example).

I think the problem may be caused by some special handling of notations of the form { x } in Coq? If I change the notation "{ dq }" into e.g. "{^ dq }" or "[{ dq }]", then it works. The documentation (see the third note here) seems to confirm that { x } notation is special, but I'm not sure if that's what's causing the problem and/or what a solution would be?

Require Import Numbers.BinNums.

Inductive dfrac := DfracOwn : nat -> dfrac | DfracDiscarded : dfrac.

Declare Custom Entry dfrac.
Notation "{ dq }" := (dq) (in custom dfrac at level 1, dq constr).
Notation "□" := DfracDiscarded (in custom dfrac).
Notation "{# q }" := (DfracOwn q) (in custom dfrac at level 1, q constr).

Definition mapsto (l : nat) (dq : dfrac) (v : nat) : Prop := False.

Notation "l ↦ dq v" := (mapsto l dq v) (at level 20, dq custom dfrac at level 1).
Notation "l ↦ v" := (mapsto l (DfracOwn 1) v) (at level 20) : type_scope. (* If this is removed then the last example works. *)

Check (10 ↦ 10).                  (* Works. *)
Check (10 ↦□ 10).                 (* Works. *)
Check (10 ↦{# 1 + 2} 10).         (* Works *)
Check (exists dq, 10 ↦{dq} 10).   (* Error: Unknown interpretation for notation "{ _ }". *)

Tagging @Hugo Herbelin as I was told you might be able to figure out what's going on :smile:

Paolo Giarrusso (Nov 30 2020 at 13:28):

Might not be the issue, but for this to have a chance of working, I think you need Coq to left-factor the two mapsto notations, which needs Coq to guess the same levels for l (and v?). You probably want to set those levels explicitly.

Paolo Giarrusso (Nov 30 2020 at 13:30):

(mostly because the heuristics producing those levels are not documented, and sometimes they're surprising)

Paolo Giarrusso (Nov 30 2020 at 13:31):

Simon Friis Vindum (Nov 30 2020 at 13:47):

Thanks for answering @Paolo Giarrusso. Isn't what you're talking about the level 20 that is already there? Also, just to mention it, I have tried a lot of fiddling with the levels, without success.

Paolo Giarrusso (Nov 30 2020 at 13:54):

Paolo Giarrusso (Nov 30 2020 at 13:55):

your results _suggest_ Coq's guessing the same M and N for both notations, maybe?

Fabian Kunze (Nov 30 2020 at 13:55):

Notation "l ↦ v" := (mapsto l (DfracOwn 1) v) (at level 20) : type_scope. (* If this is removed then the last example works. *)
Notation "l ↦ dq v" := (mapsto l dq v) (at level 20, dq custom dfrac at level 1).

Paolo Giarrusso (Nov 30 2020 at 13:56):

Fabian Kunze (Nov 30 2020 at 13:56):

(Nothing else changed from your pasted code)
I would expect that "v" is now not allowed to "start" with '{' in a concrete parse.

Fabian Kunze (Nov 30 2020 at 13:57):

Maybe the parser has to decide whether to use the first or the second rule based on the symbols it sees? And because '{' already exists in the constr language, it thinks "well, this should be a constr"?

Fabian Kunze (Nov 30 2020 at 13:59):

the whole handling of '{' seems very hacky, so I tried the first hacky idea I came up with ;)

Fabian Kunze (Nov 30 2020 at 14:00):

Require Import Numbers.BinNums.

Inductive dfrac := DfracOwn : nat -> dfrac | DfracDiscarded : dfrac.

Require List. Import List.ListNotations.
Declare Custom Entry dfrac.
Notation "[ dq ]" := (dq) (in custom dfrac at level 1, dq constr).
Notation "□" := DfracDiscarded (in custom dfrac).
Notation "{# q }" := (DfracOwn q) (in custom dfrac at level 1, q constr).

Definition mapsto (l : nat) (dq : dfrac) (v : nat) : Prop := False.

Notation "l ↦ dq v" := (mapsto l dq v) (at level 20, dq custom dfrac at level 1).
Notation "l ↦ v" := (mapsto l (DfracOwn 1) v) (at level 20) : type_scope. (* If this is removed then the last example works. *)

Check (10 ↦ 10).                  (* Works. *)
Check (10 ↦□ 10).                 (* Works. *)
Check (10 ↦{# 1 + 2} 10).         (* Works *)
Check (exists dq, 10 ↦[dq] 10).   (* Error: Unknown interpretation for notation "[ _ ]". *)

Fabian Kunze (Nov 30 2020 at 14:00):

Paolo Giarrusso (Nov 30 2020 at 14:01):

Fabian Kunze (Nov 30 2020 at 14:01):

Paolo Giarrusso (Nov 30 2020 at 14:01):

but I see! These are LL(1) recursive-descent parsers, not full CFG parsers, so of course rules are ordered — you only have left-biased choice, not general choice.

Paolo Giarrusso (Nov 30 2020 at 14:03):

and if you want later rules to have priorities (which is... not unreasonable?), they should be added to the _left_ of existing ones (which would explain this behavior). But I'm not sure I've ever seen this happen.
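To make the ordered-choice hypothesis concrete, here is a toy recursive-descent parser in Python. It is not Camlp5 and does not reproduce Coq's real rule ordering; it simply assumes, as guessed above, that a newly declared rule is tried before older ones at the same level, and that the parser commits to the first alternative that succeeds. The helper names (`atom`, `constr`, `dfrac`, `declare`, `parse_level`) and the pre-split token lists are made up for illustration.

```python
from typing import Callable, List, Optional, Tuple

# Toy model only: NOT Camlp5. Assumes later rules are tried first and that
# the parser commits to the first rule that succeeds (left-biased choice).
Tokens = List[str]
Result = Optional[Tuple[object, int]]  # (tree, next position) or None
Rule = Callable[[Tokens, int], Result]
SYMBOLS = {"{", "}", "{#", "□", "↦"}


def atom(ts: Tokens, i: int) -> Result:
    """An identifier or numeral."""
    if i < len(ts) and ts[i] not in SYMBOLS:
        return ts[i], i + 1
    return None


def constr(ts: Tokens, i: int) -> Result:
    """Atom or '{ x }' (declared but uninterpreted), applied to atoms."""
    if i + 2 < len(ts) and ts[i] == "{" and ts[i + 2] == "}":
        head, i = ("{_}", ts[i + 1]), i + 3
    else:
        r = atom(ts, i)
        if r is None:
            return None
        head, i = r
    args = []
    while (r := atom(ts, i)) is not None:  # greedy application
        args.append(r[0])
        i = r[1]
    return ((head, *args) if args else head), i


def dfrac(ts: Tokens, i: int) -> Result:
    """Custom entry: '□', '{ dq }' or '{# q }'."""
    if i < len(ts) and ts[i] == "□":
        return "DfracDiscarded", i + 1
    if i + 2 < len(ts) and ts[i] in ("{", "{#") and ts[i + 2] == "}":
        q = ts[i + 1]
        return (q if ts[i] == "{" else ("DfracOwn", q)), i + 3
    return None


def mapsto_plain(ts: Tokens, i: int) -> Result:  # l ↦ v
    r = atom(ts, i)
    if r is None or r[1] >= len(ts) or ts[r[1]] != "↦":
        return None
    v = constr(ts, r[1] + 1)
    return None if v is None else (("mapsto", r[0], ("DfracOwn", "1"), v[0]), v[1])


def mapsto_dq(ts: Tokens, i: int) -> Result:  # l ↦ dq v
    r = atom(ts, i)
    if r is None or r[1] >= len(ts) or ts[r[1]] != "↦":
        return None
    dq = dfrac(ts, r[1] + 1)
    v = None if dq is None else constr(ts, dq[1])
    return None if v is None else (("mapsto", r[0], dq[0], v[0]), v[1])


def declare(level: List[Rule], rule: Rule) -> None:
    level.insert(0, rule)  # assumption: a new rule goes to the left


def parse_level(level: List[Rule], ts: Tokens) -> object:
    for rule in level:
        r = rule(ts, 0)
        if r is not None:  # commit: later rules are never tried
            return r[0] if r[1] == len(ts) else None
    return None


original: List[Rule] = []  # declaration order from the first message
declare(original, mapsto_dq)
declare(original, mapsto_plain)
swapped: List[Rule] = []  # Fabian's order
declare(swapped, mapsto_plain)
declare(swapped, mapsto_dq)

toks = ["10", "↦", "{", "dq", "}", "10"]
print(parse_level(original, toks))  # plain rule wins; '{_}' node left over
print(parse_level(swapped, toks))   # dfrac rule wins
```

With the first message's declaration order, the `l ↦ v` rule claims `10 ↦ { dq } 10` and reads `{ dq } 10` as an application in constr, leaving the uninterpreted `{_}` node behind (roughly where Coq reports "Unknown interpretation"). With the swapped order, the `l ↦ dq v` rule is tried first and succeeds, while `10 ↦ 10` still falls through to the plain rule.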

Paolo Giarrusso (Nov 30 2020 at 14:03):

Fabian Kunze (Nov 30 2020 at 14:04):

Paolo Giarrusso (Nov 30 2020 at 14:04):

and if you think "that's a different rule and it can't matter, parsers are compositional" I'm going to go "oh sweet summer child"

Paolo Giarrusso (Nov 30 2020 at 14:04):

Fabian Kunze (Nov 30 2020 at 14:04):

Paolo Giarrusso (Nov 30 2020 at 14:05):

if it were '{#', that'd be clearer, but I never got the semantics of {#, and IIRC it's observably not the same.

Fabian Kunze (Nov 30 2020 at 14:05):

that's the reason why new notations sometimes break old stuff in horrible ways, e.g. adding a notation with [| breaks all Ltac destruct patterns like destruct n as [|].
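A toy Python model of a global, longest-match lexer shows why declaring a token such as [| can break unrelated syntax: once the symbol is known, every occurrence of those characters is lexed as one token, whatever rule ends up consuming it. The symbol sets below are invented for the example, and Coq's real lexer differs in many details (Unicode, quoted keywords, comments).

```python
from typing import List, Set


def tokenize(text: str, symbols: Set[str]) -> List[str]:
    """Toy global lexer: identifiers/numerals are runs of alphanumerics or
    '_'; elsewhere, take the LONGEST declared symbol starting at this
    position (falling back to a single character). Whitespace separates."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1
        elif c.isalnum() or c == "_":
            j = i
            while j < len(text) and (text[j].isalnum() or text[j] == "_"):
                j += 1
            tokens.append(text[i:j])
            i = j
        else:
            tok = max((s for s in symbols if text.startswith(s, i)), key=len, default=c)
            tokens.append(tok)
            i += len(tok)
    return tokens


before = {"[", "]", "|"}
after = before | {"[|"}  # someone declares a notation starting with [|
print(tokenize("destruct n as [|]", before))  # ['destruct', 'n', 'as', '[', '|', ']']
print(tokenize("destruct n as [|]", after))   # ['destruct', 'n', 'as', '[|', ']']
```

A pattern parser expecting [, | and ] as separate tokens now receives [| instead. A notation written with spaces, such as "[ | | ]", only uses symbols that are already declared, so it leaves the token table unchanged.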

Fabian Kunze (Nov 30 2020 at 14:06):

Paolo Giarrusso (Nov 30 2020 at 14:06):

Paolo Giarrusso (Nov 30 2020 at 14:07):

Fabian Kunze (Nov 30 2020 at 14:07):

the former is one token and the latter two, I think. '' always forces its own token, afaik

Paolo Giarrusso (Nov 30 2020 at 14:07):

Fabian Kunze (Nov 30 2020 at 14:08):

While we are at it: if one wants to use two tokens but print them nicely, one can of course split the tokens and declare that printing should omit the space.
This here makes nil parse as [| |], but does not screw up other notations:
Notation "[ | | ]" := (nil _) (format "[ | | ]"): vector_scope.

Fabian Kunze (Nov 30 2020 at 14:09):

Paolo Giarrusso (Nov 30 2020 at 14:09):

I've done that to Notation "[ | P | ]" := (only_provable P) (format "[ | P | ]").

Paolo Giarrusso (Nov 30 2020 at 14:12):

Simon Friis Vindum (Nov 30 2020 at 14:12):

Simon Friis Vindum (Nov 30 2020 at 14:16):

To me it seems kinda backwards. We want Coq to _first_ try parsing with the l ↦ dq v rule and therefore we move it _last_? :thinking:

Fabian Kunze (Nov 30 2020 at 14:16):

Paolo Giarrusso (Nov 30 2020 at 14:20):

That's exactly how shadowing works and should work, e.g. let x = foo in let x = bar in x (*bar*)

Paolo Giarrusso (Nov 30 2020 at 14:21):

I doubt notations are 100% lexically scoped, but they're also made available by Import foo.

Paolo Giarrusso (Nov 30 2020 at 14:22):

However: I'd recommend Print Grammar constr. (and maybe Print Grammar dfrac.) to look at parse tables to confirm.

Paolo Giarrusso (Nov 30 2020 at 14:23):

The ~~parse tables~~ grammars are somewhat readable — that is, they're BNF-like, not like the output of LR parser generators.

Simon Friis Vindum (Nov 30 2020 at 14:24):

It makes sense when you look at it like that. I just thought of it as a list of parsing rules where I'd expect the first one to be tried first.

Paolo Giarrusso (Nov 30 2020 at 14:25):

Well, please don't take my word for it — I just guessed based on this example.

Simon Friis Vindum (Nov 30 2020 at 14:26):

Simon Friis Vindum (Nov 30 2020 at 14:59):

Ok, it turns out that the solution leads to another problem: Now Coq _prints_ with the l ↦ dq v and never with the l ↦ v notation. I.e., if you write l ↦ v then Coq prints it back as l ↦{#1} v which is not good.

Pierre-Marie Pédrot (Nov 30 2020 at 15:01):

Pierre-Marie Pédrot (Nov 30 2020 at 15:02):

Simon Friis Vindum (Nov 30 2020 at 15:08):

That looks like a great change. It would be nice to have a solution that works with the current version of Coq, though.

Fabian Kunze (Nov 30 2020 at 15:54):

In your real use case, do you use nesting inside the dq, so that {} sometimes does not appear at top level? Or could one combine the '{' with the ↦?

Hugo Herbelin (Nov 30 2020 at 20:10):

Yes, it is related to "{ _ }" being a declared notation without interpretation, but even if "{ _ }" had an interpretation, the notation "l ↦ v" would introduce an overlap between an applied constr {dq} 10 and a dfrac {dq} followed by a constr 10. If you used, say, "[ _ ]", and [ _ ] were used for something else in constr, you would have a problem too.

To avoid the overlapping, I believe that, instead of a notation "l ↦ v", what you would like is to declare:

Notation "" := (DfracOwn 1) (in custom dfrac).

This would be a way to factor "l ↦ dq v" and "l ↦ v", as indeed suggested by @Paolo Giarrusso .
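Hugo's suggestion can be sketched in the same toy style: with an empty alternative in the dfrac entry, there is only one mapsto rule, so the parser never has to choose between l ↦ v and l ↦ dq v. This Python sketch assumes v is a single identifier or numeral and that the empty alternative is tried last; it models the idea only, not Coq's implementation.

```python
from typing import List, Optional, Tuple

# Toy model of ONE rule "l ↦ dq v" whose dfrac entry has an empty alternative,
# standing in for: Notation "" := (DfracOwn 1) (in custom dfrac).
# Simplification (assumption): l and v are single identifiers/numerals.
Tokens = List[str]
SYMBOLS = {"{", "}", "{#", "□", "↦"}


def dfrac(ts: Tokens, i: int) -> Tuple[object, int]:
    if i < len(ts) and ts[i] == "□":
        return "DfracDiscarded", i + 1
    if i + 2 < len(ts) and ts[i] in ("{", "{#") and ts[i + 2] == "}":
        q = ts[i + 1]
        return (q if ts[i] == "{" else ("DfracOwn", q)), i + 3
    return ("DfracOwn", "1"), i  # empty alternative: consumes no tokens


def mapsto(ts: Tokens) -> Optional[tuple]:
    if len(ts) < 3 or ts[0] in SYMBOLS or ts[1] != "↦":
        return None
    dq, i = dfrac(ts, 2)
    if i != len(ts) - 1 or ts[i] in SYMBOLS:  # exactly one atom must remain
        return None
    return ("mapsto", ts[0], dq, ts[i])


for src in ["10 ↦ 10", "10 ↦ □ 10", "10 ↦ {# 1 } 10", "10 ↦ { dq } 10"]:
    print(src, "=>", mapsto(src.split()))
```

Since the empty alternative consumes nothing, l ↦ v becomes a special case of l ↦ dq v rather than a competing rule, which removes the overlap described above.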

Paolo Giarrusso (Nov 30 2020 at 20:13):

thanks, but I was thinking of the automatic factoring... Your notation would be (inferred as?) only-printing right? and it'll only be used for ↦ thanks to the custom entry right?

Simon Friis Vindum (Dec 01 2020 at 10:18):

@Fabian Kunze Yes, we could do that. The reason for using the custom entry is that this part of the notation can then be reused for other similar arrows as well.

Hugo Herbelin (Dec 01 2020 at 11:44):

Two more remarks, but first my apologies for not having seen the whole discussion when I replied yesterday (I only saw the first two messages).

They are the same. Surrounding single quotes only have an effect on tokens starting with a letter, to distinguish them from variables; otherwise they are dropped. It is the spaces that determine whether a sequence of symbols forms one token or several.

There is also an internal factorization made by Camlp5. If Print Grammar shows:

| "20" RIGHTA
  [ SELF; "↦"; NEXT
  | SELF; "↦"; constr:dfrac LEVEL "1"; NEXT ]

then the grammar is actually stored internally in factorized form, as:

| "20" RIGHTA
  [ SELF; "↦"; [ NEXT | constr:dfrac LEVEL "1"; NEXT ] ]

For more details, see in the Coq sources the factorizing function Grammar.try_insert, which is used at declaration time, as well as the unfactorizing function Grammar.flatten_tree, used at printing time.
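The factorization described here behaves much like inserting rules into a prefix tree. The following Python sketch is a loose analogue, not the code of Grammar.try_insert or Grammar.flatten_tree: `insert` shares common prefixes at declaration time, and `flatten` recovers the separate rules, as Print Grammar displays them. Symbols are plain strings, and `END` is a made-up marker.

```python
from typing import Dict, List, Optional

Tree = Dict[str, "Tree"]
END = "<end>"  # hypothetical marker: a complete rule ends at this node


def insert(tree: Tree, rule: List[str]) -> None:
    """Declaration time: walk down the shared prefix, branching only where
    the new rule differs from the rules already present."""
    for sym in rule:
        tree = tree.setdefault(sym, {})
    tree[END] = {}


def flatten(tree: Tree, prefix: Optional[List[str]] = None) -> List[List[str]]:
    """Printing time: list every complete path, i.e. the individual rules."""
    prefix = prefix or []
    rules: List[List[str]] = []
    for sym, sub in tree.items():
        if sym == END:
            rules.append(prefix)
        else:
            rules.extend(flatten(sub, prefix + [sym]))
    return rules


DFRAC = 'constr:dfrac LEVEL "1"'
level20: Tree = {}
insert(level20, ["SELF", "↦", "NEXT"])
insert(level20, ["SELF", "↦", DFRAC, "NEXT"])
print(level20)           # one branch point, right after "↦"
print(flatten(level20))  # the two rules, as listed by Print Grammar
```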

The current parsing architecture is a standard two-step approach: lexing, then parsing. In particular, tokens such as [| are declared globally, and new token declarations can break parsing code that expects the new token to be cut into pieces instead. This is a serious problem with (at least) two possible solutions: 1) fuse lexing and parsing, and declare tokens only locally to the rules that need them; 2) only declare symbolic tokens made of one symbol, and let the parser itself detect when two symbols must appear without a space between them. I'm currently leaning towards option 1, which has the advantage of also working with "local" keywords. There are no precise plans yet, though; it is more at the discussion stage.
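Option 2 can be illustrated with a hypothetical Python lexer that emits only single-symbol tokens and records whether each token was preceded by whitespace; a rule that wants a multi-symbol token such as [| then checks adjacency itself. The names `lex` and `glued` are invented for this sketch, which ignores everything else a real Coq lexer handles.

```python
from typing import List, Tuple

Token = Tuple[str, bool]  # (text, preceded_by_whitespace)


def lex(text: str) -> List[Token]:
    """Hypothetical option-2 lexer: identifiers are runs of alphanumerics or
    '_'; every other non-space character is a token on its own."""
    out: List[Token] = []
    i, spaced = 0, True
    while i < len(text):
        c = text[i]
        if c.isspace():
            spaced = True
            i += 1
            continue
        j = i + 1
        if c.isalnum() or c == "_":
            while j < len(text) and (text[j].isalnum() or text[j] == "_"):
                j += 1
        out.append((text[i:j], spaced))
        spaced = False
        i = j
    return out


def glued(tokens: List[Token], i: int, symbol: str) -> bool:
    """Called only by rules that use a multi-symbol token such as '[|':
    do its characters appear from tokens[i] on, with no space in between?"""
    if i + len(symbol) > len(tokens):
        return False
    for k, ch in enumerate(symbol):
        text, spaced = tokens[i + k]
        if text != ch or (k > 0 and spaced):
            return False
    return True


toks = lex("destruct n as [|]")
print([t for t, _ in toks])  # destruct's pattern parser still sees [ | ]
print(glued(toks, 3, "[|"), glued(lex("[ | ]"), 0, "[|"))
```

Because [| is never a token, the destruct pattern parser is unaffected; only the rule that asks for the glued symbol pays attention to spacing.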

Paolo Giarrusso (Dec 01 2020 at 16:45):

@Hugo Herbelin scannerless parsing is a known research topic with several usable implementations, especially for extensible parsing, but works using it need extensions to the usual parsing techniques; IIRC, you end up requiring some extensions to CFGs for specific features. I think the work on Spoofax (and the underlying technologies) by Eelco Visser and others is pretty relevant.

Jules Jacobs (Dec 01 2020 at 17:26):

@Paolo Giarrusso
I believe Spoofax has 3 extensions beyond CFGs: (1) you can specify that a specific set of characters cannot follow a production, for lexical disambiguation (2) precedence rules for operators (3) special support for whitespace/indentation sensitive parsing. IIRC Eelco Visser has previously expressed interest in extensible syntax and improving on Coq's and Agda's notation mechanisms.
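Item (1), a follow restriction, is easy to picture in a scannerless setting: matching the keyword fun must fail on funny, because there is no separate lexer to split identifiers first. The Python function below is a minimal sketch of that check under these assumptions; it is not Spoofax's API, and the forbidden character class is chosen only for illustration.

```python
import re

# Characters that must not follow a keyword (illustrative assumption).
IDENT_CHAR = re.compile(r"[A-Za-z0-9_']")


def keyword(text: str, i: int, kw: str) -> int:
    """Scannerless keyword match with a follow restriction. Returns the end
    position on success, or -1 if kw is absent or is followed by an
    identifier character (so 'fun' does not match the start of 'funny')."""
    if not text.startswith(kw, i):
        return -1
    end = i + len(kw)
    if end < len(text) and IDENT_CHAR.match(text, end):
        return -1
    return end


print(keyword("fun x => x", 0, "fun"))  # 3
print(keyword("funny", 0, "fun"))       # -1
```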

Hugo Herbelin (Dec 01 2020 at 18:01):

I must say that I know basically nothing about scannerless parsing. Does it require recompiling the grammar after each extension? Would it support the LIST0, LIST1 and OPT combinators of Camlp5?

Is there some available material about Eelco Visser's interest in Coq's notations?

Jules Jacobs (Dec 01 2020 at 19:57):

I don't know of any public information, but I think I recall that he mentioned interest in dynamic grammar extensions like Agda and Coq in the reading group when we read "Parsing Mixfix Operators" (http://pl.ewi.tudelft.nl/readinggroup/). Spoofax does support lists (with * and +) and optional (with ?). Spoofax does require compiling the grammar (generating parse tables). Adding dynamic grammar extensions to Spoofax is probably not easy.

Paolo Giarrusso (Dec 01 2020 at 20:58):

Did he mention interest? I tried to convince him and Robbert to look into it, but I wasn't sure how successful that was over Slack.

Paolo Giarrusso (Dec 01 2020 at 21:02):

incremental grammar compilation has been an annoyance for quite a while, I remember Sebastian Erdweg suffered that back in 2010-2012 when he was using Spoofax to build SugarJ & relatives (which addressed how to use those technologies to allow in-language extensibility).

Paolo Giarrusso (Dec 01 2020 at 21:05):

On that front, I wonder if it’s a matter of engineering incremental table construction for Spoofax’s scannerless GLR parsing, or whether you’d need scannerless GLL — I suspect it’s more incremental (like LL parsers) but I’m sure it’s much more experimental, and less well understood.
