Stream: Coq users

Topic: Parsing problem


view this post on Zulip Simon Friis Vindum (Nov 30 2020 at 12:26):

Hi all :smile:. I have a problem related to parsing /notation. A minimal example reproducing the issue is below. I need to implement four forms of notation: l ↦ v, l ↦□ v, l ↦{q} v and l ↦{# q} v where the latter three notations are implemented using a custom entry. The last three works in isolation, but they break when I add the first notation as well (see the comments in the example).

I think the problem may be caused by some special handling of notation of the form { x } in Coq? If I change the notation "{ dq }" into e.g., "{^ dq }"} or "[{ dq }]"} then it works. The documentation (see the third note here) seems to confirm that { x } notation is special, but I'm not sure if that's what's causing the problem and/or what a solution would be?

Require Import Numbers.BinNums.

Inductive dfrac := DfracOwn : nat -> dfrac | DfracDiscarded : dfrac.

Declare Custom Entry dfrac.
Notation "{ dq }" := (dq) (in custom dfrac at level 1, dq constr).
Notation "□" := DfracDiscarded (in custom dfrac).
Notation "{# q }" := (DfracOwn q) (in custom dfrac at level 1, q constr).

Definition mapsto (l : nat) (dq : dfrac) (v : nat) : Prop := False.

Notation "l ↦ dq v" := (mapsto l dq v) (at level 20, dq custom dfrac at level 1).
Notation "l ↦ v" := (mapsto l (DfracOwn 1) v) (at level 20) : type_scope. (* If this is removed then the last example works. *)

Check (10 ↦ 10).                  (* Works. *)
Check (10 ↦□ 10).                 (* Works. *)
Check (10 ↦{# 1 + 2} 10).         (* Works *)
Check (exists dq, 10 ↦{dq} 10).   (* Error: Unknown interpretation for notation "{ _ }". *)

Tagging @Hugo Herbelin as I was told you might be able to figure out what's going on :smile:

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 13:28):

Might not be the issue, but for this to have a chance of working, I think you need Coq to left-factor the two mapsto notations, which needs Coq to guess the same levels for l (and v?). You probably want to set those levels explicitly.

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 13:30):

(mostly because the heuristics producing those levels are not documented, and sometimes they're surprising)

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 13:31):

(Iris has some internal docs on this, and pointers to official docs, at https://gitlab.mpi-sws.org/iris/iris/-/blob/master/docs/proof_guide.md#notations)

view this post on Zulip Simon Friis Vindum (Nov 30 2020 at 13:47):

Thank for answering @Paolo Giarrusso. What you're talking about is not theat level 20 which is already there? Also, just to mention it, I have tried a lot of fiddling with the levels to no success.

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 13:54):

next to at level 20 there are inferred l at level M, v at level N, ...

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 13:55):

your results _suggest_ Coq's guessing the same M and N for both notations, maybe?

view this post on Zulip Fabian Kunze (Nov 30 2020 at 13:55):

It works for me if you just swap the order of the two notations

Notation "l ↦ v" := (mapsto l (DfracOwn 1) v) (at level 20) : type_scope. (* If this is removed then the last example works. *)
Notation "l ↦ dq v" := (mapsto l dq v) (at level 20, dq custom dfrac at level 1).

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 13:56):

O_O

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 13:56):

@Fabian Kunze does that make sense to you? can you explain it?

view this post on Zulip Fabian Kunze (Nov 30 2020 at 13:56):

(Nothing else changed from your pasted code)
I would expect that "v" is now not allowed to "start" with '{' in a concrete parse.

view this post on Zulip Fabian Kunze (Nov 30 2020 at 13:57):

Maybe the parser has to decide if it uses the first or second rule based on the symbols it sees? And because '{' alredy exists in the constr-language, it thinks "well, this should be a constr"?

view this post on Zulip Fabian Kunze (Nov 30 2020 at 13:59):

the whole handling of '{' seems very hacky, so I tried the first hacky idea I cam up with ;)

view this post on Zulip Fabian Kunze (Nov 30 2020 at 14:00):

Ah, no, it has nothign to do with the specialal '{':

Require Import Numbers.BinNums.

Inductive dfrac := DfracOwn : nat -> dfrac | DfracDiscarded : dfrac.

Require List. Import List.ListNotations.
Declare Custom Entry dfrac.
Notation "[ dq ]" := (dq) (in custom dfrac at level 1, dq constr).
Notation "□" := DfracDiscarded (in custom dfrac).
Notation "{# q }" := (DfracOwn q) (in custom dfrac at level 1, q constr).

Definition mapsto (l : nat) (dq : dfrac) (v : nat) : Prop := False.

Notation "l ↦ dq v" := (mapsto l dq v) (at level 20, dq custom dfrac at level 1).
Notation "l ↦ v" := (mapsto l (DfracOwn 1) v) (at level 20) : type_scope. (* If this is removed then the last example works. *)

Check (10  10).                  (* Works. *)
Check (10 ↦□ 10).                 (* Works. *)
Check (10 {# 1 + 2} 10).         (* Works *)
Check (exists dq, 10 [dq] 10).   (* Error: Unknown interpretation for notation "[ _ ]". *)

view this post on Zulip Fabian Kunze (Nov 30 2020 at 14:00):

using "[" leads to the same behaviour

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 14:01):

now the error uses [ right?

view this post on Zulip Fabian Kunze (Nov 30 2020 at 14:01):

yeah

view this post on Zulip Fabian Kunze (Nov 30 2020 at 14:01):

working in one order, not working in the other

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 14:01):

but I see! These are LL(1) recursive-descent parsers, not full CFG parsers, so of course rules are ordered — you only have left-biased choice, not general choice.

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 14:03):

and if you want later rules to have priorities (which is... not unreasonable?), they should be added to the _left_ of existing ones (which would explain this behavior). But I'm not sure I've ever seen this happen.

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 14:03):

@Fabian Kunze also {# still uses {.

view this post on Zulip Fabian Kunze (Nov 30 2020 at 14:04):

two letter tokens and one-letter-tokens are distinguished before parsing

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 14:04):

and if you think "that's a different rule and it can't matter, parsers are compositional" I'm going to go "oh sweet summer child"

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 14:04):

but is {# one token?

view this post on Zulip Fabian Kunze (Nov 30 2020 at 14:04):

yes

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 14:05):

if it were '{#', that'd be clearer, but I never got the semantics of {#, and IIRC it's observably not the same.

view this post on Zulip Fabian Kunze (Nov 30 2020 at 14:05):

thats the reason why sometimes, new notation breaks old stuff in horrible ways. e.g. adding a notation with [| breaks all ltac-destruct patterns destruct n as [|].

view this post on Zulip Fabian Kunze (Nov 30 2020 at 14:06):

the default is that consequent special characters are one token.

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 14:06):

so why is (⊢@{ different from ('⊢@{' ?

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 14:07):

ah, maybe the 2nd produces _two_ tokens

view this post on Zulip Fabian Kunze (Nov 30 2020 at 14:07):

the former is one token and the later two I think. '' always forces a own token afaik

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 14:07):

I've indeed observed [| breaking things, but I've never found good docs IIRC

view this post on Zulip Fabian Kunze (Nov 30 2020 at 14:08):

While we are at it: if one wants to use two tokens but print them nicely, one can of course split the tokens and declare that printing should omit the space.
This here makes nil parse as [| |], but does not screw up other notations:
Notation "[ | | ]" := (nil _) (format "[ | | ]"): vector_scope.

view this post on Zulip Fabian Kunze (Nov 30 2020 at 14:09):

(notice the two spaces inside the middle of the format string)

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 14:09):

I've done that to Notation "[ | P | ]" := (only_provable P) (format "[ | P | ]").

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 14:12):

hm, I guess https://coq.inria.fr/refman/user-extensions/syntax-extensions.html#basic-notations arguably documents that, except it's not clear what's the difference between tokens and literals

view this post on Zulip Simon Friis Vindum (Nov 30 2020 at 14:12):

Thanks @Fabian Kunze. That does indeed work :D

view this post on Zulip Simon Friis Vindum (Nov 30 2020 at 14:16):

To me it seems kinda backwards. We want Coq to _first_ try parsing with the l ↦ dq v rule and therefore we move it _last_? :thinking:

view this post on Zulip Fabian Kunze (Nov 30 2020 at 14:16):

maybe it's a stack ;)

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 14:20):

That's exactly how shadowing works and should work, e.g. let x = foo in let x = bar in x (*bar*)

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 14:21):

I doubt notations are 100% lexically scoped, but they're also made available by Import foo.

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 14:22):

However: I'd recommend Print Grammar constr. (and maybe Print Grammar dfrac.) to look at parse tables to confirm.

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 14:23):

The parse tables grammars are somewhat readable — that is, they're BNF-like, not like the output of LR parser generators.

view this post on Zulip Simon Friis Vindum (Nov 30 2020 at 14:24):

It makes sense when you look at it like that. I just thought of it as a list of parsing rules we're I'd expect the first to be tried first.

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 14:25):

Well, please don't take my word for it — I just guessed based on this example.

view this post on Zulip Simon Friis Vindum (Nov 30 2020 at 14:26):

Noted :grinning:

view this post on Zulip Simon Friis Vindum (Nov 30 2020 at 14:59):

Ok, it turns out that the solution leads to another problem: Now Coq _prints_ with the l ↦ dq v and never with the l ↦ v notation. I.e., if you write l ↦ v then Coq prints it back as l ↦{#1} v which is not good.

view this post on Zulip Pierre-Marie Pédrot (Nov 30 2020 at 15:01):

This behaviour might have changed recently though.

view this post on Zulip Pierre-Marie Pédrot (Nov 30 2020 at 15:02):

https://github.com/coq/coq/pull/12986 probably changed the output

view this post on Zulip Simon Friis Vindum (Nov 30 2020 at 15:08):

That looks like a great change. It would be nice with a solution that worked with the current version of Coq though.

view this post on Zulip Fabian Kunze (Nov 30 2020 at 15:54):

In your real use case, do you use nesting inside the dp so that {} does sometimes not appear at toplevel? or could one combine the '{" with the ?

view this post on Zulip Hugo Herbelin (Nov 30 2020 at 20:10):

Yes, it is related to "{ _ }" being a declared notation without interpretation, but even if "{ _ }" had an interpretation, the notation "l ↦ v" would introduce an overlapping between an applied constr {dp} 10 and a dfrac {dp} followed by a constr 10. Would you use say "[ _ ]" and [ _ ] being used for something else in constr, would you have a problem too.

To avoid the overlapping, I believe that, instead of a notation "l ↦ v", what you would like is to declare:

Notation "" := (DfracOwn 1) (in custom dfrac).

This would be a way to factor "l ↦ dq v" and "l ↦ v", as indeed suggested by @Paolo Giarrusso .

view this post on Zulip Paolo Giarrusso (Nov 30 2020 at 20:13):

thanks, but I was thinking of the automatic factoring... Your notation would be (inferred as?) only-printing right? and it'll only be used for ↦ thanks to the custom entry right?

view this post on Zulip Simon Friis Vindum (Dec 01 2020 at 10:18):

@Fabian Kunze Yes, we could do that. The reason for using the custom entry is that this part of the notation can then be reused for other similar arrows as well.

@Hugo Herbelin Thank! That solution seems to work really well :+1:

view this post on Zulip Hugo Herbelin (Dec 01 2020 at 11:44):

Two more remarks, but first my apologies for not having seen the whole discussion when I replied yesterday (I only saw the two first messages).

so why is (⊢@{ different from ('⊢@{' ?

They are the same. Surrounding single quotes have an effect only on tokens starting with a letter, so as to distinguish them from variables, otherwise they are dropped. It is spaces which tell if a sequence of symbols is one or more tokens.

but I see! These are LL(1) recursive-descent parsers, not full CFG parsers, so of course rules are ordered — you only have left-biased choice, not general choice.

maybe it's a stack ;)

Yes, rules are ordered, in a last declared first used manner.

There is also an internal factorization made by Camlp5. If Print Grammar shows:

| "20" RIGHTA
  [ SELF; "↦"; NEXT
  | SELF; "↦"; constr:dfrac LEVEL "1"; NEXT ]

it is actually internally factorized as:

| "20" RIGHTA
  [ SELF; "↦"; [ NEXT | constr:dfrac LEVEL "1"; NEXT ] ]

For more details, see in Coq archive the factorizing function Grammar.try_insertthat is used as declaration time, as well as the unfactorizing function Grammar.flatten_tree used at printing time.

I've indeed observed [| breaking things, but I've never found good docs IIRC

The current parsing architecture is a standard two-step lexing then parsing approach. In particular, tokens, such as [|, are declared globally and new declarations of tokens can break parsing code expecting that this new token is instead cut into pieces. This is a serious problem with (at least) two possible solutions: 1) fuse lexing and parsing and declare tokens only locally to the rules that need them 2) only declare symbolic tokens made of one symbol and let the parser detect itself when two symbols should be without space inbetween them. I'm currently leaning towards option 1, which has the advantage to work also with "local" keywords. No precise plans yet though, it is more at the stage of discussion.

view this post on Zulip Paolo Giarrusso (Dec 01 2020 at 16:45):

@Hugo Herbelin scannerless parsing is a known research topic with several usable implementations, especially for extensible parsing, but works using that need extensions to usual parsing techniques — IIRC, you end up requiring some extensions to CFGs for specific features. I think the work on Spoofax (and the underlying technologies) by Eelco Visser and others is pretty relevant.

view this post on Zulip Jules Jacobs (Dec 01 2020 at 17:26):

@Paolo Giarrusso
I believe Spoofax has 3 extensions beyond CFGs: (1) you can specify that a specific set of characters cannot follow a production, for lexical disambiguation (2) precedence rules for operators (3) special support for whitespace/indentation sensitive parsing. IIRC Eelco Visser has previously expressed interest in extensible syntax and improving on Coq's and Agda's notation mechanisms.

view this post on Zulip Hugo Herbelin (Dec 01 2020 at 18:01):

Thanks for the pointers!

I must say that I know basically nothing about scannerless. Does it require recompiling the grammar after each extension? Would it support the LIST0, LIST1 and OPT combinators of campl5?

Is there some available material about Eelco Visser's interest for Coq's notations?

view this post on Zulip Jules Jacobs (Dec 01 2020 at 19:57):

I don't know of any public information, but I think I recall that he mentioned interest in dynamic grammar extensions like Agda and Coq in the reading group when we read "Parsing Mixfix Operators" (http://pl.ewi.tudelft.nl/readinggroup/). Spoofax does support lists (with * and +) and optional (with ?). Spoofax does require compiling the grammar (generating parse tables). Adding dynamic grammar extensions to Spoofax is probably not easy.

view this post on Zulip Paolo Giarrusso (Dec 01 2020 at 20:58):

Did he mention interest? I tried to convince him and Robbert to look into it, but I wasn't sure how successful that was over Slack.

view this post on Zulip Paolo Giarrusso (Dec 01 2020 at 21:02):

incremental grammar compilation has been an annoyance for quite a while, I remember Sebastian Erdweg suffered that back in 2010-2012 when he was using Spoofax to build SugarJ & relatives (which addressed how to use those technologies to allow in-language extensibility).

view this post on Zulip Paolo Giarrusso (Dec 01 2020 at 21:05):

On that front, I wonder if it’s a matter of engineering incremental table construction for Spoofax’s scannerless GLR parsing, or whether you’d need scannerless GLL — I suspect it’s more incremental (like LL parsers) but I’m sure it’s much more experimental, and less well understood.

view this post on Zulip Paolo Giarrusso (Dec 01 2020 at 21:06):

(actually, https://github.com/rust-lang/gll exists, so it must be more mature than I thought!)

view this post on Zulip Paolo Giarrusso (Dec 01 2020 at 21:11):

the state of the art probably lives in the papers by Scott and Johnstone (https://pure.royalholloway.ac.uk/portal/en/persons/elizabeth-scott(2b634c08-8249-413d-9654-031aed1f680b)/publications.html), but they’re extremely technical


Last updated: Feb 09 2023 at 00:03 UTC