Hi all :smile:. I have a problem related to parsing /notation. A minimal example reproducing the issue is below. I need to implement four forms of notation: l ↦ v
, l ↦□ v
, l ↦{q} v
and l ↦{# q} v
where the latter three notations are implemented using a custom entry
. The last three works in isolation, but they break when I add the first notation as well (see the comments in the example).
I think the problem may be caused by some special handling of notation of the form { x }
in Coq? If I change the notation "{ dq }"
into e.g., "{^ dq }"}
or "[{ dq }]"}
then it works. The documentation (see the third note here) seems to confirm that { x }
notation is special, but I'm not sure if that's what's causing the problem and/or what a solution would be?
Require Import Numbers.BinNums.
Inductive dfrac := DfracOwn : nat -> dfrac | DfracDiscarded : dfrac.
Declare Custom Entry dfrac.
Notation "{ dq }" := (dq) (in custom dfrac at level 1, dq constr).
Notation "□" := DfracDiscarded (in custom dfrac).
Notation "{# q }" := (DfracOwn q) (in custom dfrac at level 1, q constr).
Definition mapsto (l : nat) (dq : dfrac) (v : nat) : Prop := False.
Notation "l ↦ dq v" := (mapsto l dq v) (at level 20, dq custom dfrac at level 1).
Notation "l ↦ v" := (mapsto l (DfracOwn 1) v) (at level 20) : type_scope. (* If this is removed then the last example works. *)
Check (10 ↦ 10). (* Works. *)
Check (10 ↦□ 10). (* Works. *)
Check (10 ↦{# 1 + 2} 10). (* Works *)
Check (exists dq, 10 ↦{dq} 10). (* Error: Unknown interpretation for notation "{ _ }". *)
Tagging @Hugo Herbelin as I was told you might be able to figure out what's going on :smile:
Might not be the issue, but for this to have a chance of working, I think you need Coq to left-factor the two mapsto notations, which needs Coq to guess the same levels for l (and v?). You probably want to set those levels explicitly.
(mostly because the heuristics producing those levels are not documented, and sometimes they're surprising)
(Iris has some internal docs on this, and pointers to official docs, at https://gitlab.mpi-sws.org/iris/iris/-/blob/master/docs/proof_guide.md#notations)
Thank for answering @Paolo Giarrusso. What you're talking about is not theat level 20
which is already there? Also, just to mention it, I have tried a lot of fiddling with the levels to no success.
next to at level 20
there are inferred l at level M, v at level N, ...
your results _suggest_ Coq's guessing the same M and N for both notations, maybe?
It works for me if you just swap the order of the two notations
Notation "l ↦ v" := (mapsto l (DfracOwn 1) v) (at level 20) : type_scope. (* If this is removed then the last example works. *)
Notation "l ↦ dq v" := (mapsto l dq v) (at level 20, dq custom dfrac at level 1).
O_O
@Fabian Kunze does that make sense to you? can you explain it?
(Nothing else changed from your pasted code)
I would expect that "v" is now not allowed to "start" with '{' in a concrete parse.
Maybe the parser has to decide if it uses the first or second rule based on the symbols it sees? And because '{' alredy exists in the constr-language, it thinks "well, this should be a constr"?
the whole handling of '{' seems very hacky, so I tried the first hacky idea I cam up with ;)
Ah, no, it has nothign to do with the specialal '{':
Require Import Numbers.BinNums.
Inductive dfrac := DfracOwn : nat -> dfrac | DfracDiscarded : dfrac.
Require List. Import List.ListNotations.
Declare Custom Entry dfrac.
Notation "[ dq ]" := (dq) (in custom dfrac at level 1, dq constr).
Notation "□" := DfracDiscarded (in custom dfrac).
Notation "{# q }" := (DfracOwn q) (in custom dfrac at level 1, q constr).
Definition mapsto (l : nat) (dq : dfrac) (v : nat) : Prop := False.
Notation "l ↦ dq v" := (mapsto l dq v) (at level 20, dq custom dfrac at level 1).
Notation "l ↦ v" := (mapsto l (DfracOwn 1) v) (at level 20) : type_scope. (* If this is removed then the last example works. *)
Check (10 ↦ 10). (* Works. *)
Check (10 ↦□ 10). (* Works. *)
Check (10 ↦{# 1 + 2} 10). (* Works *)
Check (exists dq, 10 ↦[dq] 10). (* Error: Unknown interpretation for notation "[ _ ]". *)
using "[" leads to the same behaviour
now the error uses [
right?
yeah
working in one order, not working in the other
but I see! These are LL(1) recursive-descent parsers, not full CFG parsers, so of course rules are ordered — you only have left-biased choice, not general choice.
and if you want later rules to have priorities (which is... not unreasonable?), they should be added to the _left_ of existing ones (which would explain this behavior). But I'm not sure I've ever seen this happen.
@Fabian Kunze also {#
still uses {
.
two letter tokens and one-letter-tokens are distinguished before parsing
and if you think "that's a different rule and it can't matter, parsers are compositional" I'm going to go "oh sweet summer child"
but is {#
one token?
yes
if it were '{#'
, that'd be clearer, but I never got the semantics of {#
, and IIRC it's observably not the same.
thats the reason why sometimes, new notation breaks old stuff in horrible ways. e.g. adding a notation with [|
breaks all ltac-destruct patterns destruct n as [|].
the default is that consequent special characters are one token.
so why is (⊢@{
different from ('⊢@{'
?
ah, maybe the 2nd produces _two_ tokens
the former is one token and the later two I think. ''
always forces a own token afaik
I've indeed observed [|
breaking things, but I've never found good docs IIRC
While we are at it: if one wants to use two tokens but print them nicely, one can of course split the tokens and declare that printing should omit the space.
This here makes nil parse as [| |]
, but does not screw up other notations:
Notation "[ | | ]" := (nil _) (format "[ | | ]"): vector_scope.
(notice the two spaces inside the middle of the format
string)
I've done that to Notation "[ | P | ]" := (only_provable P) (format "[ | P | ]").
hm, I guess https://coq.inria.fr/refman/user-extensions/syntax-extensions.html#basic-notations arguably documents that, except it's not clear what's the difference between tokens and literals
Thanks @Fabian Kunze. That does indeed work :D
To me it seems kinda backwards. We want Coq to _first_ try parsing with the l ↦ dq v
rule and therefore we move it _last_? :thinking:
maybe it's a stack ;)
That's exactly how shadowing works and should work, e.g. let x = foo in let x = bar in x (*bar*)
I doubt notations are 100% lexically scoped, but they're also made available by Import foo
.
However: I'd recommend Print Grammar constr.
(and maybe Print Grammar dfrac.
) to look at parse tables to confirm.
The parse tables grammars are somewhat readable — that is, they're BNF-like, not like the output of LR parser generators.
It makes sense when you look at it like that. I just thought of it as a list of parsing rules we're I'd expect the first to be tried first.
Well, please don't take my word for it — I just guessed based on this example.
Noted :grinning:
Ok, it turns out that the solution leads to another problem: Now Coq _prints_ with the l ↦ dq v
and never with the l ↦ v
notation. I.e., if you write l ↦ v
then Coq prints it back as l ↦{#1} v
which is not good.
This behaviour might have changed recently though.
https://github.com/coq/coq/pull/12986 probably changed the output
That looks like a great change. It would be nice with a solution that worked with the current version of Coq though.
In your real use case, do you use nesting inside the dp so that {}
does sometimes not appear at toplevel? or could one combine the '{" with the ↦
?
Yes, it is related to "{ _ }"
being a declared notation without interpretation, but even if "{ _ }"
had an interpretation, the notation "l ↦ v"
would introduce an overlapping between an applied constr {dp} 10
and a dfrac {dp}
followed by a constr 10
. Would you use say "[ _ ]"
and [ _ ]
being used for something else in constr
, would you have a problem too.
To avoid the overlapping, I believe that, instead of a notation "l ↦ v"
, what you would like is to declare:
Notation "" := (DfracOwn 1) (in custom dfrac).
This would be a way to factor "l ↦ dq v"
and "l ↦ v"
, as indeed suggested by @Paolo Giarrusso .
thanks, but I was thinking of the automatic factoring... Your notation would be (inferred as?) only-printing right? and it'll only be used for ↦ thanks to the custom entry right?
@Fabian Kunze Yes, we could do that. The reason for using the custom entry is that this part of the notation can then be reused for other similar arrows as well.
@Hugo Herbelin Thank! That solution seems to work really well :+1:
Two more remarks, but first my apologies for not having seen the whole discussion when I replied yesterday (I only saw the two first messages).
so why is
(⊢@{
different from('⊢@{'
?
They are the same. Surrounding single quotes have an effect only on tokens starting with a letter, so as to distinguish them from variables, otherwise they are dropped. It is spaces which tell if a sequence of symbols is one or more tokens.
but I see! These are LL(1) recursive-descent parsers, not full CFG parsers, so of course rules are ordered — you only have left-biased choice, not general choice.
maybe it's a stack ;)
Yes, rules are ordered, in a last declared first used manner.
There is also an internal factorization made by Camlp5. If Print Grammar
shows:
| "20" RIGHTA
[ SELF; "↦"; NEXT
| SELF; "↦"; constr:dfrac LEVEL "1"; NEXT ]
it is actually internally factorized as:
| "20" RIGHTA
[ SELF; "↦"; [ NEXT | constr:dfrac LEVEL "1"; NEXT ] ]
For more details, see in Coq archive the factorizing function Grammar.try_insert
that is used as declaration time, as well as the unfactorizing function Grammar.flatten_tree
used at printing time.
I've indeed observed
[|
breaking things, but I've never found good docs IIRC
The current parsing architecture is a standard two-step lexing then parsing approach. In particular, tokens, such as [|
, are declared globally and new declarations of tokens can break parsing code expecting that this new token is instead cut into pieces. This is a serious problem with (at least) two possible solutions: 1) fuse lexing and parsing and declare tokens only locally to the rules that need them 2) only declare symbolic tokens made of one symbol and let the parser detect itself when two symbols should be without space inbetween them. I'm currently leaning towards option 1, which has the advantage to work also with "local" keywords. No precise plans yet though, it is more at the stage of discussion.
@Hugo Herbelin scannerless parsing is a known research topic with several usable implementations, especially for extensible parsing, but works using that need extensions to usual parsing techniques — IIRC, you end up requiring some extensions to CFGs for specific features. I think the work on Spoofax (and the underlying technologies) by Eelco Visser and others is pretty relevant.
@Paolo Giarrusso
I believe Spoofax has 3 extensions beyond CFGs: (1) you can specify that a specific set of characters cannot follow a production, for lexical disambiguation (2) precedence rules for operators (3) special support for whitespace/indentation sensitive parsing. IIRC Eelco Visser has previously expressed interest in extensible syntax and improving on Coq's and Agda's notation mechanisms.
Thanks for the pointers!
I must say that I know basically nothing about scannerless. Does it require recompiling the grammar after each extension? Would it support the LIST0
, LIST1
and OPT
combinators of campl5?
Is there some available material about Eelco Visser's interest for Coq's notations?
I don't know of any public information, but I think I recall that he mentioned interest in dynamic grammar extensions like Agda and Coq in the reading group when we read "Parsing Mixfix Operators" (http://pl.ewi.tudelft.nl/readinggroup/). Spoofax does support lists (with * and +) and optional (with ?). Spoofax does require compiling the grammar (generating parse tables). Adding dynamic grammar extensions to Spoofax is probably not easy.
Did he mention interest? I tried to convince him and Robbert to look into it, but I wasn't sure how successful that was over Slack.
incremental grammar compilation has been an annoyance for quite a while, I remember Sebastian Erdweg suffered that back in 2010-2012 when he was using Spoofax to build SugarJ & relatives (which addressed how to use those technologies to allow in-language extensibility).
On that front, I wonder if it’s a matter of engineering incremental table construction for Spoofax’s scannerless GLR parsing, or whether you’d need scannerless GLL — I suspect it’s more incremental (like LL parsers) but I’m sure it’s much more experimental, and less well understood.
(actually, https://github.com/rust-lang/gll exists, so it must be more mature than I thought!)
the state of the art probably lives in the papers by Scott and Johnstone (https://pure.royalholloway.ac.uk/portal/en/persons/elizabeth-scott(2b634c08-8249-413d-9654-031aed1f680b)/publications.html), but they’re extremely technical
Last updated: Sep 28 2023 at 11:01 UTC