Currently, the "no associativity" modifier for notations applies only to printing. For parsing, notations declared non associative are treated as left associative. Example:
Require Import Nat.
Infix "mod" := modulo (at level 40, no associativity).
Check 8 mod 5 mod 2.
(* (8 mod 5) mod 2 *)
How much:
The risk with 1. is that notations currently declared non associative often already exploit that they are actually parsed left associative, so this may lead to compatibility issues.
What would be lost to adopt 2.?
Really supporting "no associativity" would a priori be feasible in Camlp5, but first wondering if there are opinions around on this question.
vs fully renouncing to the no associativity modifier
what does that mean?
See in https://github.com/coq/coq/issues/17859 for the detailled explanation.
My opinion: I would try to implement non associativity in parsing (if easy enough) and assess it on CI: if it appears that most use there intended left associativity then we could just drop no associativity, otherwise keep the implementation, in both cases with a deprecation phase.
Well, I finally gave it a try yesterday and it seems to work. I can compile the stdlib with both non-associativity respected and the "recovery" mode of camlp5 deactivated. I don't know exactly how to proceed though.
The changes needed in the stdlib are:
binder_constr
to level 0 (!!!) and - term
from level 35 to level 10, so that it is compatible with removing the recovery mode (*)x =? y = true
, because =?
was declared non associative, so there is a design choice to do here about this family of notationsAnd, surprisingly, that's it!
The issue with (*) is now that the printer really needs to be able to support cycles (for binder_constr
). So, this would probably be the next step to do.
Maybe I could also just open a PR with the extended gramlib, even if the non-associativity and non-recovery features are left inactive until a decision is taken on the strategy?
add parentheses in many situations of the form x =? y = true, because =? was declared non associative, so there is a design choice to do here about this family of notations
Why does this need to change?
Currently non associativity is used only for printing (adding extra parentheses) but synonymous to left associativity for parsing. We may want to actually enforce non associativity for parsing (e.g. reject x =? y = true
and only accept (x =? y) = true
).
Hugo Herbelin said:
Well, I finally gave it a try yesterday and it seems to work. I can compile the stdlib with both non-associativity respected and the "recovery" mode of camlp5 deactivated.
Impressive!
Hugo Herbelin said:
The changes needed in the stdlib are:
- move
binder_constr
to level 0 (!!!) and- term
from level 35 to level 10, so that it is compatible with removing the recovery mode (*)
That might be the most impactful change, maybe worth opening a draft PR to run the CI and see how it fares.
Hugo Herbelin said:
- add parentheses in many situations of the form
x =? y = true
, because=?
was declared non associative, so there is a design choice to do here about this family of notations
Seems to be the whole point of enforcing non associativity, so makes perfect sense to me.
Hugo Herbelin said:
- make a few notations at level 0 or 40 follows the right-associativity of level 0 and 40 rather than being non-associative (there may be workarounds)
Shouldn't we first enable mixing associativity at the same level, as you suggesst in https://github.com/coq/coq/issues/17859 wouldn't that remove the need for this change?
That might be the most impactful change
Actually no, because what is in binder_constr already spans until the end of the term, and we keep it spanning by setting its right-hand side at level 200. In the other direction, the recovery of camlp5 makes that the binder_constr, even if at 200, are also parser at lower level (as in True /\ forall x, x = 0
). So, the only impact is on printing which is more complicated (i.e., currently, True /\ forall x, x = 0
is printed `True /\ (forall x, x = 0)
and one would have to drop the parentheses to be consistent with parsing).
Incidentally, I said moving it to 0, but I'm now leaning to moving it to 10 (or even maybe something intermediary between 10 and 70), because supporting e.g list forall x, x = 0
does not seem necessary (it is even urrently not accepted).
Shouldn't we first enable mixing associativity at the same level
Probably. The flexibility it would give seems worth, even if it seems a rather big change of model.
Hugo Herbelin said:
Incidentally, I said moving it to 0, but I'm now leaning to moving it to 10 (or even maybe something intermediary between 10 and 70), because supporting e.g
list forall x, x = 0
does not seem necessary (it is even urrently not accepted).
I'm not even sure this is desirable (I find it pretty unreadable).
Is Notation "x + y" := whatever (no associativity)
currently basically equivalent to
Notation "x + y" := whatever (left associativity, only parsing).
Notation "x + y" := (x + y) (no associativity, only printing).
with the later being forward compatible with respecting no associativity in parsing?
I don't understand what mixing associativity means. If *
(left assoc) and ^
(right assoc) were at the same level, how would you ever parse x * y ^ z
and x ^ y * z
? Actually, is there even an existing parser generator that allows it? (I am pretty sure both Bison and Menhir forbid it.)
C.f. https://github.com/coq/coq/issues/17859 basically, the left associativity gets precedence and your examples are parsed as (x * y) ^ z
and x ^ (y * z)
. It's similar to having two sublevels for left and right priority, left being just below right.
Accepting these expressions seems like a footgun. It would be better to choke on them, user-wise. I don't know if it is easily implemented, though. But if it is possible to choke on nonassociative operators, it should be possible to choke on operators changing associativity on the fly.
That's what is done currently: once some notation decides the associativity of a level, declaring another associativity in another notation command (or requiring a module with that command) is an error.
Even if the two notations come from diferrent libraries and operate on types that are so diferent that they will never be mixed.
Just to be clear, I know what is done currently. What I mean is that if we ever allow two associativity directions at the same level, we should make sure that mixing them fails at parsing time, rather than implicitly parenthesizing the expression one way rather than the other.
I agree failing at parsing time would be nice.
It's similar to having two sublevels for left and right priority, left being just below right.
Instead of having left below right, could they be incomparable? so if we start to parse in one we can't go into the other
That's what Guillaume suggest. I don't know how easy it would be to implement (probably not too hard but I haven't tried) but this would indeed be the best solution.
I would prefer to have "non-associative" to be also "non-associative" on parsing. It is important that people understand specs correctly when reading them and one can use non-associative notations to ensure readability if this is enforced. E.g. many C compilers now give warnings if parenthesis are not given in cases which are not obvious (to everyone) without them.
Pierre Roux said:
Currently non associativity is used only for printing (adding extra parentheses) but synonymous to left associativity for parsing. We may want to actually enforce non associativity for parsing (e.g. reject
x =? y = true
and only accept(x =? y) = true
).
This point still confuses me. I generally set notations to be "no associativity" when they cannot be nested. e.g., neither (x =? y) =? z
nor x =? (y =? z)
will typecheck. But (x =? y) = z
does typecheck, so I'd want the associativity of _ = _
to dominate here. Does this not happen for some reason?
(Moreover, I wouldn't be surprised if many of the "no associativity" specifiers followed this logic. It is not the case for _ mod _
, which I guess is not associative for clarity reasons, but for any boolean infix operators declared non associative this might explain the reasoning.)
In the stdlib, "_ = _"
and "_ =? _"
are both declared at level 70, no associativity
. Since for parsing this amount to left associativity, this means that x =? y = z
will be parsed as (x =? y) = z
and x = y =? z
as (x = y) =? z
. For =?
to have a higher priority than =
it should be at a level below 70 (I don't know why this is not the case).
@Guillaume Melquiond: The situation is more subtle than we might think. (At least, I realize now that I've been led into error by camlp5 for a long time!)
Here is detailed analysis whose summary, TL;DR, is:
XX_infix_postfix_right_assoc
,XX_infix_prefix_left_assoc
,XX_prefix_non_assoc_closed
.More details:
In the ideal world, we have a stratified grammar where for all op
in the level, a level has one of these forms
term10 ::= term10 op term9 | term10 postfix | ...
(left associative),term10 := term9 op term10 | prefix term10
(right associative),term10 ::= term9 op term9 | prefix term9 | term9 postfix
,term10 ::= rule_closed_on_both_side
,In practise, Camlp5 grammars are both more constrained:
term10 ::= term10 op term91: the alternative is to write
SELF op SELF with a left associative flag, or to write
SELF op term9`,term10 ::= term9 op term10
: the alternative is to write SELF op SELF
with a right associative flag, or to write SELF op term10
,but also more liberal:
and, additionally, do not respect the specification:
So, the situation is currently that
- Coq accepts that one development sets level XX right associative and another sets it left associative, resulting in conflicts,
- Coq accepts that the user gives rules with an arbitrary termXX on the right-hand side, which is liable to create circles (as in the case of rewstrategy
, see PR#17832).
There are yet additional strange things: SELF
on the rhs of a left associative level means the next level but SELF
on the lhs of a right associative level XX means the subset of rules of XX starting (morally) with a terminal!! So, a non-intuitive loss of symmetry as visible in the current example when recovery is removed:
Notation "x +++ y" := (x+y) (at level 23, left associativity).
Notation "!!" := 0 (at level 23).
Check !! +++ 0. (* allowed *)
but
Notation "x +++ y" := (x+y) (at level 23, right associativity).
Notation "!!" := 0 (at level 23).
Check 0 +++ !!. (* disallowed *)
For a couple of years now, I experimented here and there to change camlp5 so that:
True /\ forall x, x = 0
which camlp5 accepts even though forall
is at level 200 > 85 which is the level of \/
),In particular, I experimented to progressively move the "ideal model", realizing that moving to it breaks things:
True /\ forall x, x = 0
, which is common, because of the recovery mechanism,x =? y = z
, which is common (in stdlib), because of non-implemented non-associativity.Now, the advantages of rethinking each level as being 4 levels XX_infix_postfix_right_assoc
, XX_infix_prefix_left_assoc
, XX_infix_postfix_non_assoc
and XX_prefix_non_assoc_closed
are:
XX_infix_postfix_right_assoc
and a left associative infix notation goes to XX_infix_prefix_left_assoc
which are distinct,@Gaëtan Gilbert :
Instead of having left below right, could they be incomparable?
I would know how to do that with distinct grammars, but I don't know how to do it using levels: the left-hand side of rules using levels has to be SELF, otherwise it loops.
But why not, nothing forces us per se to use camlp5 levels. We can also emulate them as we like.
Regarding x =? y
, one issue is that there are two antagonistic views:
x =? y = z =? t
parses as (x =? y) = (z =? t)
The argument of clarity also applies: x =? y = z =? t
does not seem so obvious to parse correctly for a human, without parentheses.
@Gaëtan Gilbert:
Instead of having left below right, could they be incomparable?
I would know how to do that with distinct grammars
Actually, that does not look so easy to do. To turn incomparability of a left-assoc op and a right-assoc op at the same level into an error, as said by @Guillaume Melquiond, I can imagine the following but this is a bit technical:
term42:
[ x = term41; "opR"; y = term42_no_left -> OpR (x,y)
| x = term41; "opL"; f = term42_no_right -> f x
| x = term41; "opN"; f = term41 -> OpN(x,y)
| x = term41 -> x ]
term42_no_left:
[ x = term41; "opR"; y = term42_no_left -> OpR (x,y)
| x = term41 -> x ]
term42_no_right:
[ y = term41; "opL"; f = term42_no_right -> fun x => f (OpL (x,y))
| y = term41 -> fun x -> OpL(x,y) ]
So, ordering a left-assoc op and a right-assoc op at the same level would be easier, as in:
term42:
[ x = SELF; "opR"; y = term42 -> OpR (x,y)
| x = SELF; "opL"; y = term41 -> OpL(x,y)
| x = term41 -> x ]
where, by construction of camlp5 LL parser, opR binds wider than opL.
And to support opN yet again below opL, the following should work:
term42: [
[ x = SELF; "opR"; y = term42 -> OpR (x,y)
| x = SELF; "opL"; y = term41 -> OpL(x,y)
| x = term42_no -> x ] ]
term42_no: [ NONA
[ x = term41; "opN"; y = term41 -> OpN (x,y)
| x = term41 -> x ] ]
I am bit confused. Why is it SELF
and not term41
on the left of opR
in term42
? Is it a peculiarity of camlp5? (Perhaps that is what you mean by 'by construction of camlp5 LL parser, opR binds wider than opL".)
In the first case it's the same but in the second case it is term42_no
instead of term41
, allowing "opN yet again below opL".
Also, with other rules, it could be different according to https://github.com/coq/coq/issues/17859 that states " SELF
on the left is not referring to the lower NEXT
level but instead to the subset of the current level not starting with SELF
!"
Hugo Herbelin said:
To turn incomparability of a left-assoc op and a right-assoc op at the same level into an error, as said by Guillaume Melquiond, I can imagine the following but this is a bit technical:
It indeed is a bit more technical but doesn't multiply the size of the camlp5 grammar by more than two and IINM camlp5 no longer needs to call continuations more than once, so we get noassoc "for free". It seems to behave as the majority of people here expect, have you tried implementing it?
Pierre Roux said:
"
SELF
on the left is not referring to the lowerNEXT
level but instead to the subset of the current level not starting withSELF
!"
Okay. So, that explains why term42 := SELF opR term42
makes opR
right-associative. But now I am even more confused because, by the same reasoning, term42 := SELF opL term41
should make opL
non-associative. I don't see how the recovery mode can make it left-associative (at worst it would make it right-associative).
This is explained in https://github.com/coq/coq/issues/17859, presented with term41_x opL term41_y opL term41_z
, it parses term41_x
then call the continuation ("opL"; y = term41
) andb builds opL(term41_x, term41_y)
then calls the continuation again finally parsing opL(opL(term41_x, term41_y), term41_z)
@Hugo Herbelin In fact, I don't get how your technical solution would work with multiple left associative operators at the same level
term42:
[ x = term41; "opR"; y = term42_no_left -> OpR (x,y)
| x = term41; "opL"; f = term42_no_right -> f x
| x = term41; "opL'"; f = term42_no_right -> f x
| x = term41; "opN"; f = term41 -> OpN(x,y)
| x = term41 -> x ]
term42_no_left:
[ x = term41; "opR"; y = term42_no_left -> OpR (x,y)
| x = term41 -> x ]
term42_no_right:
[ y = term41; "opL"; f = term42_no_right -> fun x => f (OpL (x,y)) (* but could also be OpL' *)
| y = term41; "opL'"; f = term42_no_right -> fun x => f (OpL' (x,y)) (* but could also be OpL *)
| y = term41 -> fun x -> OpL(x,y) ]
maybe we need some kind of continuation passing style
term42:
[ x = term41; "opR"; y = term42_no_left -> OpR (x,y)
| x = term41; "opL"; k = term42_no_right -> k (fun y => OpL (x,y))
| x = term41; "opL'"; k = term42_no_right -> k (fun y => OpL' (x,y))
| x = term41; "opN"; f = term41 -> OpN(x,y)
| x = term41 -> x ]
term42_no_left:
[ x = term41; "opR"; y = term42_no_left -> OpR (x,y)
| x = term41 -> x ]
term42_no_right:
[ x = term41; "opL"; k = term42_no_right -> fun f => let x = f x in k (fun y => OpL (x, y))
| x = term41; "opL'"; k = term42_no_right -> fun f => let x = f x in k (fun y => OpL' (x, y))
| x = term41 -> fun f => f x ]
Pierre Roux said:
This is explained in https://github.com/coq/coq/issues/17859, presented with
term41_x opL term41_y opL term41_z
, it parsesterm41_x
then call the continuation ("opL"; y = term41
) andb buildsopL(term41_x, term41_y)
then calls the continuation again finally parsingopL(opL(term41_x, term41_y), term41_z)
I agree with this part, which is the traditional way of breaking left recursion in a LL(1) parser. My confusion stems more from the fact that camlp5 seems to give a different meaning to SELF
in the rules SELF opR term42
and SELF opL term41
. But perhaps, that is just a peculiarity of camlp5, which is giving a different meaning to SELF
depending on the level of the rightmost symbol.
Which different meaning? My undestanding is that both SELF
mean "subset of the current level not starting with SELF
" (so in both cases the single rule x = term41
).
Pierre Roux said:
maybe we need some kind of continuation passing style
If you are breaking the left-recursion by hand, shouldn't it be instead just the following?
term42:
[ x = term41; y = term42_no_left -> y x
| x = term41; y = term42_no_right -> y x
| x = term41; "opN"; y = term41 -> OpN(x,y)
| x = term41 -> x ]
term42_no_left:
[ "opR"; x' = term41; y = term42_no_left -> fun x => OpR (x, y x') ]
term42_no_right:
[ "opL"; x' = term41; y = term42_no_right -> fun x => y (opL (x, x'))
| "opL'"; x' = term41; y = term42_no_right -> fun x => y (opL' (x, x')) ]
Each operator appears only once, so no explosion of the grammar, and it makes it obvious that it is indeed LL(1).
(If we can no longer trust camlp5 to break the left-recursion on its own, we might just as well go all the way to pure LL(1).)
(To be clear, pure LL(1) would factor all the rules of term42
into a single rule x = term41; y = term42_suffix -> y x
. But camlp5 already performs left-factorization for free.)
IINM camlp5 no longer needs to call continuations more than once, so we get noassoc "for free".
It is my understaning too.
It seems to behave as the majority of people here expect, have you tried implementing it?
I did not.
Another con to this solution is that Print Grammar constr
would be more complex to decode for the user (or Coq would have to provide its own parser).
Another pro, since it relies on less assumptions of the underlying engine, is that it would be (suspectingly) easier to experiment different underlying parsing engines (e.g. a GLR one).
Ensuring incomparability of some notations is actually a feature. I mean: in the current model, the order on symbols is total, that is, when parsing any two overlapping rules for a symbol, one rule has necessarily the precedence over the other.
Accepting two notations differing in the associativity to be at the same formal level XX in this model just mean that each level is actually two levels, say XX.5 for right associative and XX.0 for left associative, with XX.5 just above XX.0.
In contrast, incomparability requires a new model where levels are not totally ordered. In the current situation where all notations already have an absolute level rather than a relative level, would we retroactively be able to take advantage of such new model? I mean, is it worth specifically making XX.5 and XX.0 not comparable when all the rest of the hierarchy is already forced to be totally ordered?
Last updated: Nov 29 2023 at 22:01 UTC