Currently, the "no associativity" modifier for notations applies only to printing. For parsing, notations declared non associative are treated as left associative. Example:

```
Require Import Nat.
Infix "mod" := modulo (at level 40, no associativity).
Check 8 mod 5 mod 2.
(* (8 mod 5) mod 2 *)
```

How much:

- would it be worth or useful to add support for non associativity in parsing
- vs fully renouncing to the no associativity modifier

The risk with 1. is that notations currently declared non associative often already exploit that they are actually parsed left associative, so this may lead to compatibility issues.

What would be lost to adopt 2.?

Really supporting "no associativity" would a priori be feasible in Camlp5, but first wondering if there are opinions around on this question.

vs fully renouncing to the no associativity modifier

what does that mean?

See in https://github.com/coq/coq/issues/17859 for the detailled explanation.

My opinion: I would try to implement non associativity in parsing (if easy enough) and assess it on CI: if it appears that most use there intended left associativity then we could just drop no associativity, otherwise keep the implementation, in both cases with a deprecation phase.

Well, I finally gave it a try yesterday and it seems to work. I can compile the stdlib with both non-associativity respected and the "recovery" mode of camlp5 deactivated. I don't know exactly how to proceed though.

The changes needed in the stdlib are:

- move
`binder_constr`

to level 0 (!!!) and`- term`

from level 35 to level 10, so that it is compatible with removing the recovery mode (*) - add parentheses in many situations of the form
`x =? y = true`

, because`=?`

was declared non associative, so there is a design choice to do here about this family of notations - add explicitly RIGHTA in a few GRAMMAR EXTEND and ARGUMENT EXTEND
- make a few notations at level 0 or 40 follows the right-associativity of level 0 and 40 rather than being non-associative (there may be workarounds)

And, surprisingly, that's it!

The issue with (*) is now that the printer really needs to be able to support cycles (for `binder_constr`

). So, this would probably be the next step to do.

Maybe I could also just open a PR with the extended gramlib, even if the non-associativity and non-recovery features are left inactive until a decision is taken on the strategy?

add parentheses in many situations of the form x =? y = true, because =? was declared non associative, so there is a design choice to do here about this family of notations

Why does this need to change?

Currently non associativity is used only for printing (adding extra parentheses) but synonymous to left associativity for parsing. We may want to actually enforce non associativity for parsing (e.g. reject `x =? y = true`

and only accept `(x =? y) = true`

).

Hugo Herbelin said:

Well, I finally gave it a try yesterday and it seems to work. I can compile the stdlib with both non-associativity respected and the "recovery" mode of camlp5 deactivated.

Impressive!

Hugo Herbelin said:

The changes needed in the stdlib are:

- move
`binder_constr`

to level 0 (!!!) and`- term`

from level 35 to level 10, so that it is compatible with removing the recovery mode (*)

That might be the most impactful change, maybe worth opening a draft PR to run the CI and see how it fares.

Hugo Herbelin said:

- add parentheses in many situations of the form
`x =? y = true`

, because`=?`

was declared non associative, so there is a design choice to do here about this family of notations

Seems to be the whole point of enforcing non associativity, so makes perfect sense to me.

Hugo Herbelin said:

- make a few notations at level 0 or 40 follows the right-associativity of level 0 and 40 rather than being non-associative (there may be workarounds)

Shouldn't we first enable mixing associativity at the same level, as you suggesst in https://github.com/coq/coq/issues/17859 wouldn't that remove the need for this change?

That might be the most impactful change

Actually no, because what is in binder_constr already spans until the end of the term, and we keep it spanning by setting its right-hand side at level 200. In the other direction, the recovery of camlp5 makes that the binder_constr, even if at 200, are also parser at lower level (as in `True /\ forall x, x = 0`

). So, the only impact is on printing which is more complicated (i.e., currently, `True /\ forall x, x = 0`

is printed ``True /\ (forall x, x = 0)`

and one would have to drop the parentheses to be consistent with parsing).

Incidentally, I said moving it to 0, but I'm now leaning to moving it to 10 (or even maybe something intermediary between 10 and 70), because supporting e.g `list forall x, x = 0`

does not seem necessary (it is even urrently not accepted).

Shouldn't we first enable mixing associativity at the same level

Probably. The flexibility it would give seems worth, even if it seems a rather big change of model.

Hugo Herbelin said:

Incidentally, I said moving it to 0, but I'm now leaning to moving it to 10 (or even maybe something intermediary between 10 and 70), because supporting e.g

`list forall x, x = 0`

does not seem necessary (it is even urrently not accepted).

I'm not even sure this is desirable (I find it pretty unreadable).

Is `Notation "x + y" := whatever (no associativity)`

currently basically equivalent to

```
Notation "x + y" := whatever (left associativity, only parsing).
Notation "x + y" := (x + y) (no associativity, only printing).
```

with the later being forward compatible with respecting no associativity in parsing?

I don't understand what mixing associativity means. If `*`

(left assoc) and `^`

(right assoc) were at the same level, how would you ever parse `x * y ^ z`

and `x ^ y * z`

? Actually, is there even an existing parser generator that allows it? (I am pretty sure both Bison and Menhir forbid it.)

C.f. https://github.com/coq/coq/issues/17859 basically, the left associativity gets precedence and your examples are parsed as `(x * y) ^ z`

and `x ^ (y * z)`

. It's similar to having two sublevels for left and right priority, left being just below right.

Accepting these expressions seems like a footgun. It would be better to choke on them, user-wise. I don't know if it is easily implemented, though. But if it is possible to choke on nonassociative operators, it should be possible to choke on operators changing associativity on the fly.

That's what is done currently: once some notation decides the associativity of a level, declaring another associativity in another notation command (or requiring a module with that command) is an error.

Even if the two notations come from diferrent libraries and operate on types that are so diferent that they will never be mixed.

Just to be clear, I know what is done currently. What I mean is that if we ever allow two associativity directions at the same level, we should make sure that mixing them fails at parsing time, rather than implicitly parenthesizing the expression one way rather than the other.

I agree failing at parsing time would be nice.

It's similar to having two sublevels for left and right priority, left being just below right.

Instead of having left below right, could they be incomparable? so if we start to parse in one we can't go into the other

That's what Guillaume suggest. I don't know how easy it would be to implement (probably not too hard but I haven't tried) but this would indeed be the best solution.

I would prefer to have "non-associative" to be also "non-associative" on parsing. It is important that people understand specs correctly when reading them and one can use non-associative notations to ensure readability if this is enforced. E.g. many C compilers now give warnings if parenthesis are not given in cases which are not obvious (to everyone) without them.

Pierre Roux said:

Currently non associativity is used only for printing (adding extra parentheses) but synonymous to left associativity for parsing. We may want to actually enforce non associativity for parsing (e.g. reject

`x =? y = true`

and only accept`(x =? y) = true`

).

This point still confuses me. I generally set notations to be "no associativity" when they cannot be nested. e.g., neither `(x =? y) =? z`

nor `x =? (y =? z)`

will typecheck. But `(x =? y) = z`

does typecheck, so I'd want the associativity of `_ = _`

to dominate here. Does this not happen for some reason?

(Moreover, I wouldn't be surprised if many of the "no associativity" specifiers followed this logic. It is not the case for `_ mod _`

, which I guess is not associative for clarity reasons, but for any boolean infix operators declared non associative this might explain the reasoning.)

In the stdlib, `"_ = _"`

and `"_ =? _"`

are both declared `at level 70, no associativity`

. Since for parsing this amount to left associativity, this means that `x =? y = z`

will be parsed as `(x =? y) = z`

and `x = y =? z`

as `(x = y) =? z`

. For `=?`

to have a higher priority than `=`

it should be at a level below 70 (I don't know why this is not the case).

@Guillaume Melquiond: The situation is more subtle than we might think. (At least, I realize now that I've been led into error by camlp5 for a long time!)

Here is detailed analysis whose summary, TL;DR, is:

- there is an ideal world with stratified rules, as you are mentioning,
- there is the current situation which does not follow in practice the ideal world,
- I tried for some years to restrict the current situation to its ideal subset and this requires to adapt developments,
- but there is another alternative, which is to rephrase the current situation as an ideal subset of a larger world: instead of restricting the levels, the trick is to split each level into say 4 distinct levels:
`XX_infix_postfix_right_assoc`

,`XX_infix_prefix_left_assoc`

,- `XX_infix_postfix_non_assoc,
`XX_prefix_non_assoc_closed`

.

More details:

In the ideal world, we have a stratified grammar where for all `op`

in the level, a level has one of these forms

`term10 ::= term10 op term9 | term10 postfix | ...`

(left associative),`term10 := term9 op term10 | prefix term10`

(right associative),`term10 ::= term9 op term9 | prefix term9 | term9 postfix`

,`term10 ::= rule_closed_on_both_side`

,

or, maybe, we would even expect prefix/postfix/infix operators not be mixed in the same level??

In practise, Camlp5 grammars are both more constrained:

- it can only be SELF or a terminal on the left-hand-side, so:
- no way to say
`term10 ::= term10 op term91: the alternative is to write`

SELF op SELF`with a left associative flag, or to write`

SELF op term9`, - no way to say
`term10 ::= term9 op term10`

: the alternative is to write`SELF op SELF`

with a right associative flag, or to write`SELF op term10`

,

- no way to say

but also more liberal:

- they accept any level termXX on the right-hand side of a rule,

and, additionally, *do not respect the specification*:

- there is a recovery mechanism to bypass the level constraint on the right-hand sides if nothing matched,
- no associativity defaults to left associativity.

So, the situation is currently that

- Camlp5 does not ensure by default that the user will follow this model, nor even ensures the specification if ever the model were followed,
- Coq does not (fully) check either that the user follow this model:
- Coq accepts that one development sets level XX right associative and another sets it left associative, resulting in conflicts,

- Coq accepts that the user gives rules with an arbitrary termXX on the right-hand side, which is liable to create circles (as in the case of`rewstrategy`

, see PR#17832).

There are yet additional strange things: `SELF`

on the rhs of a left associative level means the next level but `SELF`

on the lhs of a right associative level XX means the subset of rules of XX starting (morally) with a terminal!! So, a non-intuitive loss of symmetry as visible in the current example when recovery is removed:

```
Notation "x +++ y" := (x+y) (at level 23, left associativity).
Notation "!!" := 0 (at level 23).
Check !! +++ 0. (* allowed *)
```

but

```
Notation "x +++ y" := (x+y) (at level 23, right associativity).
Notation "!!" := 0 (at level 23).
Check 0 +++ !!. (* disallowed *)
```

For a couple of years now, I experimented here and there to change camlp5 so that:

- it satisfies the specification (no recovery mechanism, as in
`True /\ forall x, x = 0`

which camlp5 accepts even though`forall`

is at level 200 > 85 which is the level of`\/`

), - it satisfies no-associativity (which unfortunately I only realized late that it is not implemented nor simulatable).

In particular, I experimented to progressively move the "ideal model", realizing that moving to it breaks things:

- it breaks
`True /\ forall x, x = 0`

, which is common, because of the recovery mechanism, - it breaks
`x =? y = z`

, which is common (in stdlib), because of non-implemented non-associativity.

Now, the advantages of rethinking each level as being 4 levels `XX_infix_postfix_right_assoc`

, `XX_infix_prefix_left_assoc`

, `XX_infix_postfix_non_assoc`

and `XX_prefix_non_assoc_closed`

are:

- no more conflict when loading two developments: a right associative infix notation at level XX goes to
`XX_infix_postfix_right_assoc`

and a left associative infix notation goes to`XX_infix_prefix_left_assoc`

which are distinct, - the strange asymmetry between next meaning the next level on the right but meaning the subrules starting with a terminal on the left would be fixed.

@Gaëtan Gilbert :

Instead of having left below right, could they be incomparable?

I would know how to do that with distinct grammars, but I don't know how to do it using levels: the left-hand side of rules using levels has to be SELF, otherwise it loops.

But why not, nothing forces us per se to use camlp5 levels. We can also emulate them as we like.

Regarding `x =? y`

, one issue is that there are two antagonistic views:

- as an operator: they should then be put at level lower than 70, so that
`x =? y = z =? t`

parses as`(x =? y) = (z =? t)`

- as a proposition, with an implicit coercion from bool to Prop: this suggests to have them non-associative at level 70.

The argument of clarity also applies: `x =? y = z =? t`

does not seem so obvious to parse correctly for a human, without parentheses.

@Gaëtan Gilbert:

Instead of having left below right, could they be incomparable?

I would know how to do that with distinct grammars

Actually, that does not look so easy to do. To turn incomparability of a left-assoc op and a right-assoc op at the same level into an error, as said by @Guillaume Melquiond, I can imagine the following but this is a bit technical:

```
term42:
[ x = term41; "opR"; y = term42_no_left -> OpR (x,y)
| x = term41; "opL"; f = term42_no_right -> f x
| x = term41; "opN"; f = term41 -> OpN(x,y)
| x = term41 -> x ]
term42_no_left:
[ x = term41; "opR"; y = term42_no_left -> OpR (x,y)
| x = term41 -> x ]
term42_no_right:
[ y = term41; "opL"; f = term42_no_right -> fun x => f (OpL (x,y))
| y = term41 -> fun x -> OpL(x,y) ]
```

So, ordering a left-assoc op and a right-assoc op at the same level would be easier, as in:

```
term42:
[ x = SELF; "opR"; y = term42 -> OpR (x,y)
| x = SELF; "opL"; y = term41 -> OpL(x,y)
| x = term41 -> x ]
```

where, by construction of camlp5 LL parser, opR binds wider than opL.

And to support opN yet again below opL, the following should work:

```
term42: [
[ x = SELF; "opR"; y = term42 -> OpR (x,y)
| x = SELF; "opL"; y = term41 -> OpL(x,y)
| x = term42_no -> x ] ]
term42_no: [ NONA
[ x = term41; "opN"; y = term41 -> OpN (x,y)
| x = term41 -> x ] ]
```

I am bit confused. Why is it `SELF`

and not `term41`

on the left of `opR`

in `term42`

? Is it a peculiarity of camlp5? (Perhaps that is what you mean by 'by construction of camlp5 LL parser, opR binds wider than opL".)

In the first case it's the same but in the second case it is `term42_no`

instead of `term41`

, allowing "opN yet again below opL".

Also, with other rules, it could be different according to https://github.com/coq/coq/issues/17859 that states " `SELF`

on the left is not referring to the lower `NEXT`

level but instead to the subset of the current level not starting with `SELF`

!"

Hugo Herbelin said:

To turn incomparability of a left-assoc op and a right-assoc op at the same level into an error, as said by Guillaume Melquiond, I can imagine the following but this is a bit technical:

It indeed is a bit more technical but doesn't multiply the size of the camlp5 grammar by more than two and IINM camlp5 no longer needs to call continuations more than once, so we get noassoc "for free". It seems to behave as the majority of people here expect, have you tried implementing it?

Pierre Roux said:

"

`SELF`

on the left is not referring to the lower`NEXT`

level but instead to the subset of the current level not starting with`SELF`

!"

Okay. So, that explains why `term42 := SELF opR term42`

makes `opR`

right-associative. But now I am even more confused because, by the same reasoning, `term42 := SELF opL term41`

should make `opL`

non-associative. I don't see how the recovery mode can make it left-associative (at worst it would make it right-associative).

This is explained in https://github.com/coq/coq/issues/17859, presented with `term41_x opL term41_y opL term41_z`

, it parses `term41_x`

then call the continuation (`"opL"; y = term41`

) andb builds `opL(term41_x, term41_y)`

then calls the continuation again finally parsing `opL(opL(term41_x, term41_y), term41_z)`

@Hugo Herbelin In fact, I don't get how your technical solution would work with multiple left associative operators at the same level

```
term42:
[ x = term41; "opR"; y = term42_no_left -> OpR (x,y)
| x = term41; "opL"; f = term42_no_right -> f x
| x = term41; "opL'"; f = term42_no_right -> f x
| x = term41; "opN"; f = term41 -> OpN(x,y)
| x = term41 -> x ]
term42_no_left:
[ x = term41; "opR"; y = term42_no_left -> OpR (x,y)
| x = term41 -> x ]
term42_no_right:
[ y = term41; "opL"; f = term42_no_right -> fun x => f (OpL (x,y)) (* but could also be OpL' *)
| y = term41; "opL'"; f = term42_no_right -> fun x => f (OpL' (x,y)) (* but could also be OpL *)
| y = term41 -> fun x -> OpL(x,y) ]
```

maybe we need some kind of continuation passing style

```
term42:
[ x = term41; "opR"; y = term42_no_left -> OpR (x,y)
| x = term41; "opL"; k = term42_no_right -> k (fun y => OpL (x,y))
| x = term41; "opL'"; k = term42_no_right -> k (fun y => OpL' (x,y))
| x = term41; "opN"; f = term41 -> OpN(x,y)
| x = term41 -> x ]
term42_no_left:
[ x = term41; "opR"; y = term42_no_left -> OpR (x,y)
| x = term41 -> x ]
term42_no_right:
[ x = term41; "opL"; k = term42_no_right -> fun f => let x = f x in k (fun y => OpL (x, y))
| x = term41; "opL'"; k = term42_no_right -> fun f => let x = f x in k (fun y => OpL' (x, y))
| x = term41 -> fun f => f x ]
```

Pierre Roux said:

This is explained in https://github.com/coq/coq/issues/17859, presented with

`term41_x opL term41_y opL term41_z`

, it parses`term41_x`

then call the continuation (`"opL"; y = term41`

) andb builds`opL(term41_x, term41_y)`

then calls the continuation again finally parsing`opL(opL(term41_x, term41_y), term41_z)`

I agree with this part, which is the traditional way of breaking left recursion in a LL(1) parser. My confusion stems more from the fact that camlp5 seems to give a different meaning to `SELF`

in the rules `SELF opR term42`

and `SELF opL term41`

. But perhaps, that is just a peculiarity of camlp5, which is giving a different meaning to `SELF`

depending on the level of the rightmost symbol.

Which different meaning? My undestanding is that both `SELF`

mean "subset of the current level not starting with `SELF`

" (so in both cases the single rule `x = term41`

).

Pierre Roux said:

maybe we need some kind of continuation passing style

If you are breaking the left-recursion by hand, shouldn't it be instead just the following?

```
term42:
[ x = term41; y = term42_no_left -> y x
| x = term41; y = term42_no_right -> y x
| x = term41; "opN"; y = term41 -> OpN(x,y)
| x = term41 -> x ]
term42_no_left:
[ "opR"; x' = term41; y = term42_no_left -> fun x => OpR (x, y x') ]
term42_no_right:
[ "opL"; x' = term41; y = term42_no_right -> fun x => y (opL (x, x'))
| "opL'"; x' = term41; y = term42_no_right -> fun x => y (opL' (x, x')) ]
```

Each operator appears only once, so no explosion of the grammar, and it makes it obvious that it is indeed LL(1).

(If we can no longer trust camlp5 to break the left-recursion on its own, we might just as well go all the way to pure LL(1).)

(To be clear, pure LL(1) would factor all the rules of `term42`

into a single rule `x = term41; y = term42_suffix -> y x`

. But camlp5 already performs left-factorization for free.)

IINM camlp5 no longer needs to call continuations more than once, so we get noassoc "for free".

It is my understaning too.

It seems to behave as the majority of people here expect, have you tried implementing it?

I did not.

Another con to this solution is that `Print Grammar constr`

would be more complex to decode for the user (or Coq would have to provide its own parser).

Another pro, since it relies on less assumptions of the underlying engine, is that it would be (suspectingly) easier to experiment different underlying parsing engines (e.g. a GLR one).

Ensuring incomparability of some notations is actually a feature. I mean: in the current model, the order on symbols is total, that is, when parsing any two overlapping rules for a symbol, one rule has necessarily the precedence over the other.

Accepting two notations differing in the associativity to be at the same formal level XX in this model just mean that each level is actually two levels, say XX.5 for right associative and XX.0 for left associative, with XX.5 just above XX.0.

In contrast, incomparability requires a new model where levels are not totally ordered. In the current situation where all notations already have an absolute level rather than a relative level, would we retroactively be able to take advantage of such new model? I mean, is it worth specifically making XX.5 and XX.0 not comparable when all the rest of the hierarchy is already forced to be totally ordered?

Last updated: Oct 13 2024 at 01:02 UTC