Stream: Coq devs & plugin devs

Topic: Poll on non-associative notations


view this post on Zulip Hugo Herbelin (Jul 12 2023 at 18:02):

Currently, the "no associativity" modifier for notations applies only to printing. For parsing, notations declared non associative are treated as left associative. Example:

Require Import Nat.
Infix "mod" := modulo (at level 40, no associativity).
Check 8 mod 5 mod 2.
(* (8 mod 5) mod 2 *)

How much:

  1. would it be worth or useful to add support for non associativity in parsing
  2. vs fully renouncing to the no associativity modifier

The risk with 1. is that notations currently declared non associative often already exploit that they are actually parsed left associative, so this may lead to compatibility issues.
What would be lost to adopt 2.?
Really supporting "no associativity" would a priori be feasible in Camlp5, but first wondering if there are opinions around on this question.

view this post on Zulip Gaëtan Gilbert (Jul 12 2023 at 18:28):

vs fully renouncing to the no associativity modifier

what does that mean?

view this post on Zulip Pierre Roux (Jul 15 2023 at 09:20):

See in https://github.com/coq/coq/issues/17859 for the detailled explanation.

view this post on Zulip Pierre Roux (Jul 15 2023 at 09:24):

My opinion: I would try to implement non associativity in parsing (if easy enough) and assess it on CI: if it appears that most use there intended left associativity then we could just drop no associativity, otherwise keep the implementation, in both cases with a deprecation phase.

view this post on Zulip Hugo Herbelin (Jul 15 2023 at 12:11):

Well, I finally gave it a try yesterday and it seems to work. I can compile the stdlib with both non-associativity respected and the "recovery" mode of camlp5 deactivated. I don't know exactly how to proceed though.
The changes needed in the stdlib are:

And, surprisingly, that's it!

The issue with (*) is now that the printer really needs to be able to support cycles (for binder_constr). So, this would probably be the next step to do.

Maybe I could also just open a PR with the extended gramlib, even if the non-associativity and non-recovery features are left inactive until a decision is taken on the strategy?

view this post on Zulip Jason Gross (Jul 17 2023 at 00:15):

add parentheses in many situations of the form x =? y = true, because =? was declared non associative, so there is a design choice to do here about this family of notations

Why does this need to change?

view this post on Zulip Pierre Roux (Jul 17 2023 at 06:39):

Currently non associativity is used only for printing (adding extra parentheses) but synonymous to left associativity for parsing. We may want to actually enforce non associativity for parsing (e.g. reject x =? y = true and only accept (x =? y) = true).

view this post on Zulip Pierre Roux (Jul 17 2023 at 11:00):

Hugo Herbelin said:

Well, I finally gave it a try yesterday and it seems to work. I can compile the stdlib with both non-associativity respected and the "recovery" mode of camlp5 deactivated.

Impressive!

Hugo Herbelin said:

The changes needed in the stdlib are:

That might be the most impactful change, maybe worth opening a draft PR to run the CI and see how it fares.

Hugo Herbelin said:

Seems to be the whole point of enforcing non associativity, so makes perfect sense to me.

Hugo Herbelin said:

Shouldn't we first enable mixing associativity at the same level, as you suggesst in https://github.com/coq/coq/issues/17859 wouldn't that remove the need for this change?

view this post on Zulip Hugo Herbelin (Jul 17 2023 at 11:24):

That might be the most impactful change

Actually no, because what is in binder_constr already spans until the end of the term, and we keep it spanning by setting its right-hand side at level 200. In the other direction, the recovery of camlp5 makes that the binder_constr, even if at 200, are also parser at lower level (as in True /\ forall x, x = 0). So, the only impact is on printing which is more complicated (i.e., currently, True /\ forall x, x = 0is printed `True /\ (forall x, x = 0) and one would have to drop the parentheses to be consistent with parsing).

Incidentally, I said moving it to 0, but I'm now leaning to moving it to 10 (or even maybe something intermediary between 10 and 70), because supporting e.g list forall x, x = 0 does not seem necessary (it is even urrently not accepted).

view this post on Zulip Hugo Herbelin (Jul 17 2023 at 11:26):

Shouldn't we first enable mixing associativity at the same level

Probably. The flexibility it would give seems worth, even if it seems a rather big change of model.

view this post on Zulip Pierre Roux (Jul 17 2023 at 11:30):

Hugo Herbelin said:

Incidentally, I said moving it to 0, but I'm now leaning to moving it to 10 (or even maybe something intermediary between 10 and 70), because supporting e.g list forall x, x = 0 does not seem necessary (it is even urrently not accepted).

I'm not even sure this is desirable (I find it pretty unreadable).

view this post on Zulip Gaëtan Gilbert (Jul 17 2023 at 11:31):

Is Notation "x + y" := whatever (no associativity) currently basically equivalent to

Notation "x + y" := whatever (left associativity, only parsing).
Notation "x + y" := (x + y) (no associativity, only printing).

with the later being forward compatible with respecting no associativity in parsing?

view this post on Zulip Guillaume Melquiond (Jul 17 2023 at 11:41):

I don't understand what mixing associativity means. If * (left assoc) and ^ (right assoc) were at the same level, how would you ever parse x * y ^ z and x ^ y * z? Actually, is there even an existing parser generator that allows it? (I am pretty sure both Bison and Menhir forbid it.)

view this post on Zulip Pierre Roux (Jul 17 2023 at 11:47):

C.f. https://github.com/coq/coq/issues/17859 basically, the left associativity gets precedence and your examples are parsed as (x * y) ^ z and x ^ (y * z). It's similar to having two sublevels for left and right priority, left being just below right.

view this post on Zulip Guillaume Melquiond (Jul 17 2023 at 11:53):

Accepting these expressions seems like a footgun. It would be better to choke on them, user-wise. I don't know if it is easily implemented, though. But if it is possible to choke on nonassociative operators, it should be possible to choke on operators changing associativity on the fly.

view this post on Zulip Pierre Roux (Jul 17 2023 at 12:01):

That's what is done currently: once some notation decides the associativity of a level, declaring another associativity in another notation command (or requiring a module with that command) is an error.

view this post on Zulip Pierre Roux (Jul 17 2023 at 12:02):

Even if the two notations come from diferrent libraries and operate on types that are so diferent that they will never be mixed.

view this post on Zulip Guillaume Melquiond (Jul 17 2023 at 12:07):

Just to be clear, I know what is done currently. What I mean is that if we ever allow two associativity directions at the same level, we should make sure that mixing them fails at parsing time, rather than implicitly parenthesizing the expression one way rather than the other.

view this post on Zulip Pierre Roux (Jul 17 2023 at 12:17):

I agree failing at parsing time would be nice.

view this post on Zulip Gaëtan Gilbert (Jul 17 2023 at 12:21):

It's similar to having two sublevels for left and right priority, left being just below right.

Instead of having left below right, could they be incomparable? so if we start to parse in one we can't go into the other

view this post on Zulip Pierre Roux (Jul 17 2023 at 12:42):

That's what Guillaume suggest. I don't know how easy it would be to implement (probably not too hard but I haven't tried) but this would indeed be the best solution.

view this post on Zulip Michael Soegtrop (Jul 17 2023 at 13:11):

I would prefer to have "non-associative" to be also "non-associative" on parsing. It is important that people understand specs correctly when reading them and one can use non-associative notations to ensure readability if this is enforced. E.g. many C compilers now give warnings if parenthesis are not given in cases which are not obvious (to everyone) without them.

view this post on Zulip Jason Gross (Jul 17 2023 at 16:01):

Pierre Roux said:

Currently non associativity is used only for printing (adding extra parentheses) but synonymous to left associativity for parsing. We may want to actually enforce non associativity for parsing (e.g. reject x =? y = true and only accept (x =? y) = true).

This point still confuses me. I generally set notations to be "no associativity" when they cannot be nested. e.g., neither (x =? y) =? z nor x =? (y =? z) will typecheck. But (x =? y) = z does typecheck, so I'd want the associativity of _ = _ to dominate here. Does this not happen for some reason?

(Moreover, I wouldn't be surprised if many of the "no associativity" specifiers followed this logic. It is not the case for _ mod _, which I guess is not associative for clarity reasons, but for any boolean infix operators declared non associative this might explain the reasoning.)

view this post on Zulip Pierre Roux (Jul 17 2023 at 16:16):

In the stdlib, "_ = _" and "_ =? _" are both declared at level 70, no associativity. Since for parsing this amount to left associativity, this means that x =? y = z will be parsed as (x =? y) = z and x = y =? z as (x = y) =? z. For =? to have a higher priority than = it should be at a level below 70 (I don't know why this is not the case).

view this post on Zulip Hugo Herbelin (Jul 17 2023 at 18:38):

@Guillaume Melquiond: The situation is more subtle than we might think. (At least, I realize now that I've been led into error by camlp5 for a long time!)

Here is detailed analysis whose summary, TL;DR, is:

view this post on Zulip Hugo Herbelin (Jul 17 2023 at 18:39):

More details:

In the ideal world, we have a stratified grammar where for all op in the level, a level has one of these forms

In practise, Camlp5 grammars are both more constrained:

but also more liberal:

and, additionally, do not respect the specification:

view this post on Zulip Hugo Herbelin (Jul 17 2023 at 18:40):

So, the situation is currently that

  1. Camlp5 does not ensure by default that the user will follow this model, nor even ensures the specification if ever the model were followed,
  2. Coq does not (fully) check either that the user follow this model:

    - Coq accepts that one development sets level XX right associative and another sets it left associative, resulting in conflicts,
    - Coq accepts that the user gives rules with an arbitrary termXX on the right-hand side, which is liable to create circles (as in the case of rewstrategy, see PR#17832).

view this post on Zulip Hugo Herbelin (Jul 17 2023 at 18:40):

There are yet additional strange things: SELF on the rhs of a left associative level means the next level but SELF on the lhs of a right associative level XX means the subset of rules of XX starting (morally) with a terminal!! So, a non-intuitive loss of symmetry as visible in the current example when recovery is removed:

Notation "x +++ y" := (x+y) (at level 23, left associativity).
Notation "!!" := 0 (at level 23).
Check !! +++ 0. (* allowed *)

but

Notation "x +++ y" := (x+y) (at level 23, right associativity).
Notation "!!" := 0 (at level 23).
Check 0 +++ !!. (* disallowed *)

view this post on Zulip Hugo Herbelin (Jul 17 2023 at 18:40):

For a couple of years now, I experimented here and there to change camlp5 so that:

  1. it satisfies the specification (no recovery mechanism, as in True /\ forall x, x = 0 which camlp5 accepts even though forall is at level 200 > 85 which is the level of \/),
  2. it satisfies no-associativity (which unfortunately I only realized late that it is not implemented nor simulatable).

In particular, I experimented to progressively move the "ideal model", realizing that moving to it breaks things:

view this post on Zulip Hugo Herbelin (Jul 17 2023 at 18:41):

Now, the advantages of rethinking each level as being 4 levels XX_infix_postfix_right_assoc, XX_infix_prefix_left_assoc, XX_infix_postfix_non_assoc and XX_prefix_non_assoc_closed are:

view this post on Zulip Hugo Herbelin (Jul 17 2023 at 18:46):


@Gaëtan Gilbert :

Instead of having left below right, could they be incomparable?

I would know how to do that with distinct grammars, but I don't know how to do it using levels: the left-hand side of rules using levels has to be SELF, otherwise it loops.

But why not, nothing forces us per se to use camlp5 levels. We can also emulate them as we like.

view this post on Zulip Hugo Herbelin (Jul 17 2023 at 18:54):


Regarding x =? y, one issue is that there are two antagonistic views:

view this post on Zulip Hugo Herbelin (Jul 17 2023 at 18:56):

The argument of clarity also applies: x =? y = z =? t does not seem so obvious to parse correctly for a human, without parentheses.

view this post on Zulip Hugo Herbelin (Jul 20 2023 at 16:13):

@Gaëtan Gilbert:

Instead of having left below right, could they be incomparable?

I would know how to do that with distinct grammars

Actually, that does not look so easy to do. To turn incomparability of a left-assoc op and a right-assoc op at the same level into an error, as said by @Guillaume Melquiond, I can imagine the following but this is a bit technical:

term42:
 [ x = term41; "opR"; y = term42_no_left -> OpR (x,y)
 | x = term41; "opL"; f = term42_no_right -> f x
 | x = term41; "opN"; f = term41 -> OpN(x,y)
 | x = term41 -> x ]
term42_no_left:
 [ x = term41; "opR"; y = term42_no_left -> OpR (x,y)
 | x = term41 -> x ]
term42_no_right:
[ y = term41; "opL"; f = term42_no_right -> fun x => f (OpL (x,y))
| y = term41 -> fun x -> OpL(x,y) ]

So, ordering a left-assoc op and a right-assoc op at the same level would be easier, as in:

term42:
 [ x = SELF; "opR"; y = term42 -> OpR (x,y)
 | x = SELF; "opL"; y = term41 -> OpL(x,y)
 | x = term41 -> x ]

where, by construction of camlp5 LL parser, opR binds wider than opL.

view this post on Zulip Hugo Herbelin (Jul 20 2023 at 16:25):

And to support opN yet again below opL, the following should work:

term42: [
 [ x = SELF; "opR"; y = term42 -> OpR (x,y)
 | x = SELF; "opL"; y = term41 -> OpL(x,y)
 | x = term42_no -> x ] ]
term42_no: [ NONA
 [ x = term41; "opN"; y = term41 -> OpN (x,y)
 | x = term41 -> x ] ]

view this post on Zulip Guillaume Melquiond (Jul 21 2023 at 06:27):

I am bit confused. Why is it SELF and not term41 on the left of opR in term42? Is it a peculiarity of camlp5? (Perhaps that is what you mean by 'by construction of camlp5 LL parser, opR binds wider than opL".)

view this post on Zulip Pierre Roux (Jul 21 2023 at 08:39):

In the first case it's the same but in the second case it is term42_no instead of term41, allowing "opN yet again below opL".
Also, with other rules, it could be different according to https://github.com/coq/coq/issues/17859 that states " SELF on the left is not referring to the lower NEXT level but instead to the subset of the current level not starting with SELF!"

view this post on Zulip Pierre Roux (Jul 21 2023 at 09:04):

Hugo Herbelin said:

To turn incomparability of a left-assoc op and a right-assoc op at the same level into an error, as said by Guillaume Melquiond, I can imagine the following but this is a bit technical:

It indeed is a bit more technical but doesn't multiply the size of the camlp5 grammar by more than two and IINM camlp5 no longer needs to call continuations more than once, so we get noassoc "for free". It seems to behave as the majority of people here expect, have you tried implementing it?

view this post on Zulip Guillaume Melquiond (Jul 21 2023 at 09:27):

Pierre Roux said:

" SELF on the left is not referring to the lower NEXT level but instead to the subset of the current level not starting with SELF!"

Okay. So, that explains why term42 := SELF opR term42 makes opR right-associative. But now I am even more confused because, by the same reasoning, term42 := SELF opL term41 should make opL non-associative. I don't see how the recovery mode can make it left-associative (at worst it would make it right-associative).

view this post on Zulip Pierre Roux (Jul 21 2023 at 09:38):

This is explained in https://github.com/coq/coq/issues/17859, presented with term41_x opL term41_y opL term41_z, it parses term41_x then call the continuation ("opL"; y = term41) andb builds opL(term41_x, term41_y) then calls the continuation again finally parsing opL(opL(term41_x, term41_y), term41_z)

view this post on Zulip Pierre Roux (Jul 21 2023 at 12:12):

@Hugo Herbelin In fact, I don't get how your technical solution would work with multiple left associative operators at the same level

term42:
  [ x = term41; "opR"; y = term42_no_left -> OpR (x,y)
  | x = term41; "opL"; f = term42_no_right -> f x
  | x = term41; "opL'"; f = term42_no_right -> f x
  | x = term41; "opN"; f = term41 -> OpN(x,y)
  | x = term41 -> x ]
term42_no_left:
  [ x = term41; "opR"; y = term42_no_left -> OpR (x,y)
  | x = term41 -> x ]
term42_no_right:
  [ y = term41; "opL"; f = term42_no_right -> fun x => f (OpL (x,y)) (* but could also be OpL' *)
  | y = term41; "opL'"; f = term42_no_right -> fun x => f (OpL' (x,y))  (* but could also be OpL *)
  | y = term41 -> fun x -> OpL(x,y) ]

maybe we need some kind of continuation passing style

term42:
  [ x = term41; "opR"; y = term42_no_left -> OpR (x,y)
  | x = term41; "opL"; k = term42_no_right -> k (fun y => OpL (x,y))
  | x = term41; "opL'"; k = term42_no_right -> k (fun y => OpL' (x,y))
  | x = term41; "opN"; f = term41 -> OpN(x,y)
  | x = term41 -> x ]
term42_no_left:
  [ x = term41; "opR"; y = term42_no_left -> OpR (x,y)
  | x = term41 -> x ]
term42_no_right:
  [ x = term41; "opL"; k = term42_no_right -> fun f => let x = f x in k (fun y => OpL (x, y))
  | x = term41; "opL'"; k = term42_no_right -> fun f => let x = f x in k (fun y => OpL' (x, y))
  | x = term41 -> fun f => f x ]

view this post on Zulip Guillaume Melquiond (Jul 21 2023 at 13:12):

Pierre Roux said:

This is explained in https://github.com/coq/coq/issues/17859, presented with term41_x opL term41_y opL term41_z, it parses term41_x then call the continuation ("opL"; y = term41) andb builds opL(term41_x, term41_y) then calls the continuation again finally parsing opL(opL(term41_x, term41_y), term41_z)

I agree with this part, which is the traditional way of breaking left recursion in a LL(1) parser. My confusion stems more from the fact that camlp5 seems to give a different meaning to SELF in the rules SELF opR term42 and SELF opL term41. But perhaps, that is just a peculiarity of camlp5, which is giving a different meaning to SELF depending on the level of the rightmost symbol.

view this post on Zulip Pierre Roux (Jul 21 2023 at 13:23):

Which different meaning? My undestanding is that both SELF mean "subset of the current level not starting with SELF" (so in both cases the single rule x = term41).

view this post on Zulip Guillaume Melquiond (Jul 21 2023 at 13:33):

Pierre Roux said:

maybe we need some kind of continuation passing style

If you are breaking the left-recursion by hand, shouldn't it be instead just the following?

term42:
  [ x = term41; y = term42_no_left -> y x
  | x = term41; y = term42_no_right -> y x
  | x = term41; "opN"; y = term41 -> OpN(x,y)
  | x = term41 -> x ]
term42_no_left:
  [ "opR";  x' = term41; y = term42_no_left -> fun x => OpR (x, y x') ]
term42_no_right:
  [ "opL";  x' = term41; y = term42_no_right -> fun x => y (opL  (x, x'))
  | "opL'"; x' = term41; y = term42_no_right -> fun x => y (opL' (x, x')) ]

view this post on Zulip Guillaume Melquiond (Jul 21 2023 at 13:37):

Each operator appears only once, so no explosion of the grammar, and it makes it obvious that it is indeed LL(1).

view this post on Zulip Guillaume Melquiond (Jul 21 2023 at 13:39):

(If we can no longer trust camlp5 to break the left-recursion on its own, we might just as well go all the way to pure LL(1).)

view this post on Zulip Guillaume Melquiond (Jul 21 2023 at 13:48):

(To be clear, pure LL(1) would factor all the rules of term42 into a single rule x = term41; y = term42_suffix -> y x. But camlp5 already performs left-factorization for free.)

view this post on Zulip Hugo Herbelin (Jul 21 2023 at 15:39):

IINM camlp5 no longer needs to call continuations more than once, so we get noassoc "for free".

It is my understaning too.

It seems to behave as the majority of people here expect, have you tried implementing it?

I did not.

Another con to this solution is that Print Grammar constr would be more complex to decode for the user (or Coq would have to provide its own parser).

Another pro, since it relies on less assumptions of the underlying engine, is that it would be (suspectingly) easier to experiment different underlying parsing engines (e.g. a GLR one).

view this post on Zulip Hugo Herbelin (Jul 22 2023 at 06:50):

Ensuring incomparability of some notations is actually a feature. I mean: in the current model, the order on symbols is total, that is, when parsing any two overlapping rules for a symbol, one rule has necessarily the precedence over the other.

Accepting two notations differing in the associativity to be at the same formal level XX in this model just mean that each level is actually two levels, say XX.5 for right associative and XX.0 for left associative, with XX.5 just above XX.0.

In contrast, incomparability requires a new model where levels are not totally ordered. In the current situation where all notations already have an absolute level rather than a relative level, would we retroactively be able to take advantage of such new model? I mean, is it worth specifically making XX.5 and XX.0 not comparable when all the rest of the hierarchy is already forced to be totally ordered?


Last updated: Jun 23 2024 at 01:02 UTC