Stream: Coq users

Topic: Suffix notation with or without space


view this post on Zulip Pierre Rousselin (Dec 26 2023 at 10:28):

Happy holidays!

I've noticed some strange behaviour while trying to define suffix notations for unary operators.

Notation "A '!'" := (not A) (at level 42).
Context (A : Prop).
Check A !.
Check A!.
(* However, it is always printed as "A !" (with a space). *)

Disable Notation "_ !".

Notation "A 'ᶜ'" := (not A) (at level 42).

Check A .
Fail Check Aᶜ.
(*
  The command has indeed failed with message:
  The reference Aᶜ was not found in the current environment.
*)

So first, I don't understand why Coq treats differently the characters '!' and 'ᶜ'.
Then, how could we tell Coq not to insert a space between a term and one of these suffix unary operators?

view this post on Zulip Théo Winterhalter (Dec 26 2023 at 11:05):

Isn't it because is a letter character so it's not split from words during lexing?

view this post on Zulip Dominique Larchey-Wendling (Dec 26 2023 at 11:07):

Should Coq have catcodes like TeX ?

view this post on Zulip Cyril Cohen (Dec 26 2023 at 11:52):

Sidenote:

Notation "A '!'" := (not A) (format "A '!'", at level 42).

should make sure that no space is printed.

view this post on Zulip Paolo Giarrusso (Dec 26 2023 at 12:22):

@Dominique Larchey-Wendling Technically, it seems hard to combine libraries with different catcodes. If you actually used TeX's design, IIUC you'd lose separate compilation — Coq's rules for Require probably fix that, but this doesn't inspire confidence in TeX's design. And "identifiers are made of Unicode letters" probably simplifies reading Coq code for human, compared to tracking mutable state of catcodes in one's head.

view this post on Zulip Pierre Rousselin (Dec 26 2023 at 12:57):

Cyril Cohen said:

Sidenote:

Notation "A '!'" := (not A) (format "A '!'", at level 42).

should make sure that no space is printed.

This is great! Thanks!

view this post on Zulip Pierre Rousselin (Dec 26 2023 at 13:00):

Théo Winterhalter said:

Isn't it because is a letter character so it's not split from words during lexing?

I guess it's a choice about which characters are "letter characters", then?
Anyway, it's possible in lean, see https://lean-lang.org/logic_and_proof/sets_in_lean.html but maybe there's a trade-off and limitation in which characters are usable for names?

view this post on Zulip James Wood (Dec 26 2023 at 13:12):

The Unicode standards define various properties of each character, like whether the character is a letter, whether it's numerical, whether it's whitespace, whether it's upper case, &c. I'd guess that Coq uses one (or more) of these properties.

view this post on Zulip Dominique Larchey-Wendling (Dec 26 2023 at 13:28):

@Paolo Giarrusso sorry I was joking :smiley:

view this post on Zulip Paolo Giarrusso (Dec 26 2023 at 13:30):

I agree with James Wood; moreover, allowing all Unicode letters in identifiers is good to let everybody use names from their languages :grinning:

view this post on Zulip Paolo Giarrusso (Dec 26 2023 at 13:33):

IIRC, that's the Java's rationale for the choice, and I buy it, especially for lectures

view this post on Zulip Karl Palmskog (Dec 26 2023 at 13:40):

for a regular PL, Unicode in comments is nice, but allowing crazy characters in identifiers can easily lead to chaos and unmaintainability. One would wish for language implementation people to at least provide a way to enforce a character subset at compile time for key aspects of a codebase

view this post on Zulip Paolo Giarrusso (Dec 26 2023 at 13:43):

Coq is indeed in dire need of a linter.

view this post on Zulip Pierre Rousselin (Dec 26 2023 at 13:45):

I can't find the Lean's rules for identifiers...

view this post on Zulip Mario Carneiro (Dec 26 2023 at 13:47):

Lean has a very restrictive (and non-extensible) set of identifier characters which is mainly ascii, greek (except for a few key letters), and mathematical letters and subscripts (not superscripts)

view this post on Zulip Mario Carneiro (Dec 26 2023 at 13:48):

def isLetterLike (c : Char) : Bool :=
  (0x3b1   c.val && c.val  0x3c9 && c.val  0x3bb) ||                  -- Lower greek, but lambda
  (0x391   c.val && c.val  0x3A9 && c.val  0x3A0 && c.val  0x3A3) || -- Upper greek, but Pi and Sigma
  (0x3ca   c.val && c.val  0x3fb) ||                                   -- Coptic letters
  (0x1f00  c.val && c.val  0x1ffe) ||                                  -- Polytonic Greek Extended Character Set
  (0x2100  c.val && c.val  0x214f) ||                                  -- Letter like block
  (0x1d49c  c.val && c.val  0x1d59f)                                   -- Latin letters, Script, Double-struck, Fractur

view this post on Zulip Pierre Rousselin (Dec 26 2023 at 13:48):

For Coq it's here: https://coq.inria.fr/refman/language/core/basic.html#grammar-token-unicode_letter

unicode_letter non-exhaustively includes Latin, Greek, Gothic, Cyrillic, Arabic, Hebrew, Georgian, Hangul, Hiragana and Katakana characters, CJK ideographs, mathematical letter-like symbols and non-breaking space. unicode_id_part non-exhaustively includes symbols for prime letters and subscripts.

view this post on Zulip Mario Carneiro (Dec 26 2023 at 13:51):

actually that's just the fancy unicode part of the function, the main part is:

def isIdFirst (c : Char) : Bool :=
  c.isAlpha || c = '_' || isLetterLike c

def isIdRest (c : Char) : Bool :=
  c.isAlphanum || c = '_' || c = '\'' || c == '!' || c == '?' || isLetterLike c || isSubScriptAlnum c

view this post on Zulip Mario Carneiro (Dec 26 2023 at 13:51):

! and ? are new as identifier characters in lean 4

view this post on Zulip Pierre Rousselin (Dec 26 2023 at 13:52):

But then how to denote the factorial?

view this post on Zulip Mario Carneiro (Dec 26 2023 at 13:52):

n ! or (n)!

view this post on Zulip Mario Carneiro (Dec 26 2023 at 13:53):

so basically the same solution as proposed above for Aᶜ

view this post on Zulip Pierre Rousselin (Dec 26 2023 at 13:53):

Let's change unicode then!

view this post on Zulip Mario Carneiro (Dec 26 2023 at 13:53):

postfix ? comes up a lot when defining parsers, and yeah it kind of sucks

view this post on Zulip Pierre Rousselin (Dec 26 2023 at 13:54):

We squat a plane with special "mathematical operators not to be used in names in Coq or Lean"

view this post on Zulip Pierre Rousselin (Dec 26 2023 at 13:56):

There's plenty of space! Planes 4 to 13 are not assigned.

view this post on Zulip Mario Carneiro (Dec 26 2023 at 13:56):

but now we also have naming conventions regarding functionThatPanics! or functionThatReturnsOption? so we'd have to change the naming convention too to get rid of ? and ! as identifier characters

view this post on Zulip Paolo Giarrusso (Dec 26 2023 at 13:57):

I take that as a joke, but people should really mark them up :sweat_smile:.

view this post on Zulip Paolo Giarrusso (Dec 26 2023 at 13:57):

And we've seen Lean doesn't care for Unicode classification...

view this post on Zulip Mario Carneiro (Dec 26 2023 at 13:59):

I think it is reasonable to use an actual unicode category rather than some ad hoc thing like lean does as the basis, but you should reserve the right to take specific parts in or out in order to satisfy the more mathematical use cases

view this post on Zulip Mario Carneiro (Dec 26 2023 at 14:00):

I think that "subscripts are identifier characters, superscripts are not" is the sort of distinction that makes sense in mathematics but is not something one would expect from unicode

view this post on Zulip Paolo Giarrusso (Dec 26 2023 at 14:00):

To spell out why/when Unicode letters are a good idea (for Java, and Coq): I'd recommend using English for any serious project today. But not all Coq classes will be in English.

view this post on Zulip Mario Carneiro (Dec 26 2023 at 14:01):

With lean it's fine to have identifier names in French, but if it's, say, Russian or Japanese then you will be very sad

view this post on Zulip Mario Carneiro (Dec 26 2023 at 14:02):

there is the workaround that you can make anything an identifier if you put it in «» quotes but this is not user friendly in the slightest

view this post on Zulip Paolo Giarrusso (Dec 26 2023 at 14:04):

To spell out why/when Unicode letters are a good idea: I'd recommend using English for any serious project today, but that's really up to the project lead — and if your lecture isn't in English, probably your code won't be either


Last updated: Jun 13 2024 at 21:01 UTC