I was trying to represent regular expressions over strings as propositions.

Did this:

```
Require Import String.
Open Scope string.
Definition char := string.
Inductive re : Type :=
| Atom : char -> re
| Concat : re -> re -> re
| Alt : re -> re -> re
| Dot : re.
(* Indexing starts from zero. Empty string on invalid index. *)
Definition stridx (i:nat) (s:string) : char :=
String.substring i 1 s.
Definition strdrop (n:nat) (s:string) : string :=
String.substring n ((String.length s)-1) s.
Fixpoint reDenote (r:re) : string -> Prop :=
match r with
| Atom ch => fun s:string => stridx 0 s = ch
| Dot => fun s:string => stridx 0 s = stridx 0 s
| Concat r1 r2 => fun s:string =>
((reDenote r1) s) /\ ((reDenote r2) (strdrop 1 s))
| Alt r1 r2 => fun s:string =>
((reDenote r1) s) \/ ((reDenote r2) s)
end.
```

This looks okay:

```
Infix ":-" := Concat (at level 51, right associativity).
Compute reDenote ((Atom "h") :- (Atom "i") :- (Atom "y") :- (Atom "a")) "hiya".
(*
= "h" = "h" /\ "i" = "i" /\ "y" = "y" /\ "a" = "a"
: Prop
*)
```

But how can we add zero or more number of repetitions? Kleene star thing?

I was thinking to have a constructor like :

```
Star : re -> re
```

But how can we make it a proposition? I mean, it's like a *possible* infinite stream, but needn't be.

Shouldn't the `Inductive re`

contain all building blocks for regex, including star and repetitions?

Yeah, but I was stuck at Star. I got like:

```
Inductive re : Type :=
| Atom : char -> re
| Concat : re -> re -> re
| Alt : re -> re -> re
| Star : re -> re
| Dot : re
| Epsilon : re.
```

But couldn't get the translation to Prop right.

```
Fixpoint reDenote (r:re) : string -> Prop :=
match r with
| Atom ch => fun s:string => stridx 0 s = ch
| Dot => fun s:string => stridx 0 s = stridx 0 s
| Epsilon => fun s:string => True
| Star r => fun s:string =>
reDenote (Alt (Concat r (Star r)) Epsilon) s
| Concat r1 r2 => fun s:string =>
((reDenote r1) s) /\ ((reDenote r2) (strdrop 1 s))
| Alt r1 r2 => fun s:string =>
((reDenote r1) s) \/ ((reDenote r2) s)
end.
```

The argument to fixpoint is not getting smaller for `Star`

.

Error:

```
Error:
Recursive definition of reDenote is ill-formed.
In environment
reDenote : re -> string -> Prop
r : re
r0 : re
s : string
Recursive call to reDenote has principal argument equal to
"Alt (Concat r0 (Star r0)) Epsilon" instead of
"r0".
```

I don't get what this function is supposed to do

Oh, it's supposed to check if the string satisfies the regex?

Then even without star, I don't think that worked : the Concat branch looks highly suspicious.

And by the way, if you're actually computing a truth value, it should definitely give a `bool`

-- `Prop`

is for more abstract matters.

As far as I know (I'm a mathematician, not a computer scientist), the usual way to go with regex is:

- compile the regex to some other form (finite automata) ;
- use the compiled form to check strings.

Julien Puydt said:

Oh, it's supposed to check if the string satisfies the regex?

Yeah, that was what I was after.

Julien Puydt said:

Then even without star, I don't think that worked : the Concat branch looks highly suspicious.

Why? Is it because of the `strdrop`

?

Julien Puydt said:

As far as I know (I'm a mathematician, not a computer scientist), the usual way to go with regex is:

- compile the regex to some other form (finite automata) ;
- use the compiled form to check strings.

If the regex can be made as a `re`

value, it would be like having the automata, right?

The `strdrop`

only lets a single character go... that means you're only able to recognize something trivial first then something else

Sure, you can do something like:

```
Definition re_check (r: re) (s: string) := automata_check (automata_of_re re) s.
```

but you'll still need the automata.

And the reason people don't do it like this is performance: if you want to check many strings with the same regex, you better not recompute the automata each time!

If you want to define a proposition "the string s satisfies the regexp r", you may write an inductive definition, like:

```
Inductive reDenote : re -> string -> Prop :=
| Atom_denote ch : reDenote (Atom ch) (String ch EmptyString)
| Dot_denote : forall ch, reDenote Dot (String ch EmptyString)
| Epsilon_denote : reDenote Epsilon EmptyString
| Star_denote r : forall s,
reDenote (Alt (Concat r (Star r)) Epsilon) s ->
reDenote (Star r) s
| Concat_denote r1 r2: forall s1 s2 s, reDenote r1 s1 ->
reDenote r2 s2 ->
s = append s1 s2 ->
reDenote (Concat r1 r2) s
| Alt_denote r1 r2: forall s, (reDenote r1 s \/ reDenote r2 s) ->
reDenote (Alt r1 r2) s.
```

(to be verified and/or improved)

Then, you may define recognizers for reg-exp (maybe with deterministic automata) and prove they are correct w.r.t. this definition.

Apart from the fixpoint issue, I didn't understand some definitions like `Definition char := string.`

, and the clauses for `Atom`

, `Dot`

, `Epsilon`

and `Concat`

(problems with string length (should be 1 for `Atom`

, `Dot`

, 0 for `Epsilon`

, ... ?)

@Pierre Castéran I don't get how `forall`

appears in this context... I wouldn't have expected universal quantification there, but existential.

Julien Puydt said:

Pierre Castéran I don't get how

`forall`

appears in this context... I wouldn't have expected universal quantification there, but existential.

Which variables do you would existentially quantify? For instance, `s1`

and `s2`

in `Concat_denote`

? I think it would introduce in the definition some more connectives.

Although it doesn't prove the correctness of my definition, here is a small example.

```
Goal reDenote (Concat Dot (Atom one))
(String zero (String one EmptyString)).
econstructor 5.
constructor 2.
constructor 1.
reflexivity.
Qed.
```

Btw, I like the Prolog style too.

```
(** reRest r s s': one can scan a prefix of s which matches r,
and s' is the remaining suffix *)
Inductive reRest : re -> string -> string -> Prop :=
| reRest1 ch : forall s, reRest (Atom ch) (String ch s) s
| reRest2 : forall ch s, reRest Dot (String ch s) s
| reRest3 : forall s, reRest Epsilon s s
| reRest4 r : forall s, reRest (Star r) s s
| reRest5 r : forall s s' s'', reRest r s s' ->
reRest (Star r) s' s'' ->
reRest (Star r) s s''
| reRest6 r1 r2: forall s s1 s2 , reRest r1 s s1 ->
reRest r2 s1 s2 ->
reRest (Concat r1 r2) s s2
| reRest7 r1 r2 : forall s s', reRest r1 s s' ->
reRest (Alt r1 r2) s s'
| reRest8 r1 r2 : forall s s', reRest r2 s s' ->
reRest (Alt r1 r2) s s'.
Definition reDenote' r s := reRest r s EmptyString.
```

Yes, I would ; but I think I wasn't looking at it right : I considered that to prove `reDenote (Concat r1 r2) s`

one needs to prove there exists `s1 s2`

; while what you have expressed is that all those implications work -- you're not forcing to use all of them, any will do! So it's correct and I get your point.

remember the definition of exists:

```
Inductive ex (A:Type) (P:A -> Prop) : Prop :=
ex_intro : forall x:A, P x -> ex A P.
```

ie `forall`

in a constructor type makes the inductive an existential

for validation that your regexp representation is correct, I recommend proving it equivalent to the one here: https://github.com/coq-community/reglang

I was also trying to understand the regular expressions done in https://github.com/coq-contribs/regexp

I tried

```
Compute matches (And (Char "a") (Char "b")) "ab"%string.
```

But that gave `false`

.

I thought `matches`

(type: `RegExp -> String -> bool`

) meant that it would take a regex and string and would say if the regex matched that string or not. Isn't that so?

you're thinking of Cat not And

https://github.com/coq-contribs/regexp/blob/master/Definitions.v#L2-L10

Oh, right. :woozy_face:

Pierre Castéran said:

Apart from the fixpoint issue, I didn't understand some definitions like Definition char := string.

I was trying to have a representation for a single character string (but of course with nothing to stop it from being more than one character). I didn't really know how to do it. I see that Ascii.ascii is better choice.

And in

```
| Atom_denote ch : reDenote (Atom ch) (String ch EmptyString)
| Dot_denote : forall ch, reDenote Dot (String ch EmptyString)
```

Is there any significance in having `ch`

of `Atom_denote`

'inside' when compared to that of `Dot_denote`

where it's 'outside'?

And does having `Concat_denote`

like:

```
| ConcatDe r1 r2: forall s1 s2:string,
reDenote r1 s1 -> reDenote r2 s2
-> reDenote (Concat r1 r2) (append s1 s2)
```

have any disadvantage?

Is there any significance in having ch of Atom_denote 'inside' when compared to that of Dot_denote where it's 'outside'?

absolutely 0 significance, it's just syntax sugar

And does having Concat_denote like:

as opposed to what?

Gaëtan Gilbert said:

And does having Concat_denote like:

as opposed to what?

Instead of

```
| Concat_denote r1 r2: forall s1 s2 s, reDenote r1 s1 ->
reDenote r2 s2 ->
s = append s1 s2 ->
reDenote (Concat r1 r2) s
```

not sure what the difference in practice would be

probably not very impactful?

I got this:

```
Require Import Ascii String.
Inductive re : Set :=
| Atom: Ascii.ascii -> re
| Epsilon: re
| Dot: re
| Concat: re -> re -> re
| Alt: re -> re -> re
| Star: re -> re.
Inductive reDenote : re -> string -> Prop :=
| AtomDe: forall ch:Ascii.ascii,
reDenote (Atom ch) (String ch EmptyString)
| EpsilonDe: reDenote Epsilon EmptyString
| DotDe: forall ch:Ascii.ascii,
reDenote Dot (String ch EmptyString)
| ConcatDe r1 r2: forall s1 s2 s:string,
reDenote r1 s1 -> reDenote r2 s2 -> s = append s1 s2
-> reDenote (Concat r1 r2) s
| AltDe r1 r2: forall s:string,
reDenote r1 s \/ reDenote r2 s
-> reDenote (Alt r1 r2) s
| StarDe r: forall s:string,
reDenote (Alt (Concat r (Star r)) Epsilon) s
-> reDenote (Star r) s.
```

I had been trying to prove that the string `"a"`

matches the regex `"a*"`

with:

```
Goal
reDenote (Star (Atom "a"%char)) "a"%string.
Proof.
refine (StarDe _ _ _).
refine (AltDe _ _ _ _).
left.
refine (ConcatDe _ _ _ _ _ _ _ _).
```

But at this point, I got a variable named `?Goal`

:

```
3 focused subgoals
(shelved: 2) (ID 57)
============================
reDenote (Atom "a") ?Goal
subgoal 2 (ID 59) is:
reDenote (Star (Atom "a")) ?Goal0
subgoal 3 (ID 61) is:
"a"%string = (?Goal ++ ?Goal0)%string
```

What does that mean? That the goal itself is still unknown?

it's an existential variable https://coq.github.io/doc/master/refman/language/extensions/evars.html#automatic-resolution-of-existential-variables

Okay, I could complete the proof like this:

```
Goal
reDenote (Star (Atom "a"%char)) "a"%string.
Proof.
refine (StarDe _ _ _).
refine (AltDe _ _ _ _).
left.
refine (ConcatDe _ _ _ _ _ _ _ _).
- exact (AtomDe "a").
- refine (StarDe _ _ _).
refine (AltDe _ Epsilon ""%string _).
right.
exact EpsilonDe.
- reflexivity.
Qed.
```

What about starting with `eapply StarDe`

?

In Prolog style:

```
(** R r s s': one can scan a prefix of s which matches r,
and s' is the remaining suffix *)
Inductive R : re -> string -> string -> Prop :=
| Rat ch : forall s, R (Atom ch) (String ch s) s
| Rdot : forall ch s, R Dot (String ch s) s
| Reps : forall s, R Epsilon s s
| Rstar1 r : forall s s' s'', R r s s' ->
R (Star r) s' s'' ->
R (Star r) s s''
| Rstar0 r : forall s, R (Star r) s s
| Rconcat r1 r2: forall s s1 s2 , R r1 s s1 ->
R r2 s1 s2 ->
R (Concat r1 r2) s s2
| Ralt1 r1 r2 : forall s s', R r1 s s' ->
R (Alt r1 r2) s s'
| Ralt2 r1 r2 : forall s s', R r2 s s' ->
R (Alt r1 r2) s s'.
Definition reDenote r s := R r s EmptyString.
Goal reDenote (Star (Atom "a"%char)) "a"%string.
eapply Rstar1.
apply Rat.
apply Rstar0.
Qed.
Goal reDenote (Concat (Star (Atom "1"%char)) (Atom "0"%char)) "110"%string .
econstructor 6.
econstructor 4.
constructor 1.
econstructor 4.
constructor 1.
constructor 5.
constructor 1.
Qed.
```

Paolo Giarrusso said:

What about starting with

`eapply StarDe`

?

Using `eapply`

certainly feels better, thanks.

```
Goal
reDenote (Star (Atom "a"%char)) "a"%string.
Proof.
eapply StarDe.
eapply AltDe.
left.
eapply ConcatDe.
- eapply AtomDe.
- eapply StarDe.
eapply AltDe.
right.
eapply EpsilonDe.
- reflexivity.
Qed.
```

Pierre Castéran said:

In Prolog style:

`Goal reDenote (Concat (Star (Atom "1"%char)) (Atom "0"%char)) "110"%string . econstructor 6. econstructor 4. constructor 1. econstructor 4. constructor 1. constructor 5. constructor 1. Qed.`

Is there a way to use constructor names instead of their number? Or is the number more convenient for ltac scripts?

Yes, you can type `eapply Rconcat`

instead of `econstructor 6`

, or even design tactics for moving along the regular expression and the string.

I posted this definition because it shows an interesting way of using existential variables.

This technique is for instance illustrated in https://www.cs.unm.edu/~luger/ai-final2/CH8_Natural%20Language%20Processing%20in%20Prolog.pdf

If I remember well, this way of language processing was at the origin of the creation of the Prolog programming language by Alain Colmerauer.

Last updated: Jun 18 2024 at 23:01 UTC