I was trying to represent regular expressions over strings as propositions.
Did this:
Require Import String.
Open Scope string.
Definition char := string.
Inductive re : Type :=
| Atom : char -> re
| Concat : re -> re -> re
| Alt : re -> re -> re
| Dot : re.
(* Indexing starts from zero. Empty string on invalid index. *)
Definition stridx (i:nat) (s:string) : char :=
String.substring i 1 s.
Definition strdrop (n:nat) (s:string) : string :=
String.substring n ((String.length s)-1) s.
Fixpoint reDenote (r:re) : string -> Prop :=
match r with
| Atom ch => fun s:string => stridx 0 s = ch
| Dot => fun s:string => stridx 0 s = stridx 0 s
| Concat r1 r2 => fun s:string =>
((reDenote r1) s) /\ ((reDenote r2) (strdrop 1 s))
| Alt r1 r2 => fun s:string =>
((reDenote r1) s) \/ ((reDenote r2) s)
end.
This looks okay:
Infix ":-" := Concat (at level 51, right associativity).
Compute reDenote ((Atom "h") :- (Atom "i") :- (Atom "y") :- (Atom "a")) "hiya".
(*
= "h" = "h" /\ "i" = "i" /\ "y" = "y" /\ "a" = "a"
: Prop
*)
But how can we add zero or more number of repetitions? Kleene star thing?
I was thinking to have a constructor like :
Star : re -> re
But how can we make it a proposition? I mean, it's like a possible infinite stream, but needn't be.
Shouldn't the Inductive re
contain all building blocks for regex, including star and repetitions?
Yeah, but I was stuck at Star. I got like:
Inductive re : Type :=
| Atom : char -> re
| Concat : re -> re -> re
| Alt : re -> re -> re
| Star : re -> re
| Dot : re
| Epsilon : re.
But couldn't get the translation to Prop right.
Fixpoint reDenote (r:re) : string -> Prop :=
match r with
| Atom ch => fun s:string => stridx 0 s = ch
| Dot => fun s:string => stridx 0 s = stridx 0 s
| Epsilon => fun s:string => True
| Star r => fun s:string =>
reDenote (Alt (Concat r (Star r)) Epsilon) s
| Concat r1 r2 => fun s:string =>
((reDenote r1) s) /\ ((reDenote r2) (strdrop 1 s))
| Alt r1 r2 => fun s:string =>
((reDenote r1) s) \/ ((reDenote r2) s)
end.
The argument to fixpoint is not getting smaller for Star
.
Error:
Error:
Recursive definition of reDenote is ill-formed.
In environment
reDenote : re -> string -> Prop
r : re
r0 : re
s : string
Recursive call to reDenote has principal argument equal to
"Alt (Concat r0 (Star r0)) Epsilon" instead of
"r0".
I don't get what this function is supposed to do
Oh, it's supposed to check if the string satisfies the regex?
Then even without star, I don't think that worked : the Concat branch looks highly suspicious.
And by the way, if you're actually computing a truth value, it should definitely give a bool
-- Prop
is for more abstract matters.
As far as I know (I'm a mathematician, not a computer scientist), the usual way to go with regex is:
Julien Puydt said:
Oh, it's supposed to check if the string satisfies the regex?
Yeah, that was what I was after.
Julien Puydt said:
Then even without star, I don't think that worked : the Concat branch looks highly suspicious.
Why? Is it because of the strdrop
?
Julien Puydt said:
As far as I know (I'm a mathematician, not a computer scientist), the usual way to go with regex is:
- compile the regex to some other form (finite automata) ;
- use the compiled form to check strings.
If the regex can be made as a re
value, it would be like having the automata, right?
The strdrop
only lets a single character go... that means you're only able to recognize something trivial first then something else
Sure, you can do something like:
Definition re_check (r: re) (s: string) := automata_check (automata_of_re re) s.
but you'll still need the automata.
And the reason people don't do it like this is performance: if you want to check many strings with the same regex, you better not recompute the automata each time!
If you want to define a proposition "the string s satisfies the regexp r", you may write an inductive definition, like:
Inductive reDenote : re -> string -> Prop :=
| Atom_denote ch : reDenote (Atom ch) (String ch EmptyString)
| Dot_denote : forall ch, reDenote Dot (String ch EmptyString)
| Epsilon_denote : reDenote Epsilon EmptyString
| Star_denote r : forall s,
reDenote (Alt (Concat r (Star r)) Epsilon) s ->
reDenote (Star r) s
| Concat_denote r1 r2: forall s1 s2 s, reDenote r1 s1 ->
reDenote r2 s2 ->
s = append s1 s2 ->
reDenote (Concat r1 r2) s
| Alt_denote r1 r2: forall s, (reDenote r1 s \/ reDenote r2 s) ->
reDenote (Alt r1 r2) s.
(to be verified and/or improved)
Then, you may define recognizers for reg-exp (maybe with deterministic automata) and prove they are correct w.r.t. this definition.
Apart from the fixpoint issue, I didn't understand some definitions like Definition char := string.
, and the clauses for Atom
, Dot
, Epsilon
and Concat
(problems with string length (should be 1 for Atom
, Dot
, 0 for Epsilon
, ... ?)
@Pierre Castéran I don't get how forall
appears in this context... I wouldn't have expected universal quantification there, but existential.
Julien Puydt said:
Pierre Castéran I don't get how
forall
appears in this context... I wouldn't have expected universal quantification there, but existential.
Which variables do you would existentially quantify? For instance, s1
and s2
in Concat_denote
? I think it would introduce in the definition some more connectives.
Although it doesn't prove the correctness of my definition, here is a small example.
Goal reDenote (Concat Dot (Atom one))
(String zero (String one EmptyString)).
econstructor 5.
constructor 2.
constructor 1.
reflexivity.
Qed.
Btw, I like the Prolog style too.
(** reRest r s s': one can scan a prefix of s which matches r,
and s' is the remaining suffix *)
Inductive reRest : re -> string -> string -> Prop :=
| reRest1 ch : forall s, reRest (Atom ch) (String ch s) s
| reRest2 : forall ch s, reRest Dot (String ch s) s
| reRest3 : forall s, reRest Epsilon s s
| reRest4 r : forall s, reRest (Star r) s s
| reRest5 r : forall s s' s'', reRest r s s' ->
reRest (Star r) s' s'' ->
reRest (Star r) s s''
| reRest6 r1 r2: forall s s1 s2 , reRest r1 s s1 ->
reRest r2 s1 s2 ->
reRest (Concat r1 r2) s s2
| reRest7 r1 r2 : forall s s', reRest r1 s s' ->
reRest (Alt r1 r2) s s'
| reRest8 r1 r2 : forall s s', reRest r2 s s' ->
reRest (Alt r1 r2) s s'.
Definition reDenote' r s := reRest r s EmptyString.
Yes, I would ; but I think I wasn't looking at it right : I considered that to prove reDenote (Concat r1 r2) s
one needs to prove there exists s1 s2
; while what you have expressed is that all those implications work -- you're not forcing to use all of them, any will do! So it's correct and I get your point.
remember the definition of exists:
Inductive ex (A:Type) (P:A -> Prop) : Prop :=
ex_intro : forall x:A, P x -> ex A P.
ie forall
in a constructor type makes the inductive an existential
for validation that your regexp representation is correct, I recommend proving it equivalent to the one here: https://github.com/coq-community/reglang
I was also trying to understand the regular expressions done in https://github.com/coq-contribs/regexp
I tried
Compute matches (And (Char "a") (Char "b")) "ab"%string.
But that gave false
.
I thought matches
(type: RegExp -> String -> bool
) meant that it would take a regex and string and would say if the regex matched that string or not. Isn't that so?
you're thinking of Cat not And
https://github.com/coq-contribs/regexp/blob/master/Definitions.v#L2-L10
Oh, right. :woozy_face:
Pierre Castéran said:
Apart from the fixpoint issue, I didn't understand some definitions like Definition char := string.
I was trying to have a representation for a single character string (but of course with nothing to stop it from being more than one character). I didn't really know how to do it. I see that Ascii.ascii is better choice.
And in
| Atom_denote ch : reDenote (Atom ch) (String ch EmptyString)
| Dot_denote : forall ch, reDenote Dot (String ch EmptyString)
Is there any significance in having ch
of Atom_denote
'inside' when compared to that of Dot_denote
where it's 'outside'?
And does having Concat_denote
like:
| ConcatDe r1 r2: forall s1 s2:string,
reDenote r1 s1 -> reDenote r2 s2
-> reDenote (Concat r1 r2) (append s1 s2)
have any disadvantage?
Is there any significance in having ch of Atom_denote 'inside' when compared to that of Dot_denote where it's 'outside'?
absolutely 0 significance, it's just syntax sugar
And does having Concat_denote like:
as opposed to what?
Gaëtan Gilbert said:
And does having Concat_denote like:
as opposed to what?
Instead of
| Concat_denote r1 r2: forall s1 s2 s, reDenote r1 s1 ->
reDenote r2 s2 ->
s = append s1 s2 ->
reDenote (Concat r1 r2) s
not sure what the difference in practice would be
probably not very impactful?
I got this:
Require Import Ascii String.
Inductive re : Set :=
| Atom: Ascii.ascii -> re
| Epsilon: re
| Dot: re
| Concat: re -> re -> re
| Alt: re -> re -> re
| Star: re -> re.
Inductive reDenote : re -> string -> Prop :=
| AtomDe: forall ch:Ascii.ascii,
reDenote (Atom ch) (String ch EmptyString)
| EpsilonDe: reDenote Epsilon EmptyString
| DotDe: forall ch:Ascii.ascii,
reDenote Dot (String ch EmptyString)
| ConcatDe r1 r2: forall s1 s2 s:string,
reDenote r1 s1 -> reDenote r2 s2 -> s = append s1 s2
-> reDenote (Concat r1 r2) s
| AltDe r1 r2: forall s:string,
reDenote r1 s \/ reDenote r2 s
-> reDenote (Alt r1 r2) s
| StarDe r: forall s:string,
reDenote (Alt (Concat r (Star r)) Epsilon) s
-> reDenote (Star r) s.
I had been trying to prove that the string "a"
matches the regex "a*"
with:
Goal
reDenote (Star (Atom "a"%char)) "a"%string.
Proof.
refine (StarDe _ _ _).
refine (AltDe _ _ _ _).
left.
refine (ConcatDe _ _ _ _ _ _ _ _).
But at this point, I got a variable named ?Goal
:
3 focused subgoals
(shelved: 2) (ID 57)
============================
reDenote (Atom "a") ?Goal
subgoal 2 (ID 59) is:
reDenote (Star (Atom "a")) ?Goal0
subgoal 3 (ID 61) is:
"a"%string = (?Goal ++ ?Goal0)%string
What does that mean? That the goal itself is still unknown?
it's an existential variable https://coq.github.io/doc/master/refman/language/extensions/evars.html#automatic-resolution-of-existential-variables
Okay, I could complete the proof like this:
Goal
reDenote (Star (Atom "a"%char)) "a"%string.
Proof.
refine (StarDe _ _ _).
refine (AltDe _ _ _ _).
left.
refine (ConcatDe _ _ _ _ _ _ _ _).
- exact (AtomDe "a").
- refine (StarDe _ _ _).
refine (AltDe _ Epsilon ""%string _).
right.
exact EpsilonDe.
- reflexivity.
Qed.
What about starting with eapply StarDe
?
In Prolog style:
(** R r s s': one can scan a prefix of s which matches r,
and s' is the remaining suffix *)
Inductive R : re -> string -> string -> Prop :=
| Rat ch : forall s, R (Atom ch) (String ch s) s
| Rdot : forall ch s, R Dot (String ch s) s
| Reps : forall s, R Epsilon s s
| Rstar1 r : forall s s' s'', R r s s' ->
R (Star r) s' s'' ->
R (Star r) s s''
| Rstar0 r : forall s, R (Star r) s s
| Rconcat r1 r2: forall s s1 s2 , R r1 s s1 ->
R r2 s1 s2 ->
R (Concat r1 r2) s s2
| Ralt1 r1 r2 : forall s s', R r1 s s' ->
R (Alt r1 r2) s s'
| Ralt2 r1 r2 : forall s s', R r2 s s' ->
R (Alt r1 r2) s s'.
Definition reDenote r s := R r s EmptyString.
Goal reDenote (Star (Atom "a"%char)) "a"%string.
eapply Rstar1.
apply Rat.
apply Rstar0.
Qed.
Goal reDenote (Concat (Star (Atom "1"%char)) (Atom "0"%char)) "110"%string .
econstructor 6.
econstructor 4.
constructor 1.
econstructor 4.
constructor 1.
constructor 5.
constructor 1.
Qed.
Paolo Giarrusso said:
What about starting with
eapply StarDe
?
Using eapply
certainly feels better, thanks.
Goal
reDenote (Star (Atom "a"%char)) "a"%string.
Proof.
eapply StarDe.
eapply AltDe.
left.
eapply ConcatDe.
- eapply AtomDe.
- eapply StarDe.
eapply AltDe.
right.
eapply EpsilonDe.
- reflexivity.
Qed.
Pierre Castéran said:
In Prolog style:
Goal reDenote (Concat (Star (Atom "1"%char)) (Atom "0"%char)) "110"%string . econstructor 6. econstructor 4. constructor 1. econstructor 4. constructor 1. constructor 5. constructor 1. Qed.
Is there a way to use constructor names instead of their number? Or is the number more convenient for ltac scripts?
Yes, you can type eapply Rconcat
instead of econstructor 6
, or even design tactics for moving along the regular expression and the string.
I posted this definition because it shows an interesting way of using existential variables.
This technique is for instance illustrated in https://www.cs.unm.edu/~luger/ai-final2/CH8_Natural%20Language%20Processing%20in%20Prolog.pdf
If I remember well, this way of language processing was at the origin of the creation of the Prolog programming language by Alain Colmerauer.
Last updated: Oct 13 2024 at 01:02 UTC