Stream: Coq users

Topic: advancingIsAlwaysOk


view this post on Zulip Huỳnh Trần Khanh (Jan 21 2023 at 05:22):

My plan is to prove lots of properties about regular bracket strings! Too bad I'm just a self taught Coq user and I can't come up with the proof for advancingIsAlwaysOk. Here is the file: https://github.com/huynhtrankhanh/coqcp/blob/7c8095813f5ec0699edc4c0639f19dd6b5997775/theories/FillInTheBlanks.v

You can clone the repository to help me out, the only dependency is coq-stdpp.

Informal explanation of the lemma: you are given a string that consists of ?, ( and ) and you want to check if it's possible to replace all ? characters with ( and ) to make a regular bracket string. Turns out, there's a quick way to check this. First, notice that the number of ( and ) characters that you need to fill in is always fixed. The number of ( characters to be filled is n / 2 - filledOpen, where n is the length of the string and filledOpen is the number of ( characters already present in the string. Similarly, the number of ) characters to be filled is n / 2 - filledClose. So, to check whether it's possible to replace the ? marks to make a regular bracket string, just start by filling the first n / 2 - filledOpen question marks with (, and then fill the rest with ), and check whether the resulting string is balanced or not.

Now, according to the above pattern, the advance function checks whether the first question mark to be filled is a ( or a ), and then fills the corresponding symbol. The advancingIsAlwaysOk lemma states that for a string withBlanks, if it's possible to replace the question marks in withBlanks to form a regular bracket string, it's also possible to replace the question marks in advance withBlanks to form a regular bracket string. Then, with some silly induction on the number of blanks, I'll be able to prove that possibleToFill is equivalent to possibleToFillBool.

I hope the Coq community can assist me as I can't figure out how to prove this.

view this post on Zulip Li-yao (Jan 21 2023 at 13:36):

Do you need advance when you already have a solution on line 28?

view this post on Zulip Li-yao (Jan 21 2023 at 13:38):

You can avoid stripOptions by removing the option in the output of fillLeftToRight, and in the bad case just return the empty list because it's not going to be used anyway.

view this post on Zulip Huỳnh Trần Khanh (Jan 21 2023 at 14:31):

You are indeed right. I'll delete the ugly stripOptions by following your suggestion. This is a great idea. The overarching goal is to prove that possibleToFill is equivalent to possibleToFillBool, and I don't have a good proof strategy. I was like, making the advance function could decompose the problem into something simpler, but then again I don't have a concrete plan to attack the subproblem either.

view this post on Zulip Huỳnh Trần Khanh (Jan 21 2023 at 14:32):

The thing is, I don't even need a formal proof. I will code it when I have a good understanding of the problem. If anyone could give me a decent proof strategy, please tell me.

view this post on Zulip Paolo Giarrusso (Jan 21 2023 at 15:28):

FWIW: I'm sure by "regular bracket string" you mean a properly parenthesized string (I'm not correcting, just didn't know that jargon)? So for instance ?( has the right counts but can't be filled in, while (? ?) can be filled in as () () (which your strategy in possibleToFillBool won't find) or as (()) (which your strategy will find)... By "your strategy" I mean toFill := repeat BracketOpen remainingOpenCount ++ repeat BracketClose remainingCloseCount

view this post on Zulip Paolo Giarrusso (Jan 21 2023 at 15:35):

Things you probably already know, just to check: It should already be easy to prove that your boolean function is sound, as long as isBalancedBool decides isBalanced. The large challenge seems completeness, and on paper it seems that should follow from advancingIsAlwaysOk?

view this post on Zulip Paolo Giarrusso (Jan 21 2023 at 15:46):

Informally, it seems this might be total "because" a string is balanced as long as (1) has the same number of open and close parens (2) the "difference" never goes negative, and your strategy minimizes that risk? This makes intuitive sense (but I'm not yet sure how I'd turn that into an actual proof)

view this post on Zulip Karl Palmskog (Jan 21 2023 at 15:53):

so we are talking about a simple language with some form of balanced characters? Not sure I understand the methodology here, but I think the standard approach for handling this would be to describe it abstractly as a language, e.g., using grammars, then use a formalism like pushdown automata or plain parsers for efficient recognition. One example is having Menhir spit out a verified-sound Coq parser.

view this post on Zulip Paolo Giarrusso (Jan 21 2023 at 16:28):

Well the problem statement isn't just about parsing

view this post on Zulip Paolo Giarrusso (Jan 21 2023 at 16:35):

Or rather: there's a context-free language L1 with balanced characters, but we're interested in the language L2 of strings that can be _mutated_ into strings in L1

view this post on Zulip Paolo Giarrusso (Jan 21 2023 at 16:36):

(for a specific kind of mutation: replacing each question mark by a character).

view this post on Zulip Paolo Giarrusso (Jan 21 2023 at 16:40):

Okay you have a point, I conjecture L2 is context-free:

openBracket ::= "(" | "?"
closeBracket ::= ")" | "?"
normalChar ::= any character that is not "(", ")" or "?"
expr ::= normalChar* | openBracket expr closeBracket

(EDIT: defined normalChar, formatting)

view this post on Zulip Pierre Castéran (Jan 21 2023 at 16:40):

I think the proposed exercise is not about recognizing strings, but about solving some kind of equations.
For simplicity's sake, lets call a[resp. b] the open [resp. close] parenthesis.
For instance, the equation "aXYZTb is well balanced" has (at least) two solutions.

This Dyck language gives us a lot of nice exercises despite its simplicity. The exercise proposed by @Huỳnh Trần Khanh is one of them. It may be an example of Elpi query, or be represented as a non-trivial inductive predicate.

https://gist.github.com/Casteran/06343a071d0a874d9200d0fe79939870
https://gist.github.com/Casteran/57b528dcef1e0c052822da04bd5cba4a

view this post on Zulip Li-yao (Jan 21 2023 at 17:50):

@Huỳnh Trần Khanh The interesting direction of the theorem is that if there is a solution, the candidate you chose is also a solution. A useful property to note is that the Dyck path of your solution is above the Dyck path of all other solutions.

view this post on Zulip Paolo Giarrusso (Jan 21 2023 at 17:52):

"Dyck path"? Ah, Googling seems enough, e.g. https://mathworld.wolfram.com/DyckPath.html.

view this post on Zulip Paolo Giarrusso (Jan 21 2023 at 17:58):


Here's a proof sketch for L2 correctness. Unlike the ideas above, this should be formalizable directly and be entirely constructive — assuming a theory of CFGs and recognizers for them.

Lemma: L2 and possibleToFill are the same language. Let me write s[toFill] for the result of filling blanks in s using toFill (:= stripOptions (fillLeftToRight withBlanks toFill)). Also, parse L s T mean that T is a parse tree that shows string s is in language L.

(a) s ∈ L2 -> s ∈ possibleToFill: find a parse tree T such that parse L2 s T. To find toFill such that s' := s[toFill] ∈ L1, each ? must be replaced by ( or ) depending on how it was parsed, then construct a parse tree T' such that parse L1 s' T' by induction on T.
(b) s ∈ possibleToFill -> s ∈ L2: let s' := s[toFill], and T' a parse tree such that parse L1 s' T; you can construct a parse tree T such that parse L2 s T by induction on T' and analysis of toFill: each ? must parse as openBracket or closeBracket` depending on how it's filled.

FWIW: I think @Huỳnh Trần Khanh 's recognizer _might_ be a bit faster than a standard CFG parser for my L2 grammar, since it needs less backtracking, but that difference might be small, and not relevant depending on the overall goals.

view this post on Zulip Li-yao (Jan 21 2023 at 18:10):

That's the easy part (to the extent that it is about proving an equivalence between a grammar and a (nondeterministic) parser). The hard part is proving that language (whether written as L2 or possibleToFill) equivalent to possibleToFillBool, which is more or less about making the grammar non-ambiguous.

view this post on Zulip Paolo Giarrusso (Jan 21 2023 at 18:22):

right, I am changing the question (along Karl's suggestion): does @Huỳnh Trần Khanh need (a) any correct recognizer (b) or specifically possibleToFillBool? We seem to agree that (b) is trickier.

view this post on Zulip Huỳnh Trần Khanh (Jan 21 2023 at 18:27):

Specifically possibleToFillBool. And if this is too hard (but this shouldn't be, there's already a plausible enough approach—my path is above all other paths) I need a recognizer that works in linear time. I don't accept exponential time stuff.

view this post on Zulip Huỳnh Trần Khanh (Jan 21 2023 at 18:29):

My plan is to use an exchange argument, and I'll completely abandon the advancingIsAlwaysOk approach. In particular, I'll take two arbitrary question marks, such that the left one is filled with ) and the right one is filled with (, and then swap the symbols. Then I'll prove that after swapping the string is still balanced. I'll do induction on the number of inversions (?)

view this post on Zulip Paolo Giarrusso (Jan 21 2023 at 19:01):

I guess the key step could work by functional induction on execution of isBalancedBool...

view this post on Zulip Paolo Giarrusso (Jan 21 2023 at 19:04):

General CFGs wouldn't be exponential, but the general upper bound is indeed superlinear (cubic) and more work to implement


Last updated: Apr 20 2024 at 15:01 UTC