My plan is to prove lots of properties about regular bracket strings! Too bad I'm just a self taught Coq user and I can't come up with the proof for advancingIsAlwaysOk. Here is the file: https://github.com/huynhtrankhanh/coqcp/blob/7c8095813f5ec0699edc4c0639f19dd6b5997775/theories/FillInTheBlanks.v

You can clone the repository to help me out, the only dependency is coq-stdpp.

Informal explanation of the lemma: you are given a string that consists of `?`

, `(`

and `)`

and you want to check if it's possible to replace all `?`

characters with `(`

and `)`

to make a regular bracket string. Turns out, there's a quick way to check this. First, notice that the number of `(`

and `)`

characters that you need to fill in is always fixed. The number of `(`

characters to be filled is `n / 2 - filledOpen`

, where n is the length of the string and filledOpen is the number of `(`

characters already present in the string. Similarly, the number of `)`

characters to be filled is `n / 2 - filledClose`

. So, to check whether it's possible to replace the `?`

marks to make a regular bracket string, just start by filling the first `n / 2 - filledOpen`

question marks with `(`

, and then fill the rest with `)`

, and check whether the resulting string is balanced or not.

Now, according to the above pattern, the `advance`

function checks whether the first question mark to be filled is a `(`

or a `)`

, and then fills the corresponding symbol. The advancingIsAlwaysOk lemma states that for a string withBlanks, if it's possible to replace the question marks in withBlanks to form a regular bracket string, it's also possible to replace the question marks in `advance withBlanks`

to form a regular bracket string. Then, with some silly induction on the number of blanks, I'll be able to prove that possibleToFill is equivalent to possibleToFillBool.

I hope the Coq community can assist me as I can't figure out how to prove this.

Do you need advance when you already have a solution on line 28?

You can avoid `stripOptions`

by removing the option in the output of `fillLeftToRight`

, and in the bad case just return the empty list because it's not going to be used anyway.

You are indeed right. I'll delete the ugly stripOptions by following your suggestion. This is a great idea. The overarching goal is to prove that possibleToFill is equivalent to possibleToFillBool, and I don't have a good proof strategy. I was like, making the `advance`

function could decompose the problem into something simpler, but then again I don't have a concrete plan to attack the subproblem either.

The thing is, I don't even need a formal proof. I will code it when I have a good understanding of the problem. If anyone could give me a decent proof strategy, please tell me.

FWIW: I'm sure by "regular bracket string" you mean a properly parenthesized string (I'm not correcting, just didn't know that jargon)? So for instance `?(`

has the right counts but can't be filled in, while `(? ?)`

can be filled in as `() ()`

(which your strategy in possibleToFillBool won't find) or as `(())`

(which your strategy will find)... By "your strategy" I mean `toFill := repeat BracketOpen remainingOpenCount ++ repeat BracketClose remainingCloseCount`

Things you probably already know, just to check: It should already be easy to prove that your boolean function is *sound*, as long as `isBalancedBool`

decides `isBalanced`

. The large challenge seems completeness, and *on paper* it seems that should follow from `advancingIsAlwaysOk`

?

Informally, it seems this might be total "because" a string is balanced as long as (1) has the same number of open and close parens (2) the "difference" never goes negative, and your strategy minimizes that risk? This makes intuitive sense (but I'm not yet sure how I'd turn that into an actual proof)

so we are talking about a simple language with some form of balanced characters? Not sure I understand the methodology here, but I think the standard approach for handling this would be to describe it abstractly as a language, e.g., using grammars, then use a formalism like pushdown automata or plain parsers for efficient recognition. One example is having Menhir spit out a verified-sound Coq parser.

Well the problem statement isn't just about parsing

Or rather: there's a context-free language L1 with balanced characters, but we're interested in the language L2 of strings that can be _mutated_ into strings in L1

(for a specific kind of mutation: replacing each question mark by a character).

Okay you have a point, I conjecture L2 *is* context-free:

```
openBracket ::= "(" | "?"
closeBracket ::= ")" | "?"
normalChar ::= any character that is not "(", ")" or "?"
expr ::= normalChar* | openBracket expr closeBracket
```

(EDIT: defined `normalChar`

, formatting)

I think the proposed exercise is not about recognizing strings, but about solving some kind of equations.

For simplicity's sake, lets call `a`

[resp. `b`

] the open [resp. close] parenthesis.

For instance, the equation "aXYZTb is well balanced" has (at least) two solutions.

This Dyck language gives us a lot of nice exercises despite its simplicity. The exercise proposed by @Huỳnh Trần Khanh is one of them. It may be an example of Elpi query, or be represented as a non-trivial inductive predicate.

https://gist.github.com/Casteran/06343a071d0a874d9200d0fe79939870

https://gist.github.com/Casteran/57b528dcef1e0c052822da04bd5cba4a

@Huỳnh Trần Khanh The interesting direction of the theorem is that if there is a solution, the candidate you chose is also a solution. A useful property to note is that the Dyck path of your solution is above the Dyck path of all other solutions.

"Dyck path"? Ah, Googling seems enough, e.g. https://mathworld.wolfram.com/DyckPath.html.

Here's a proof sketch for L2 correctness. Unlike the ideas above, this should be formalizable directly and be entirely constructive — assuming a theory of CFGs and recognizers for them.

Lemma: L2 and `possibleToFill`

are the same language. Let me write `s[toFill]`

for the result of filling blanks in `s`

using `toFill`

(`:= stripOptions (fillLeftToRight withBlanks toFill)`

). Also, `parse L s T`

mean that `T`

is a parse tree that shows string `s`

is in language `L`

.

(a) `s ∈ L2 -> s ∈ possibleToFill`

: find a parse tree `T`

such that `parse L2 s T`

. To find `toFill `

such that `s' := s[toFill] ∈ L1`

, each `?`

must be replaced by `(`

or `)`

depending on how it was parsed, then construct a parse tree `T'`

such that `parse L1 s' T'`

by induction on `T`

.

(b) `s ∈ possibleToFill -> s ∈ L2`

: let `s' := s[toFill]`

, and `T' a parse tree such that `

parse L1 s' T`; you can construct a parse tree `

T` such that `

parse L2 s T` by induction on `

T'` and analysis of `

toFill`: each `

?` must parse as `

openBracket` or `

closeBracket` depending on how it's filled.

FWIW: I think @Huỳnh Trần Khanh 's recognizer _might_ be a bit faster than a standard CFG parser for my L2 grammar, since it needs less backtracking, but that difference might be small, and not relevant depending on the overall goals.

That's the easy part (to the extent that it is about proving an equivalence between a grammar and a (nondeterministic) parser). The hard part is proving that language (whether written as L2 or `possibleToFill`

) equivalent to `possibleToFillBool`

, which is more or less about making the grammar non-ambiguous.

right, I am changing the question (along Karl's suggestion): does @Huỳnh Trần Khanh need (a) any correct recognizer (b) or specifically `possibleToFillBool`

? We seem to agree that (b) is trickier.

Specifically possibleToFillBool. And if this is too hard (but this shouldn't be, there's already a plausible enough approach—my path is above all other paths) I need a recognizer that **works in linear time**. I don't accept exponential time stuff.

My plan is to use an exchange argument, and I'll completely abandon the advancingIsAlwaysOk approach. In particular, I'll take two arbitrary question marks, such that the left one is filled with `)`

and the right one is filled with `(`

, and then swap the symbols. Then I'll prove that after swapping the string is still balanced. I'll do induction on the number of inversions (?)

I guess the key step could work by functional induction on execution of isBalancedBool...

General CFGs wouldn't be exponential, but the general upper bound is indeed superlinear (cubic) and more work to implement

Last updated: Jun 18 2024 at 22:01 UTC