## Stream: Miscellaneous

### Topic: Sorting subarrays

#### Huỳnh Trần Khanh (Feb 22 2023 at 13:29):

Thanks to the assistance of the Coq community, I've proved three properties about regular bracket strings. I really appreciate your input. Without your input, I couldn't have gotten this far.

But now I have to solve another problem! A world class competitive programmer challenged me and other folks to solve a problem about sorting subarrays. The solution must be fast enough to run in 1 second on modern machines.

While this problem isn't too related to Coq yet, I believe the Coq community has the right skills to assist me with the problem. Can you give me some pointers? And can I use Coq's program extraction facilities to extract fast enough code for this problem? Remember we have to work within the 1 second time limit.

#### Paolo Giarrusso (Feb 22 2023 at 17:28):

Extracted solutions will probably be purely functional, which might affect a bit constant factors and space usage. At a glance I guess a plain qsort wouldn't be appropriate here, but for some purposes the standard purely functional implementation might not be acceptable (wrong space complexity, and worse constant factors)

#### Paolo Giarrusso (Feb 22 2023 at 17:30):

For a Fermi estimation, could you implement a quick sort in Coq on N/Z and run it with vm_compute on the whole array, and see how off it is from what you'd need?

#### Paolo Giarrusso (Feb 22 2023 at 17:32):

(but I'm optimizing for "skip extraction at first", not sure it's that bad)

#### Gaëtan Gilbert (Feb 22 2023 at 17:35):

maybe use the parray stuff?

#### Huỳnh Trần Khanh (Feb 23 2023 at 01:35):

i am looking for something that runs in O((n+q)log^2 n) because the world class CPer claims this solution is easy to come up with. any ideas? you can give me an imperative algorithm even, I'll use my brain to convert it to a reasonable functional algorithm. after all when you have an imperative algorithm we just need to apply standard tricks to make it functional, and since the target time complexity has the log^2 n part chances are there is a binary tree in the solution, which functional programming can handle easily

#### Huỳnh Trần Khanh (Feb 23 2023 at 01:36):

and according to the world class CPer, having to output a single element is a very important clue that we gotta take advantage of

#### Paolo Giarrusso (Feb 23 2023 at 01:57):

(IIRC functional conversion might add a log N factor, but that's probably acceptable)
My immediate reaction to that hint is to think about quick select instead of quicksort (modulo usual patches for good worst-case performance)

#### Paolo Giarrusso (Feb 23 2023 at 01:58):

But I don't see how to just use quickselect on any query beyond the last...

#### Paolo Giarrusso (Feb 23 2023 at 02:04):

Ah: a strategy could be, for each query, to avoid a complete sorting, and only select the values that are going to be relevant to later queries. "Relevant" isn't trivial to define, but maybe it's a matter of some "simple" computational geometry: interval trees aren't that famous, but they're BSTs adapted to store (non-overlapping) ranges

#### Paolo Giarrusso (Feb 23 2023 at 02:05):

And they let you query whether a new interval overlaps with any interval in the tree in logarithmic time

#### Paolo Giarrusso (Feb 23 2023 at 02:08):

But I'm not claiming a solution, since I haven't actually done the work :grinning:

#### Huỳnh Trần Khanh (Feb 23 2023 at 02:48):

thanks, i guess we can use binary search to find the value we need to output. if we guess a value x, we can mark all elements not less than x as 1 and everything else as 0. your idea is very good here, and combined with the binary search idea, the tree can contain intervals where all numbers are 1

after processing every query, we can update the interval tree to reflect the binary array. but updating may be hard, do you have ideas? the final step is easy though, we just check whether there is an interval that contains index K. maybe the update step is very obvious but i am too much of a brick to come up with it

#### Paolo Giarrusso (Feb 23 2023 at 05:27):

Maybe: Loop over queries backwards, propagating "needs": the final result is needed, and every time your iterate over a query that overlaps with needs (as computed via interval trees), mark its inputs as needed, and extend a "program" with selection queries that compute the needed outputs. Non-overlapping queries can be skipped (but only because we go backwards!). Finally, reverse and actually run this "program".

#### Guillaume Melquiond (Feb 23 2023 at 06:01):

That will not give you the expected complexity. Consider a sequence of query (0,n/2), (n/2,n), (0,n/2), (n/2,n), etc (i.e., one single element in common each time), so all the queries are relevant. Even if you use a radix+counting sort, you will still end up with a total complexity of O(qn). (And note that sorting the whole array at once will not give you the correct final result, unless q > n.)

#### Huỳnh Trần Khanh (Feb 23 2023 at 07:08):

you're not really getting my idea I guess. I want to turn the array of arbitrary integers problem into an array of 0s and 1s problem. a way to do this is by doing binary search on the answer, and to do this, you need to write a check function to check whether the kth element after performing all the queries is not less than some target

#### Huỳnh Trần Khanh (Feb 23 2023 at 07:09):

and in the previous messages I explained an attempt to write the check function. our job is to find the greatest value such that the check function returns true

#### Huỳnh Trần Khanh (Feb 23 2023 at 07:11):

you see when we frame the problem this way your interval tree idea suddenly becomes quite viable

#### Huỳnh Trần Khanh (Feb 23 2023 at 07:17):

your original intention was to use the interval tree to intelligently determine the ranges to sort. this is not a sound idea. however, when we turn the problem into a problem about binary arrays, the idea becomes viable as we can use the interval tree to maintain intervals where all elements equal 1

#### Huỳnh Trần Khanh (Feb 23 2023 at 07:19):

the only hard part is to handle the sort subarray queries for the very specific case of binary arrays, which I believe is easier but still, I have no idea yet

#### Guillaume Melquiond (Feb 23 2023 at 07:38):

If your input only contains 0s and 1s, at any time, your whole array is the concatenation of runs of 0s, runs of 1s, or runs of unsorted initial values. When you apply a new query, you only have to count the number of 0s and 1s in the overlapped runs. This is constant time for runs of 0s and 1s, and linear time for runs of initial values (but they will no longer exist after the query, so that is fine).

Last updated: Aug 03 2024 at 22:01 UTC