Is anyone aware of good formal model for hahsmap (paper or code .. coq or anything else)

The thing I really care about in the hashmap formalization is that it should have canonical form.

Google/scholar couldn't help me find anything so if anyon knows, please let me know.

I assume what you need is canonical representations — two maps `m1`

, `m2`

with the same entries are Leibniz equal (`m1 = m2`

)?

But do you need formal models for Coq hashmaps or a hashmaps in an imperative language? I'm not the expert but a few comments:

- Why do you need a hashmap and not just a dictionary — why is stdpp's
`gmap`

not adequate? That's the "functional model" I'd typically use in general in separation logic. It uses a non-lossy encoding of elements into naturals, instead of a lossy hash... - If you want a Coq hashmap, it's going to be inefficient without mutable arrays :-|. Coq's new primitive arrays _might_ alleviate that, but functional languages often use structures closer to radix trees like gmap (https://infoscience.epfl.ch/record/213452).
- If you solve the other challenges, how should collisions be resolved to ensure canonicity? Would you sort elements by "hash"?

the Ynot project did a hashtable based on Hoare Type Teory for Coq 8.3: https://github.com/ynot-harvard/ynot/tree/master/examples/Data

However, I'm pretty sure most of the authors would call both the approach and code obsolete now. There is a new maintained version of Hoare Type Theory though, but without the hashtable: https://github.com/imdea-software/htt

To answer all your questions:

1- first yes leibniz equality is what I am looking for, why stdpp implementation is not fine?, I reached the conclusion that I might want to write proofs in the style of small step reflection to make them easily maintainable (by me) during all my experiments. But stdpp devs advised against combining both stdpp with ssreflect. So I think I will stop using stdpp and just use ssreflect and hope that my life gets easier once I re-implement all the stuff I need.

2, 3- that was why I was asking, I couldn't come up with trivial canonical form for collisions.

I am not sure if that is wise decision (both in effort required for short term, and then effort required for long term).

Ofcourse you input will be appreciated.

for ssreflect, you have finmap, which contains finite maps that are Leibniz equal when they have the same mappings: https://github.com/math-comp/finmap

I am now excited! so I feel the urge to ask does it come also come with notion of finite sets or I need to create my own ?

( I know I can check it but I am too excited to be patient)

there are multiple notions of what a finite set could mean. You have "finite set" in the sense of "a collection of elements inhabiting a finite type" already in basic MathComp. Then in finmap there are sets with finite number of elements drawn from a choiceType I believe (i.e., elements drawn from a type which has infinitely many inhabitants)

I am not sure what the difference is, but from programming perspective, the sets I was imagining was map with key drawn from type T and value unit/true. :sweat_smile:

a finite type can be formed out of the list `[1;2;3;4]`

. Then a finset from MathComp is (decidable) membership predicate on that list.

On the other hand, you can have finmap finitely-supported sets even on `nat`

, as long as each set has a finite number of members.

I think I need the first one, thanks for all the pointers.

https://math-comp.github.io/htmldoc_1_15_0/mathcomp.ssreflect.finset.html

wow! real documentation!

thanks a lot!

finset is not a map, however. if you need the map aspect: https://github.com/math-comp/finmap/blob/master/finmap.v (`{fmap K -> V} `

)

I will need to learn more about the differences before knowing what I really need.

But I need both a collection that one can get from array and k/v pairs.

Small clarification to your rationale: combining stdpp with the ssreflect tactics is fine (iris does it). The warning holds true for anything more advanced :-)

(but of course the math-comp ecosystem is a very fine one, if you don't need Iris)

It is not clear if I can get away without iris, given that I am interested in side effects:-

worst case I will have to rewrite the proofs for third time, which is not good, but not that bad, and I feel I really liked mathcomp ecosystem. .. at least the verbosity of the proofs, and ofc the documentation.

but it is a good thing that I don't forget that I still need to learn iris, and I am patiently waiting for iris workshop next year.

You might be interested otherwise on the canonical tries of Leroy and Appel (Google for the paper). They have perf benchmarks about them. They’re just maps though not hashmaps

https://github.com/xavierleroy/canonical-binary-tries

incidentally, we'd like to see this collection of finite maps become maintained....

It certainly looks nice, and being AVL based is impressive, but my question here is, is there an advantage for using this implementation over mathcomp/finmap ?

I was wondering how they abstract on the key type, but they don't, so I guess one would need to add that separately.

Apart from that, https://hal.inria.fr/hal-03372247 describes the advantages. IIRC, what's nice is that they are canonical _and_ they do that without embedding canonicity proofs

that relates to

Efficiency not only as extracted to OCaml (i.e., with proofs erased), but also as represented in Coq and computed within Coq (i.e., without proof erasure).

the basic rule of thumb for choosing between mathcomp-finmap and triemap or similar: mathcomp-finmap is designed for proving, not computation inside Coq (I think most computation is locked), so unless you need computation, finmap may be the way to go

stdpp's gmap is efficient after extraction, and acceptable for computation in Coq with `vm_compute`

, but you don't want to print normalized maps because of the embedded proofs; that's exactly what the new paper should fix

https://gitlab.mpi-sws.org/iris/stdpp/-/issues/64 discusses the issues...

annoyingly, I'm not sure the new paper would remove _all_ proofs (at least, based on some skimming): it could probably replace the underlying storage (`stdpp.pmap`

), but `gmap`

needs another layer of proofs:

```
(** We pack a [Pmap] together with a proof that ensures that all keys correspond
to codes of actual elements of the countable type. *)
Definition gmap_wf K `{Countable K} {A} (m : Pmap A) : Prop :=
bool_decide (map_Forall (λ p _, encode (A:=K) <$> decode p = Some p) m).
Record gmap K `{Countable K} A := GMap {
gmap_car : Pmap A;
gmap_prf : gmap_wf K gmap_car
}.
```

We found out in certicoq though that canonical tries can still be a bottleneck because of the conversion to positive. Having (hash) maps with bignum keys would certainly beat them

Karl Palmskog said:

the Ynot project did a hashtable based on Hoare Type Teory for Coq 8.3: https://github.com/ynot-harvard/ynot/tree/master/examples/Data

However, I'm pretty sure most of the authors would call both the approach and code obsolete now. There is a new maintained version of Hoare Type Theory though, but without the hashtable: https://github.com/imdea-software/htt

Umm, we do have a hash table example in current HTT: https://github.com/imdea-software/htt/blob/master/examples/hashtab.v

Last updated: Dec 07 2023 at 17:01 UTC