Stream: Miscellaneous

Topic: Hashmap formalization


view this post on Zulip walker (Oct 15 2022 at 11:41):

Is anyone aware of good formal model for hahsmap (paper or code .. coq or anything else)

The thing I really care about in the hashmap formalization is that it should have canonical form.

Google/scholar couldn't help me find anything so if anyon knows, please let me know.

view this post on Zulip Paolo Giarrusso (Oct 15 2022 at 12:36):

I assume what you need is canonical representations — two maps m1, m2 with the same entries are Leibniz equal (m1 = m2)?
But do you need formal models for Coq hashmaps or a hashmaps in an imperative language? I'm not the expert but a few comments:

  1. Why do you need a hashmap and not just a dictionary — why is stdpp's gmap not adequate? That's the "functional model" I'd typically use in general in separation logic. It uses a non-lossy encoding of elements into naturals, instead of a lossy hash...
  2. If you want a Coq hashmap, it's going to be inefficient without mutable arrays :-|. Coq's new primitive arrays _might_ alleviate that, but functional languages often use structures closer to radix trees like gmap (https://infoscience.epfl.ch/record/213452).
  3. If you solve the other challenges, how should collisions be resolved to ensure canonicity? Would you sort elements by "hash"?

view this post on Zulip Karl Palmskog (Oct 15 2022 at 12:50):

the Ynot project did a hashtable based on Hoare Type Teory for Coq 8.3: https://github.com/ynot-harvard/ynot/tree/master/examples/Data

However, I'm pretty sure most of the authors would call both the approach and code obsolete now. There is a new maintained version of Hoare Type Theory though, but without the hashtable: https://github.com/imdea-software/htt

view this post on Zulip walker (Oct 15 2022 at 13:40):

To answer all your questions:

1- first yes leibniz equality is what I am looking for, why stdpp implementation is not fine?, I reached the conclusion that I might want to write proofs in the style of small step reflection to make them easily maintainable (by me) during all my experiments. But stdpp devs advised against combining both stdpp with ssreflect. So I think I will stop using stdpp and just use ssreflect and hope that my life gets easier once I re-implement all the stuff I need.

2, 3- that was why I was asking, I couldn't come up with trivial canonical form for collisions.

view this post on Zulip walker (Oct 15 2022 at 13:40):

I am not sure if that is wise decision (both in effort required for short term, and then effort required for long term).
Ofcourse you input will be appreciated.

view this post on Zulip Karl Palmskog (Oct 15 2022 at 13:41):

for ssreflect, you have finmap, which contains finite maps that are Leibniz equal when they have the same mappings: https://github.com/math-comp/finmap

view this post on Zulip walker (Oct 15 2022 at 13:45):

I am now excited! so I feel the urge to ask does it come also come with notion of finite sets or I need to create my own ?
( I know I can check it but I am too excited to be patient)

view this post on Zulip Karl Palmskog (Oct 15 2022 at 13:47):

there are multiple notions of what a finite set could mean. You have "finite set" in the sense of "a collection of elements inhabiting a finite type" already in basic MathComp. Then in finmap there are sets with finite number of elements drawn from a choiceType I believe (i.e., elements drawn from a type which has infinitely many inhabitants)

view this post on Zulip walker (Oct 15 2022 at 13:50):

I am not sure what the difference is, but from programming perspective, the sets I was imagining was map with key drawn from type T and value unit/true. :sweat_smile:

view this post on Zulip Karl Palmskog (Oct 15 2022 at 13:52):

a finite type can be formed out of the list [1;2;3;4]. Then a finset from MathComp is (decidable) membership predicate on that list.

On the other hand, you can have finmap finitely-supported sets even on nat, as long as each set has a finite number of members.

view this post on Zulip walker (Oct 15 2022 at 13:54):

I think I need the first one, thanks for all the pointers.

view this post on Zulip Karl Palmskog (Oct 15 2022 at 13:54):

https://math-comp.github.io/htmldoc_1_15_0/mathcomp.ssreflect.finset.html

view this post on Zulip walker (Oct 15 2022 at 13:55):

wow! real documentation!

view this post on Zulip walker (Oct 15 2022 at 13:56):

thanks a lot!

view this post on Zulip Karl Palmskog (Oct 15 2022 at 13:56):

finset is not a map, however. if you need the map aspect: https://github.com/math-comp/finmap/blob/master/finmap.v ({fmap K -> V} )

view this post on Zulip walker (Oct 15 2022 at 13:57):

I will need to learn more about the differences before knowing what I really need.

view this post on Zulip walker (Oct 15 2022 at 13:57):

But I need both a collection that one can get from array and k/v pairs.

view this post on Zulip Paolo Giarrusso (Oct 15 2022 at 18:57):

Small clarification to your rationale: combining stdpp with the ssreflect tactics is fine (iris does it). The warning holds true for anything more advanced :-)

view this post on Zulip Paolo Giarrusso (Oct 15 2022 at 19:00):

(but of course the math-comp ecosystem is a very fine one, if you don't need Iris)

view this post on Zulip walker (Oct 15 2022 at 20:20):

It is not clear if I can get away without iris, given that I am interested in side effects:-

worst case I will have to rewrite the proofs for third time, which is not good, but not that bad, and I feel I really liked mathcomp ecosystem. .. at least the verbosity of the proofs, and ofc the documentation.

view this post on Zulip walker (Oct 15 2022 at 20:21):

but it is a good thing that I don't forget that I still need to learn iris, and I am patiently waiting for iris workshop next year.

view this post on Zulip Matthieu Sozeau (Oct 16 2022 at 07:59):

You might be interested otherwise on the canonical tries of Leroy and Appel (Google for the paper). They have perf benchmarks about them. They’re just maps though not hashmaps

view this post on Zulip Karl Palmskog (Oct 16 2022 at 10:15):

https://github.com/xavierleroy/canonical-binary-tries

view this post on Zulip Karl Palmskog (Oct 16 2022 at 10:16):

incidentally, we'd like to see this collection of finite maps become maintained....

view this post on Zulip walker (Oct 16 2022 at 12:38):

It certainly looks nice, and being AVL based is impressive, but my question here is, is there an advantage for using this implementation over mathcomp/finmap ?

view this post on Zulip Paolo Giarrusso (Oct 16 2022 at 14:00):

I was wondering how they abstract on the key type, but they don't, so I guess one would need to add that separately.

view this post on Zulip Paolo Giarrusso (Oct 16 2022 at 14:02):

Apart from that, https://hal.inria.fr/hal-03372247 describes the advantages. IIRC, what's nice is that they are canonical _and_ they do that without embedding canonicity proofs

view this post on Zulip Paolo Giarrusso (Oct 16 2022 at 14:03):

that relates to

Efficiency not only as extracted to OCaml (i.e., with proofs erased), but also as represented in Coq and computed within Coq (i.e., without proof erasure).

view this post on Zulip Karl Palmskog (Oct 16 2022 at 14:05):

the basic rule of thumb for choosing between mathcomp-finmap and triemap or similar: mathcomp-finmap is designed for proving, not computation inside Coq (I think most computation is locked), so unless you need computation, finmap may be the way to go

view this post on Zulip Paolo Giarrusso (Oct 16 2022 at 14:17):

stdpp's gmap is efficient after extraction, and acceptable for computation in Coq with vm_compute, but you don't want to print normalized maps because of the embedded proofs; that's exactly what the new paper should fix

view this post on Zulip Paolo Giarrusso (Oct 16 2022 at 14:21):

https://gitlab.mpi-sws.org/iris/stdpp/-/issues/64 discusses the issues...

view this post on Zulip Paolo Giarrusso (Oct 16 2022 at 14:23):

annoyingly, I'm not sure the new paper would remove _all_ proofs (at least, based on some skimming): it could probably replace the underlying storage (stdpp.pmap), but gmap needs another layer of proofs:

(** We pack a [Pmap] together with a proof that ensures that all keys correspond
to codes of actual elements of the countable type. *)
Definition gmap_wf K `{Countable K} {A} (m : Pmap A) : Prop :=
  bool_decide (map_Forall  p _, encode (A:=K) <$> decode p = Some p) m).
Record gmap K `{Countable K} A := GMap {
  gmap_car : Pmap A;
  gmap_prf : gmap_wf K gmap_car
}.

view this post on Zulip Matthieu Sozeau (Oct 17 2022 at 12:38):

We found out in certicoq though that canonical tries can still be a bottleneck because of the conversion to positive. Having (hash) maps with bignum keys would certainly beat them

view this post on Zulip Alexander Gryzlov (Oct 21 2022 at 16:43):

Karl Palmskog said:

the Ynot project did a hashtable based on Hoare Type Teory for Coq 8.3: https://github.com/ynot-harvard/ynot/tree/master/examples/Data

However, I'm pretty sure most of the authors would call both the approach and code obsolete now. There is a new maintained version of Hoare Type Theory though, but without the hashtable: https://github.com/imdea-software/htt

Umm, we do have a hash table example in current HTT: https://github.com/imdea-software/htt/blob/master/examples/hashtab.v


Last updated: Oct 13 2024 at 01:02 UTC