Stream: Ltac2

Topic: Hash table


view this post on Zulip Michael Soegtrop (Nov 10 2021 at 16:34):

@Pierre-Marie Pédrot @Jason Gross (and all): are you aware of on Ltac2 implementation of an OCaml style hash table?

Also I just wrote a hash function for constr / kind. There are a few term types one can't look into with Ltac2 code, but I would say the hash function should have few collisions for most purposes anyway. It might just be a bit slow, because for constant, inductive and constructorI go over a reference and Env.path and hash the resulting array of idents as strings - not sure if there is a better way. A OCaml hash function for strings would probably speed up things a bit.

view this post on Zulip Pierre-Marie Pédrot (Nov 10 2021 at 17:01):

Not that I know of, but we could export the OCaml implementation straightforwardly.

view this post on Zulip Pierre-Marie Pédrot (Nov 10 2021 at 17:01):

Note that storing terms inside a hashtable for tactic purposes is a very dangerous area.

view this post on Zulip Pierre-Marie Pédrot (Nov 10 2021 at 17:02):

Terms are indeed not only given by the AST, but are actually a quotient of a term + the evarmap. Notably we don't want to observe the differences between an evar and its expansion, and e.g. universes are only considered up to unification.

view this post on Zulip Pierre-Marie Pédrot (Nov 10 2021 at 17:03):

So if you only consider the AST for hashing, you'll run into horribly subtle issues where two terms that are otherwise completely the same are not mapped to the same value.

view this post on Zulip Pierre-Marie Pédrot (Nov 10 2021 at 17:04):

You can always normalize upfront but then the hashtable will be tied to a specific evarmap, and any extension of the evarmap will invalidate the hashtable. Typically, defining an evar.

view this post on Zulip Pierre-Marie Pédrot (Nov 10 2021 at 17:04):

I might be overly pessimistic and a trivial hash table implementation may work for 99% of the uses, but I just wanted to be clear that this is more subtle than it looks like.

view this post on Zulip Michael Soegtrop (Nov 10 2021 at 18:02):

@Pierre-Marie Pédrot : I think the fields of the "kind" structure which would be dangerous are anyway not inspectable by Ltac2, notably the evar field of the Evar constructor. I guess the "instance" fields of various kind constructors could also contribute to strange effects but are also opaque at Ltac2 level. I got the impression that the stuff which can be inspected with Ltac2 isn't dangerous. But I anyway intend to use it for closed terms only and if I miss something it also won't kill me - I use this to check if I did prove something already.

view this post on Zulip Pierre-Marie Pédrot (Nov 10 2021 at 18:10):

notably the evar field of the Evar constructor.

Yes they are. A term in Ltac2 is like an AST with mutable nodes. Evar nodes behave like some kind of reference that can be set imperatively by unification, so if you intend to store a term with evars it might be the case that the value of the hash changes by side-effect if some unlucky unification is performed.

view this post on Zulip Gaëtan Gilbert (Nov 10 2021 at 18:12):

how does mutation in ltac2 interact wit backtrack?
if you instantiate an evar with 0, put in hash table, backtrack, instantiate with 1, what's in the table?

view this post on Zulip Pierre-Marie Pédrot (Nov 10 2021 at 18:21):

depends on the implementation of the hash table, but if you store the term it didn't change

view this post on Zulip Pierre-Marie Pédrot (Nov 10 2021 at 18:21):

physically, I mean, but obviously observing the head of term is dereferencing evars

view this post on Zulip Michael Soegtrop (Nov 10 2021 at 19:01):

What is unclear to me: in the kind constructor Evar (evar, constr array) which part changes by unification - the evar part - which is an opaque type in Ltac2 so it doesn't influence my hash - or the constr array part - which I currently have in the hash?

view this post on Zulip Gaëtan Gilbert (Nov 10 2021 at 19:04):

the result of the kind function changes

view this post on Zulip Michael Soegtrop (Nov 10 2021 at 19:07):

This is clear, but which part of the Evar constructor changes?

view this post on Zulip Gaëtan Gilbert (Nov 10 2021 at 19:07):

none

view this post on Zulip Gaëtan Gilbert (Nov 10 2021 at 19:07):

if you do

view this post on Zulip Gaëtan Gilbert (Nov 10 2021 at 19:08):

let k := Evar (some stuff) in
let c := Constr.make k in
unify c c';
Constr.kind c

it won't return Evar (some stuff), but k is still bound to it

view this post on Zulip Michael Soegtrop (Nov 10 2021 at 19:15):

Hmm I still don't understand what you want to tell me. My assumption is that in the data structure returned by "kind" nothing changes by unification except the two arguments to an Evar constructor. Is this assumption correct?

If so, which of the two arguments of the Evar constructor change by unification? The first (of type evar), the second (of type constr array) or both?

I am not talking about terms constructed from a "kind" structure, but about repeatedly inspecting sub terms of the goal or of hypothesis.

view this post on Zulip Gaëtan Gilbert (Nov 10 2021 at 19:19):

subterms of the goal are the same as terms from make
if kind returns Evar on a term, future calls on the same term may return literally anything depending on unification

view this post on Zulip Paolo Giarrusso (Nov 10 2021 at 19:26):

should keys be restricted to be evar-free?

view this post on Zulip Paolo Giarrusso (Nov 10 2021 at 19:30):

above @Michael Soegtrop talks about “closed” terms, if that is meant to exclude evars you’re fine.

view this post on Zulip Paolo Giarrusso (Nov 10 2021 at 19:31):

for a reusable hashmap, it still seems safer if the map rejected keys with evars — they’re morally mutable, and mutable dictionary keys indeed don’t have nice semantics.

view this post on Zulip Paolo Giarrusso (Nov 10 2021 at 19:34):

correct me if I’m wrong, but at a high level, since an evar can become syntactically equal to pretty arbitrary terms (of the right type), and those terms can have different hashes (say, h1 and h2), then the hash of that evar must be equal to both h1 and h2.

view this post on Zulip Paolo Giarrusso (Nov 10 2021 at 19:35):

which is impossible – hence a safe hash function must abort outright when encountering an evar. Which seems a cheap enough way to prevent evars in hash keys without an extra traversal?

view this post on Zulip Michael Soegtrop (Nov 10 2021 at 20:05):

One can of course also just ignore the Evar part and hash only the rest of the term, so that the hash is "modulo evars".

view this post on Zulip Gaëtan Gilbert (Nov 10 2021 at 20:13):

no, because when the evar gets instantiated the new hash won't know what to ignore

view this post on Zulip Enrico Tassi (Nov 10 2021 at 20:27):

This is food for the ATP community. Hashes are hard because they are not invariant by substitution (evar instantiation). You can build an order relation which is stable by substitution, typically a polynomial over the holes, but that is a bit expensive. IMO here the best option is to abort if the key has holes.

view this post on Zulip Enrico Tassi (Nov 10 2021 at 20:34):

If you really want to index with a hash, you can use the bloom filter trick: https://github.com/LPCIC/elpi/blob/master/ELPI.md#configurable-argument-indexing but the lookup is going to be an over approximation. (unfortunately the original source is down, so I cite myself, see the part about unification hashes)

view this post on Zulip Paolo Giarrusso (Nov 10 2021 at 20:38):

@Enrico Tassi is the source http://blog.ndrix.com/2013/03/prolog-unification-hashes.html ? The link has a typo

view this post on Zulip Enrico Tassi (Nov 10 2021 at 20:38):

shame on me

view this post on Zulip Paolo Giarrusso (Nov 10 2021 at 20:40):

https://github.com/LPCIC/elpi/pull/121

view this post on Zulip Michael Soegtrop (Nov 11 2021 at 08:06):

Gaëtan Gilbert said:

no, because when the evar gets instantiated the new hash won't know what to ignore

My assumption was that when the evar gets instantiated, the reference to the evar remains - it just has a known value then. From your description I derive that this is not the case - if the evar gets instantiated, the Evar reference is replaced with its value. Is this correct?

But anyway, as I said I don't have terms with evars, so I can throw an exception if I come across one. Also this is just about optimization - worst case I prove the same thing twice (I want to use the hash to remember terms for which I already proved certain properties). As long as this doesn't happen frequently, it is not much of an issue.

view this post on Zulip Michael Soegtrop (Nov 11 2021 at 08:24):

@Enrico Tassi @Paolo Giarrusso : thanks for the pointers - interesting! But I don't think it is worth the effort in my case. I want to have a reasonably efficient dictionary from terms to proofs of certain properties of these terms. First it is unlikely that I can proof something about terms with evars, so that it is unlikely (but possible) that they will end up in the hash table. If they end up in the hash table, it is not likely that the evar states change while the hash table is in use. And even if they would, the worst thing which could happen is that I have to do the proof again. But I do rely on that I can reliably find hypothesis this way - here I have to ensure that the hypothesis are free of evars.

view this post on Zulip Enrico Tassi (Nov 11 2021 at 09:38):

Michael Soegtrop said:

if the evar gets instantiated, the Evar reference is replaced with its value. Is this correct?

Yes, this is the abstraction/quotient that EConstr.t gives you (and Ltac2 is based on that, as most of the Coq base tactics are or are being ported to).

If your keys are ground, then I think you are good. Said that, if computing the hash means traversing the whole term, then the hash table is not going to be much better that any trie/discrimination tree data structure.

view this post on Zulip Michael Soegtrop (Nov 11 2021 at 11:58):

@Enrico Tassi : indeed the performance of the hash table might not be good - tests will show. I thought about discrimination trees, but it would be a lot of code, because the Coq term structure is quite complicated. Btw.: is this somehow built in in Elpi? Say could I programmatically store and query rules from Ltac2 or Ltac as in usual Prolog?


Last updated: Oct 08 2024 at 14:01 UTC