@Pierre-Marie Pédrot @Jason Gross (and all): are you aware of on Ltac2 implementation of an OCaml style hash table?

Also I just wrote a hash function for constr / kind. There are a few term types one can't look into with Ltac2 code, but I would say the hash function should have few collisions for most purposes anyway. It might just be a bit slow, because for constant, inductive and constructorI go over a reference and Env.path and hash the resulting array of idents as strings - not sure if there is a better way. A OCaml hash function for strings would probably speed up things a bit.

Not that I know of, but we could export the OCaml implementation straightforwardly.

Note that storing terms inside a hashtable for tactic purposes is a very dangerous area.

Terms are indeed not only given by the AST, but are actually a quotient of a term + the evarmap. Notably we don't want to observe the differences between an evar and its expansion, and e.g. universes are only considered up to unification.

So if you only consider the AST for hashing, you'll run into horribly subtle issues where two terms that are otherwise completely the same are not mapped to the same value.

You can always normalize upfront but then the hashtable will be tied to a specific evarmap, and any extension of the evarmap will invalidate the hashtable. Typically, defining an evar.

I might be overly pessimistic and a trivial hash table implementation may work for 99% of the uses, but I just wanted to be clear that this is more subtle than it looks like.

@Pierre-Marie Pédrot : I think the fields of the "kind" structure which would be dangerous are anyway not inspectable by Ltac2, notably the evar field of the Evar constructor. I guess the "instance" fields of various kind constructors could also contribute to strange effects but are also opaque at Ltac2 level. I got the impression that the stuff which can be inspected with Ltac2 isn't dangerous. But I anyway intend to use it for closed terms only and if I miss something it also won't kill me - I use this to check if I did prove something already.

notably the evar field of the Evar constructor.

Yes they are. A term in Ltac2 is like an AST with mutable nodes. Evar nodes behave like some kind of reference that can be set imperatively by unification, so if you intend to store a term with evars it might be the case that the value of the hash changes by side-effect if some unlucky unification is performed.

how does mutation in ltac2 interact wit backtrack?

if you instantiate an evar with 0, put in hash table, backtrack, instantiate with 1, what's in the table?

depends on the implementation of the hash table, but if you store the term it didn't change

physically, I mean, but obviously observing the head of term is dereferencing evars

What is unclear to me: in the `kind`

constructor `Evar (evar, constr array)`

which part changes by unification - the `evar`

part - which is an opaque type in Ltac2 so it doesn't influence my hash - or the `constr array`

part - which I currently have in the hash?

the result of the kind function changes

This is clear, but which part of the Evar constructor changes?

none

if you do

```
let k := Evar (some stuff) in
let c := Constr.make k in
unify c c';
Constr.kind c
```

it won't return `Evar (some stuff)`

, but `k`

is still bound to it

Hmm I still don't understand what you want to tell me. My assumption is that in the data structure returned by "kind" nothing changes by unification except the two arguments to an Evar constructor. Is this assumption correct?

If so, which of the two arguments of the Evar constructor change by unification? The first (of type evar), the second (of type constr array) or both?

I am not talking about terms constructed from a "kind" structure, but about repeatedly inspecting sub terms of the goal or of hypothesis.

subterms of the goal are the same as terms from `make`

if kind returns Evar on a term, future calls on the same term may return literally anything depending on unification

should keys be restricted to be evar-free?

above @Michael Soegtrop talks about “closed” terms, if that is meant to exclude evars you’re fine.

for a reusable hashmap, it still seems safer if the map rejected keys with evars — they’re morally mutable, and mutable dictionary keys indeed don’t have nice semantics.

correct me if I’m wrong, but at a high level, since an evar can become syntactically equal to pretty arbitrary terms (of the right type), and those terms can have different hashes (say, `h1`

and `h2`

), then the hash of that evar must be equal to both `h1`

and `h2`

.

which is impossible – hence a safe hash function must abort outright when encountering an evar. Which seems a cheap enough way to prevent evars in hash keys without an extra traversal?

One can of course also just ignore the Evar part and hash only the rest of the term, so that the hash is "modulo evars".

no, because when the evar gets instantiated the new hash won't know what to ignore

This is food for the ATP community. Hashes are hard because they are not invariant by substitution (evar instantiation). You can build an order relation which is stable by substitution, typically a polynomial over the holes, but that is a bit expensive. IMO here the best option is to abort if the key has holes.

If you really want to index with a hash, you can use the bloom filter trick: https://github.com/LPCIC/elpi/blob/master/ELPI.md#configurable-argument-indexing but the lookup is going to be an over approximation. (unfortunately the original source is down, so I cite myself, see the part about unification hashes)

@Enrico Tassi is the source http://blog.ndrix.com/2013/03/prolog-unification-hashes.html ? The link has a typo

shame on me

https://github.com/LPCIC/elpi/pull/121

Gaëtan Gilbert said:

no, because when the evar gets instantiated the new hash won't know what to ignore

My assumption was that when the evar gets instantiated, the reference to the evar remains - it just has a known value then. From your description I derive that this is not the case - if the evar gets instantiated, the Evar reference is replaced with its value. Is this correct?

But anyway, as I said I don't have terms with evars, so I can throw an exception if I come across one. Also this is just about optimization - worst case I prove the same thing twice (I want to use the hash to remember terms for which I already proved certain properties). As long as this doesn't happen frequently, it is not much of an issue.

@Enrico Tassi @Paolo Giarrusso : thanks for the pointers - interesting! But I don't think it is worth the effort in my case. I want to have a reasonably efficient dictionary from terms to proofs of certain properties of these terms. First it is unlikely that I can proof something about terms with evars, so that it is unlikely (but possible) that they will end up in the hash table. If they end up in the hash table, it is not likely that the evar states change while the hash table is in use. And even if they would, the worst thing which could happen is that I have to do the proof again. But I do rely on that I can reliably find hypothesis this way - here I have to ensure that the hypothesis are free of evars.

Michael Soegtrop said:

if the evar gets instantiated, the Evar reference is replaced with its value. Is this correct?

Yes, this is the abstraction/quotient that `EConstr.t`

gives you (and Ltac2 is based on that, as most of the Coq base tactics are or are being ported to).

If your keys are ground, then I think you are good. Said that, if computing the hash means traversing the whole term, then the hash table is not going to be much better that any trie/discrimination tree data structure.

@Enrico Tassi : indeed the performance of the hash table might not be good - tests will show. I thought about discrimination trees, but it would be a lot of code, because the Coq term structure is quite complicated. Btw.: is this somehow built in in Elpi? Say could I programmatically store and query rules from Ltac2 or Ltac as in usual Prolog?

Last updated: Oct 08 2024 at 14:01 UTC