Unification hints and canonical structures · Coq users

Enrico might hop in and say how I'm wrong, but I always viewed "unification hints" as the concept, and CS as the "language mechanism", i.e., how unification hints are exposed to users in Coq

Karl Palmskog (Sep 17 2020 at 21:36):

indeed, I think the Lean developers decided to drop unification hint support in Lean 4, which will make it impossible to use CS in lean

Paolo Giarrusso (Sep 17 2020 at 21:36):

Okay, hm, I should read the paper someday, sadly my queue is full for a while.

Karl Palmskog (Sep 17 2020 at 21:37):

now we went OT again, I will do another breakout of topics

Notification Bot (Sep 18 2020 at 10:59):

This topic was moved here from #Dune devs & users > Unification hints and canonical structures by Théo Zimmermann

Enrico Tassi (Sep 18 2020 at 13:48):

When I wrote the Unification Hints paper, Canonical Structures were undocumented (but already implemented). So I often call the idea of extending unification with rules "Unification Hints", and CS the way it is implemented in Coq. Fun fact, if you read the PhD of Saibi, he realizes CS resolution "can iterate" in a footnote of an example: everything was there, he just did not gave this detail enough importance (and he wrote no documentation to his code).
Matita implemented UH as in the paper. Lean 3 as well (but they did not document the syntax to declare hints, so you need really to look hard to find them).

10 years later I'd say that Unification Hints are just Prolog clauses for the unif predicate: a Prolog predicate implementing the unification of two Coq terms (not a Coq predicate). You add these clauses in order to make the unification algorithm use proved results as solutions to hard unification problems: you make it smarter and aware of what you proved. The UH paper gives a criteria for these clauses to be acceptable. CS apply a much stricter criteria, which makes unif behave as a function (1 canonical solution) while in principle it could be a relation (it could compute many results).
With these looking glasses Type Classes are not very different. An instance is just an extra clause for the inhab predicate, which implements proof search (inhabitation in dependent type theory). Again you feed inhab with facts that you proved.

So now you may realize why Elpi looks like that, and why Coq-Elpi's repo contains a (not yet finished) unifier for Coq terms ;-)

Karl Palmskog (Sep 18 2020 at 13:52):

@Enrico Tassi so the bottom line is that it's not too inaccurate if I go on saying what I said above, i.e., that UH is the theoretical concept and CS are the realization/interface to UH for users? Or do you recommend some other one-liner?

Enrico Tassi (Sep 18 2020 at 13:53):

I'd say it's OK. The only glitch is that the implementation of CS came before, which is somewhat surprising.

Karl Palmskog (Sep 18 2020 at 13:54):

hardly the first time in computer science...

Bas Spitters (Sep 18 2020 at 13:55):

Could elpi be used to implement the lean improvement of type class inference?

Enrico Tassi (Sep 18 2020 at 13:56):

I would be very happy to be able, one day, to use it to implement Coq's one ;-)

Karl Palmskog (Sep 18 2020 at 13:57):

productivity-wise, I'm not sure better type class inference automation is a good thing. All that may happen is that developers double or triple the complexity of their type class hierarchies

Enrico Tassi (Sep 18 2020 at 13:57):

Technologically speaking it is not easy. Elpi is written in OCaml, so it is (by design) easy to embed it into an application written in OCaml, but Lean is not. Just think at the complexity of making 2 garbage collectors co-exist.

Karl Palmskog (Sep 18 2020 at 13:59):

for example, do we even have good guidelines for typeclass hierarchy design accessible to general users? I'm not sure if people grasp the bundled-unbundled thing, etc.

Karl Palmskog (Sep 18 2020 at 14:01):

a typeclass version of Hierarchy Builder would probably help. But just the existence of HB for CS-based hierarchies is somehow reassuring (someone has at least thought about these things and I don't have to read 10+ papers anymore to figure out all the heuristics)

Bas Spitters (Sep 18 2020 at 14:04):

+1 on HB for TC

Enrico Tassi (Sep 18 2020 at 14:07):

The question about implementing TC resolution is not new to my mind. In particular the recent paper about TC resolution caching in Lean is really about a well studied Prolog technique to mitigate the problems of depth-first search called Tabling. And I've a pile of papers on my desk, since Elpi's runtime does not have tabling yet. Applying tabling to the inhab predicate would be just a use case for it.

Karl Palmskog (Sep 18 2020 at 14:10):

@Enrico Tassi is there an obvious intuition why augmenting unif appears to lead to much more predictable results than augmenting inhab? In my early use of Coq, I ran into infinite typeclass resolution searches all the time, yet I've never had that issue with CS, just plain immediate failures. Is it due to that CS are such a restricted version of UH?

Karl Palmskog (Sep 18 2020 at 14:16):

maybe one can restrict typeclass definitions or inhab searches in some way to get the predictability of CS?

Enrico Tassi (Sep 18 2020 at 14:47):

Yes, CS terminate by design. Let me try to elaborate via the looking glasses of Logic Programming.

Divergence can happen for two reasons: generative-behavior and complexity (it's easy to be exponential with backtracking).
The latter is typically mitigated by using ! (the cut operator) in order to avoid exploring branches that are doomed. TC in Coq don't have a proper cut operator (there is a stalled PR, CC @Cyril Cohen ). Tabling, if available, can also be of great help here.

The generative behavior has to be mitigated by declaring modes for your predicate (Modes for Classes parameters in Coq's jargon).
The problem is that, by default, predicate arguments are input and output at the same time in Logic Programming. (I don't like this choice; Elpi encourages you to declare modes; Coq TC don't; Lean does pick input by default).
Take an innocuous rule: inhab (A * B) P :- inhab A P1, inhab B P2, P = (conj P1 P2).. If you apply it to inhab (nat * bool) P it generates two simpler problems inhab nat P1 and inhab bool P2 and you are happy.
But if you apply it, even by mistake, to inhab X P where X is unknown (eg an evar), then the rule applies by setting X := (A * B) (as if X was an output) and the first sub problem is inhab A P1, which is the same thing as inhab X P so you are looping (the same rule applies over and over).
My intuition here is that the first argument of inhab plays the role of an input that you recurse on, if it is not there you should not try generating it. Another way to put it: generate and test only works if you have an generation strategy that is not silly (not depth first).

CS only let you declare rules that have the form unif (proj X) (constant Args) :- ... and generate sub problems that are syntactically smaller. If proj or constant are not there, CS resolution is not triggered, you get an error. And you can only declare one rule per constant, so no backtracking is needed (hence no cut, resolution is linear in the size of the RHS of unif).

I believe your intuition is correct in the sense that there are for sure saner defaults (eg default mode to input, as Lean does).
Still there are use cases for having arguments in output mode. Eg if 1 + 0 is only made of overloaded TC operators, and there is no type annotation, then there is no input to follow, but you may still be happy to see Coq pick the most recently declared instance (which is likely to be a fact, maybe a section variable, not a rule with premises like the one for pairs in my example above). I'm not a big fan of this, but I've seen it used in the wild (even in math-classes IIRC), and I can see it can be convenient.
Having overlapping instances and still a decent complexity is not trivial to my eyes. In the sense that ! is not very modular, you have to look at the whole set of clauses and their precedence in order to tell if you are only pruning doomed branches, or potentially good ones. So the choice of using a cache (the Tabling thing) is much more user friendly.

Karl Palmskog (Sep 18 2020 at 14:58):

thanks, even though it may not be the intention, I take this as reinforcement of my belief that inexperienced Coq users should in most cases not touch type classes at all except to tap into what is already in (some parts of) Stdlib or Std++.

Cyril Cohen (Sep 18 2020 at 15:00):

Indeed, even expert Coq typeclass users can run very quickly into trouble by forgetting modes, oppacity directive and introducing loops (or simply exponential behaviours)...

Karl Palmskog (Sep 18 2020 at 15:03):

I wasn't even much aware about the input/output (mode) distinction and how that is different for TC in Lean. Ping @Rob Lewis is this widely known among Lean users? And does Lean have cut (!) for type class search?

Rob Lewis (Sep 18 2020 at 15:36):

out_params (as Lean calls the gadget to distinguish from the default kind of type class parameter) are known to more advanced users and people who came across a related problem, got confused, and asked something on the Zulip chat. I'd guess that most beginners don't know they exist, and people can get very far without needing to understand them.

Rob Lewis (Sep 18 2020 at 15:37):

There's no cut as far as I'm aware.

Paolo Giarrusso (Sep 18 2020 at 18:58):

There are absolutely use-cases for output modes! But I agree that they're a dangerous default.

Paolo Giarrusso (Sep 18 2020 at 18:59):

Together with non-unique solutions, they conspire to make things unpredictable.

Paolo Giarrusso (Sep 18 2020 at 19:00):

You can use typeclasses robustly, but you just need to enforce some criteria on your rules without any computer assistance (declare modes, make Instances syntax-directed).

Paolo Giarrusso (Sep 18 2020 at 19:01):

Or at least, @Robbert Krebbers can.

Paolo Giarrusso (Sep 18 2020 at 19:02):

And of course opacity. "Syntax-directed" indeed becomes interesting otherwise :-|

Paolo Giarrusso (Sep 18 2020 at 19:03):

(there are also multiple complaints about stdlib typeclasses for setoid rewriting, so at least someone disagrees they're robust; never looked into that myself)

Gaëtan Gilbert (Sep 18 2020 at 19:16):

Does tabling impose any constraints on the typeclass instances? specifically is it compatible with Hint Extern where we run arbitrary tactics as typeclass hints?

Enrico Tassi (Sep 19 2020 at 05:38):

Good question. Tabling changes the resolution strategy, that is how you explore the tree. In principle you can see rules as black boxes, you only look at the subgoals they produce. Wrt hint extern as we know it, the funny thing is that the sometimes call the resolution procedure explicitly, which is maybe problematic (they should be forbidden to do that, maybe). I still have to fully understand the pile of papers, but I don't see a show stopper.

Simon Hudon (Sep 19 2020 at 15:07):

The Lean type class system used to allow tactic execution during resolution but that feature has been removed in Lean 4, not because of the tabling but because Lean 4 is written in Lean and the elaborator, type class system and the check for definitional equality are mutually recursive and removing the feature allowed a less intertwined design for those parts. I think that supports the idea that tabling is not incompatible with running tactics during instance resolution.

Karl Palmskog (Sep 19 2020 at 15:10):

@Simon Hudon can you confirm that unification hints are also gone in Lean 4?

Simon Hudon (Sep 19 2020 at 15:23):

I think they are

Simon Hudon (Sep 20 2020 at 01:22):

Now that I think of it, a lot of it, (e.g. the top level parser and the elaborator) will be readily extendible so we should be able to add it in ourself

Simon Hudon (Sep 20 2020 at 01:24):

Right now, we can already tag a definition with elab_as_eliminator so that the elaborator will use a different mode to elaborate it. The modes should be more customizable in Lean 4

Last updated: Apr 20 2024 at 12:02 UTC