Stream: Coq users

Topic: Type classes vs canonical structures vs modules


view this post on Zulip Karl Palmskog (May 20 2020 at 12:02):

ah, should we do another type classes vs. canonical structures thread? In my opinion, FMap/MSet is the current extraction and computing-in-Coq winner, and finmap is the ease-of-reasoning winner...

view this post on Zulip Paolo Giarrusso (May 20 2020 at 12:10):

Rather than rehash a debate — does the use of canonical structures/type classes/modules affect the extraction result significantly?

view this post on Zulip Paolo Giarrusso (May 20 2020 at 12:11):

My prior is that the two things are independent — except that functor applications might be monomorphized during extraction. Are they?

view this post on Zulip Karl Palmskog (May 20 2020 at 12:14):

my experience is that type classes makes extracted code into an unreadable mess, with lots of indirection. FMap/MSet was designed to extract well from the start. canonical structures extracts terribly with lots of nesting, but this could be addressed as done here: https://doi.org/10.1016/j.scico.2019.102372

view this post on Zulip Paolo Giarrusso (May 20 2020 at 15:01):

FWIW, here's a pointer on stdpp's potential performance issues: https://gitlab.mpi-sws.org/iris/stdpp/-/issues/64. To be clear, I find them extremely well-designed for proofs (which is all I've actually needed).

view this post on Zulip Bas Spitters (May 20 2020 at 15:18):

I'm still curious to know whether the lean4 type classes would work well in Coq. The authors claim they would.

view this post on Zulip Karl Palmskog (May 20 2020 at 15:23):

I would have liked to do some kind of empirical study on modules vs. type classes vs. canonical structures, looking at everything from productivity, performance of extracted code, proof search time, extensibility cost, etc. I feel the topic is full of high-level contradicting claims and anecdotes without much data.

view this post on Zulip Karl Palmskog (May 20 2020 at 15:26):

Talia and I wrote up some questions for Derek Dreyer on what he thought people should use, but his answers were quite vague -- from my recollection he didn't think there was a clear winner (even though he has published so much on all of modules/CS/TC)

view this post on Zulip Karl Palmskog (May 20 2020 at 15:37):

I would probably start with the most simple problem: data mining on Coq dataset to see what people actually use and how. Could start with recording "Module Type" vs. "Canonical" vs. "Instance"/"Class".

view this post on Zulip Karl Palmskog (May 20 2020 at 15:40):

Milos Gligoric and I have a related NSF proposal under submission, but regrettably it doesn't seem like it will be funded (should have heard by now). But if anyone is in a position to send in proposals along these lines to relevant funding agencies I'd be happy to contribute -- at least a student or two doing engineering work would be necessary for scaling.

view this post on Zulip Bas Spitters (May 21 2020 at 07:12):

@Karl Palmskog Someone (@Maxime Dénès ?) has done an experiment with reimplementing math-comp using type classes.
I believe it's in the wiki somewhere. The experiment should be reevaluated with the new lean4 type class algorithm.

view this post on Zulip Bas Spitters (May 21 2020 at 07:22):

Here's the experiment:
https://github.com/coq/stdlib2/wiki/EqClass
https://github.com/coq/coq/wiki/Typeclasses-and-Canonical-Structures
Also (resonates with modules in agda)
https://github.com/coq/coq/wiki/RecordsNotModules

view this post on Zulip Karl Palmskog (May 21 2020 at 08:22):

if I remember correctly Vincent Laporte worked on the experiment as well, and it was presented in a Coq working group

view this post on Zulip Maxime Dénès (May 21 2020 at 09:25):

Yes, the work was done by Vincent.

view this post on Zulip Enrico Tassi (May 21 2020 at 13:21):

Unfortunately the data is going to be hard to analyze since Modules in Coq play many roles.

You can't seal a type without a module. You can't have 2 constants called the same if you don't put them in different modules. You can't limit the scope a thing (notation, TC instance...) if you don't put it in a module. In Math Comp modules are use for almost everything but abstracting/reusing code, which is my understanding of the purpose of modules in PL.

Also Coq modules were designed around (read copied from) OCaml modules, exactly to have a simple (one to one) extraction procedure. It is surely a plus if this is what you want, but it is not really a feature of "modules as a concept in PL". If you take a module system that does not trivially map to OCaml you quickly lose this (or you extract to Haskell or ...). I mean, if we had a magic wand the module system in Coq would be quite different. Eg. the dichotomy between ADT and opaque types, annoying in PL hence the translucent thing, is even more problematic in type theory where begin able to see the AST mean you can compute and hence type check more terms. When modules were added the grand plan was to later add computation rules to module types, so that one could seal a type but keep some computations. It never happened, there is some theory about CC + rewrite rules and some prototypes like CoqMT, but nothing landed in the module system. I'm not aware or any work on any work on CC + rewrite rules and extraction. So, if we had the module system we want, I've no idea if extraction would just wokr for it.

My2C

view this post on Zulip Kenji Maillard (May 21 2020 at 13:26):

Do you know if anything was written (even informally) on that idea of adding reductions rules to modules (to interfaces in particular I guess) ? It sounds like a rather natural idea for modularity in CIC and I have been playing around with rewrite rules in Agda doing something similar.

view this post on Zulip Enrico Tassi (May 21 2020 at 13:28):

I'm not aware. Let's summon @Hugo Herbelin , I'm sure he knows more.

view this post on Zulip Bas Spitters (May 21 2020 at 14:01):

The theory of Coq modules was developed in Judicaël Courant's PhD-thesis. I don't recall rewrite rules in there, but it's been a very long time since I looked.

view this post on Zulip Hugo Herbelin (May 21 2020 at 14:51):

Adding reduction rules with modules was considered/dreamt at some time. In addition to Judicaël, typical people interested at that time (late 90's, early 00's) were Jean-Christophe, Jacek Chrąszcz, Frédéric Blanqui, Daria Walukiewicz (see e.g. https://arxiv.org/abs/0806.1749), later also Florent Kirchner and Olivier Hermant, iirc. It comes with interesting but difficult problems though (how to ensure confluence/termination? do we restrict to declaring only beta-like rules? how to connect the implementation and the interface when what is definitional in the implementation is not in the interface and vice-versa?).
There is indeed also the work on CC+ rewrite rules by Blanqui-Fernandez-Jouannaud, later Strub, on the calculus of algebraic constructions (and probably other).

view this post on Zulip Karl Palmskog (May 21 2020 at 17:26):

I looked into all publications related to Coq modules since we were interested in it for trustworthiness reasons, here is what I found in chronological publication order (did not include Courant's thesis since it seems superseded by the JFP article in 2007):

@InProceedings{Chrzaszcz2003,
author="Chrz{\k{a}}szcz, Jacek",
LONGeditor="Basin, David
and Wolff, Burkhart",
title="Implementing Modules in the {Coq} System",
booktitle="Theorem Proving in Higher Order Logics",
year="2003",
LONGpublisher="Springer Berlin Heidelberg",
LONGaddress="Berlin, Heidelberg",
pages="270--286",
LONGisbn="978-3-540-45130-3",
LONGdoi="10.1007/10930755_18"
}

@InProceedings{Chrzaszcz2004,
author="Chrz{\k{a}}szcz, Jacek",
LONGeditor="Berardi, Stefano
and Coppo, Mario
and Damiani, Ferruccio",
title="Modules in {Coq} Are and Will Be Correct",
booktitle=TYPES,
year="2004",
LONGpublisher="Springer Berlin Heidelberg",
LONGaddress="Berlin, Heidelberg",
pages="130--146",
LONGisbn="978-3-540-24849-1",
LONGdoi="10.1007/978-3-540-24849-1_9",
}

@article{Courant2007,
  title={A module calculus for Pure Type Systems},
  volume={17},
  LONGDOI={10.1017/S0956796806005867},
  number={3},
  journal={Journal of Functional Programming},
  LONGpublisher={Cambridge University Press},
  author={Courant, Judica{\"e}l},
  year={2007},
  pages={287--352}
}

@inproceedings{Soubiran2009,
 author = {Soubiran, Elie},
 title = {A Unified Framework and a Transparent Name-space for the {Coq} Module System},
 booktitle = {Workshop on Modules and Libraries for Proof Assistants},
 LONGseries = {MLPA '09},
 year = {2009},
 LONGisbn = {978-1-60558-954-1},
 LONGlocation = {Montreal, Canada},
 pages = {38--45},
 LONGurl = {http://doi.acm.org/10.1145/1735813.1735820},
 LONGdoi = {10.1145/1735813.1735820},
 LONGpublisher = {ACM},
 LONGaddress = {New York, NY, USA},
}

@phdthesis{Soubiran2010,
  TITLE = {{Modular development of theories and name-space management for the Coq proof assistant}},
  AUTHOR = {Soubiran, Elie},
  LONGURL = {https://tel.archives-ouvertes.fr/tel-00679201},
  SCHOOL = {{Ecole Polytechnique X}},
  YEAR = {2010},
  LONGMONTH = Sep,
  LONGPDF = {https://tel.archives-ouvertes.fr/tel-00679201/file/these.pdf},
}

view this post on Zulip Ralf Jung (May 22 2020 at 07:36):

on the topic of type classes vs canonical structures, we are seeing some *enormous* perf wins (40% for entire developments, 70% for individual files) by merging two layers of our canonical-structure-based hierarchy. So there are definitely some subtle perf cliffs not just with type classes.

view this post on Zulip Notification Bot (May 22 2020 at 10:58):

This topic was moved by Théo Zimmermann to #Hierarchy Builder devs & users > Hierarchy builders and type classes

view this post on Zulip Christian Doczkal (May 22 2020 at 12:47):

IIRC, the problem with deep hierarchies is well-known to the developers of the mathcomp libraries (e.g, it's one of the reasons why one cannot simply insert a semigroup structure into the current hierarchy, even though semigroups occur pervasively in CS). My understanding is that fixing this would significantly increase the (already significant) boilerplate required. Hence, the development of the Hierarchy Builder tool to automate this. @Enrico Tassi , @Cyril Cohen , or @Kazuhiko Sakaguchi can probably elaborate on how the HB generated hierarchies differ from the current mathcomp hierarchy.

view this post on Zulip Enrico Tassi (May 22 2020 at 12:51):

The "depth" of the hierarchy matters for perf. HB should solve some of the problems that come from that. In particular the "class" projection chain is of constant length with HB, while it is proportional to the hierarchy depth is MC. Indeed to not copy paste an existing class you mention it in the subclass class record when you do things by hand (generating a sort of telescope), while HB can copy its contents (being a program he does not get bored)

view this post on Zulip Enrico Tassi (May 22 2020 at 12:53):

We are not too far from being able to test HB on the full MC. Only then we will be able to confirm or not this claim.

view this post on Zulip Kazuhiko Sakaguchi (May 23 2020 at 05:21):

Karl Palmskog said:

my experience is that type classes makes extracted code into an unreadable mess, with lots of indirection. FMap/MSet was designed to extract well from the start. canonical structures extracts terribly with lots of nesting, but this could be addressed as done here: https://doi.org/10.1016/j.scico.2019.102372

Thanks for mentioning my paper. In my opinion, nesting itself comes from (semi- or fully) bundled approaches to encode mathematical structures such as packed classes. But we can do that with type classes as in Lean's mathlib and will face the same nesting issue. Another issue specific to combining canonical structures and extraction is type coercion such as Obj.magic to convert between a concrete type T and sort T_instance which are convertible in dependent type theory but not the same after extraction (see Sect. 8.3 of my paper).

view this post on Zulip Bas Spitters (May 23 2020 at 09:57):

There is also this work in agda that seems to have some overlapping goals with the hierarchy builder. It uses meta-programming (e-lisp) to convert between bundled and unbundled structures.
https://alhassy.github.io/next-700-module-systems/

view this post on Zulip Kenji Maillard (May 23 2020 at 10:11):

speaking about bundled vs unbundled, the system designed in arend looks quite impressive at first sight (records/typeclasses are "bundled by default" but you can unbundle in a really lightweight way just by specifying some of the fields, for instance ring stands for an arbitrary ring instance ring int to an arbitrary ring structure on integers and ring int + to an arbitrary ring structure on int such that the first operation is +). I wonder how unification & type class resolution is implemented and how tracktable it gets with a larger development...

view this post on Zulip Bas Spitters (May 23 2020 at 10:17):

I agree. I've talked to Isaev about this. Unfortunately, there's no paper explaining the implementation.

view this post on Zulip Cyril Cohen (May 24 2020 at 19:38):

Kenji Maillard said:

speaking about bundled vs unbundled, the system designed in arend looks quite impressive at first sight (records/typeclasses are "bundled by default" but you can unbundle in a really lightweight way just by specifying some of the fields, for instance ring stands for an arbitrary ring instance ring int to an arbitrary ring structure on integers and ring int + to an arbitrary ring structure on int such that the first operation is +). I wonder how unification & type class resolution is implemented and how tracktable it gets with a larger development...

Hi, can you provide a reference for that?

view this post on Zulip Karl Palmskog (May 24 2020 at 19:42):

I think Kenji is talking about this? https://arend-lang.github.io/about/arend-features#class-system -- but as Bas says there is no formal writeup, it's implementation-defined (in Java).

view this post on Zulip Valentin Robert (May 26 2020 at 08:53):

Hey everyone, IIRC from reading the canonical structures paper, one of the differences highlighted there was that type class resolution was type-driven, while canonical structure resolution was essentially term-driven (if not by intent, in practice), and the paper used it to guide the resolution process by creating opaque type aliases that would either dictate an instance to be used, or be peeled open to reveal the next one.
Is this still current? The paper made it sound like canonical structures were more powerful in this sense, though the details of getting this to work seemed excruciating.

view this post on Zulip Janno (May 26 2020 at 10:40):

This sounds correct to me. Canonical structures, as far as I know, have not changed at all.

view this post on Zulip Enrico Tassi (May 26 2020 at 11:42):

Both are term driven, in the sense that types can contain terms in Coq. One may say that CS resolution is type triggered in the sense that CS resolution only happens when two types are unified, then recursively unification can compare terms and trigger CS there.

What may make one think that TC are type driven is that Check _ : SomeClass key is enough to trigger a search for a TC instance for SomeClass driven by key, but key can really be anything, terms included. With CS it is a bit harder. In MathComp you have a bunch of notations like [eqType of key] for that. Another way is to write Check refl_equal _ : proj _ = key, which is not much more verbose but may be a bit more obscure than the TC counterpart, in particular one cannot just write a type, but also a term (refl_equal _ there). Anyway, in this case it is clear that key can be anything (a term or a type, since = accepts both)

view this post on Zulip Enrico Tassi (May 26 2020 at 11:47):

@Valentin Robert I'm not sure I get which paper you are talking about. If it is the one of Beta et al, then the chain of aliases is transparent and unification will unfold that step by step, trying to find a CS solution for proj _ = key1, then proj _ = key2 (assuming Definition key1 := key2). This trick (by Georges) lets one retrofit some backtracking in CS search, which is crucial in some applications, like reification in presence of variables or exploring a tree as in their paper (the lemma overloading thing).

view this post on Zulip Enrico Tassi (May 26 2020 at 11:54):

Hope it helps

view this post on Zulip Janno (May 26 2020 at 11:56):

I guess one crucial difference to mention here for the sake of completeness is that TC are open in the sense that new instances can be added at any time. CS are closed: there usually is one canonical instance that points Coq to key1 and that instance cannot be changed after the fact. (I think @Gaëtan Gilbert has been experimenting with a design where new instances override old ones but I don't know how that is going.)

view this post on Zulip Kazuhiko Sakaguchi (Mar 05 2021 at 18:35):

Today I read the CICM '20 paper about the linter of mathlib, Maintaining a Library of Formal Mathematics, and discovered there many interesting issues with type classes in mathlib (see §3.3). In particular, they explain that they have to unbundle a structure parameterized by another structure such as modules over a ring in order to avoid some kind of inference issues. It looks a bit ad hoc to me, and I do not understand what would be the consequences of the extensive use of this methodology.

AFAIK, we never experienced these issues with canonical structures in MathComp. (Indeed there are other kinds of issues with CS.) I think these examples would be helpful to compare type classes and canonical structures.


Last updated: Jan 27 2023 at 00:03 UTC