Stream: Miscellaneous

Topic: Shallow vs deep embedding


view this post on Zulip Julin S (Feb 04 2022 at 05:41):

Hi.
Is there some place where we can read about the differences between shallow and deep embeddings in context of coq?
And is Higher Order Abstract Syntax (HOAS) and Parametric Higher Order Abstract Syntax (PHOAS) forms of deep embeddings or are they a different category altogether?

view this post on Zulip Théo Winterhalter (Feb 04 2022 at 06:31):

This is not in Coq but Jesper Cockx made a blog post about this kind of thing: https://jesper.sikanda.be/posts/1001-syntax-representations.html

view this post on Zulip Guillaume Melquiond (Feb 04 2022 at 06:54):

I would rather classify (P)HOAS as shallow embedding, since it heavily reuses part of the metalogic semantics, namely binders.

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:15):

It's hard to find a canonical definition, but IME "deep embedding" = using syntax and "shallow embedding" = using a Church encoding (and it's almost always Church not Parigot, so it forces compositional interpretations).

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:17):

in principle that's separate from (P)HOAS, but I don't think all combinations are available in Coq. IIRC, at least Chlipala's PHOAS (and the one in the blog post) was a deep embedding.

view this post on Zulip Karl Palmskog (Feb 04 2022 at 11:17):

we have a section on deep/shallow in our survey: https://arxiv.org/pdf/2003.06458.pdf#page=80

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:18):

on @Guillaume Melquiond 's point, people also distinguish "syntactic" interpreters from "meta" interpreters, depending on how much of the host language is reused — but that's not a binary since you can reuse or reimplement different pieces. Using HOAS is certainly more "meta", especially since (IIRC) it's easier to interpret using host-language closures.

view this post on Zulip Karl Palmskog (Feb 04 2022 at 11:19):

I think deep/shallow is a bit like the distinction between hills and mountains. You know what a mountain is when you are on it (a fully deep encoding with de Bruijn), but it's hard to pin down a particular height where a hill becomes a mountain

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:19):

@Karl Palmskog I'm afraid I disagree with both descriptions. What you call "shallow embedding" is a pure embedding (shallow or deep)...

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:19):

and what you call "deep embedding" is just "not embedding".

view this post on Zulip Karl Palmskog (Feb 04 2022 at 11:19):

you can debate it with Ilya (Sergey) at some future POPL

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:20):

See for instance https://www.cs.ox.ac.uk/people/jeremy.gibbons/publications/embedding-short.pdf

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:22):

in the FP community, one of the standard citations for pure embeddings is https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.994.10&rep=rep1&type=pdf...

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:22):

@Karl Palmskog well, your example of deep embedding in the paper actually sounds correct, but it doesn't seem to agree with your description

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:23):

This looks good: image.png

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:24):

It doesn't seem to match this: > An alternative approach of implementing and encoding programming languages is called deep embedding, and amounts to the implementation of a DSL from scratch, essentially, writing its parser, interpreter and type-checker in a general-purpose language

view this post on Zulip Karl Palmskog (Feb 04 2022 at 11:25):

well, that is to me just going further in the meaning of deep. From the Coq inductive type of regexps, you could obviously do a parser, interpreter, etc.

view this post on Zulip Karl Palmskog (Feb 04 2022 at 11:25):

I don't think we have to agree 100% with FP terminology in ITP...

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:26):

but embedded specifically means "it's embedded inside another host language"

view this post on Zulip Karl Palmskog (Feb 04 2022 at 11:26):

a lot of things are "partially" embedded in a host language these days, (P)HOAS is just one very prominent example.

view this post on Zulip Karl Palmskog (Feb 04 2022 at 11:27):

I usually take the "has its own syntax datatype" as a criterion, but I know this will not work generally

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:28):

@Karl Palmskog re ITP, see https://link.springer.com/chapter/10.1007/978-3-540-30142-4_22.

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:28):

In fact, that's _the_ earliest reference I ever found to the distinction re DSLs, and it comes from logics.

view this post on Zulip Karl Palmskog (Feb 04 2022 at 11:28):

sure, you also have the work by Bas's team that allows switching between shallow and deep in Coq (to/from "DSL") using MetaCoq...

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:28):

(but apparently I didn't search well enough)

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:29):

I think we agree on deep = "has its own syntax datatype"

view this post on Zulip Karl Palmskog (Feb 04 2022 at 11:30):

if you want to be strict about HOL4 and HOL Light, they are themselves deeply embedded into SML and OCaml, respectively. So any language represented in their HOL is multi-deeply-embedded

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:31):

they might be deep, but they're not embedded: thankfully, "embedded DSL" is much more established as "we avoid having a separate parser".

view this post on Zulip Karl Palmskog (Feb 04 2022 at 11:32):

HOL4 has a parser, but a pretty terrible one... hence people write programmatic SML interfaces to their languages defined in HOL4

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:33):

embedded DSLs was maybe the only research focus of the group where I did my PhD, but you shouldn't take my word for it, there's lots of literature on this ( https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=embedded+DSL&btnG= ). https://dl.acm.org/doi/pdf/10.1145/1118890.1118892 for instance is a survey on DSLs.

view this post on Zulip Karl Palmskog (Feb 04 2022 at 11:33):

anyway, I don't think FP should be the ultimate authority on terminology here.

view this post on Zulip Karl Palmskog (Feb 04 2022 at 11:34):

do they even have any formal definitions of "DSL" by the way?

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:34):

I gave a reference from ITP, and others from outside FP.

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:35):

I think it's well-established enough that the distinction between DSL and GPL (general-purpose language) is more aesthetic than mathematical

view this post on Zulip Karl Palmskog (Feb 04 2022 at 11:38):

if that's aesthetic, then I think the deep/shallow distinction is also partly aesthetic. But it can be narrowly and precisely defined, as done by Nipkow. But we don't all have to adopt that "narrowization". Can be good to do in papers, though.

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:38):

you're of course free to use "point" to mean "table", so we're only discussing good writing (or whatever you want to call this)

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:39):

using "embedded" to not mean the established sense of "embedded" seems a net negative.

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:40):

not that I necessarily expect an ITP person to care, it's not their research focus. But diluting terminology increases siloing...

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:41):

I'll grant that many system are in some intermediate state, and I'll happily believe HOL4 is there.

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:41):

but the distinction is still useful for discussion.

view this post on Zulip Karl Palmskog (Feb 04 2022 at 11:47):

if the language "HOL" as defined in OCaml via a term datatype by HOL Light, is not considered a deep embedding and a DSL of a general-purpose language, that's really strange to me...

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:49):

when the user writes a OCaml program, then they're using the deep embedding, sure.

view this post on Zulip Karl Palmskog (Feb 04 2022 at 11:49):

but the usual way of interacting with HOL Light is via the OCaml toplevel, so effectively writing an OCaml program

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:50):

sure, that counts as a deep embedding.

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:50):

anything outside of using their parser

view this post on Zulip Karl Palmskog (Feb 04 2022 at 11:50):

then there are some convenience mechanisms like a low-effort parser that allow you to deceive yourself into thinking you are inside the HOL system

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:50):

The distinction is useful for the Software Language Engineering community (http://www.sleconf.org) to say things like "embedding lets you reuse tooling like editor support, syntax highlighting, etc"

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:52):

IOW, the researchers working on "how to implement and support languages" distinguish between embedded languages and non-embedded ones because it's pretty important for lots of their work.

view this post on Zulip Karl Palmskog (Feb 04 2022 at 11:53):

one example from HOL4:

Theorem dimindex_dimword_le_iso:
  dimindex (:'a) <= dimindex (:'b) <=> dimword (:'a) <= dimword (:'b)
Proof
 SRW_TAC [] [logrootTheory.LE_EXP_ISO, fcpTheory.dimindex_def, dimword_def]
QED

All the stuff like Theorem, Proof, Qed which look like things that are actually parsed are in the end hacky macros, and it reduces to something like:

val _ = store_thm("dimindex_dimword_le_iso", ...);

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:54):

the only connection to FP is that the FP community REALLY prefer embedding and will point out all the hacks that come out from in non-embedded languages.

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:55):

OTOH, lots of work in embedding languages is exactly in more-or-less advanced tricks for making the embedded language not seem embedded.

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 11:56):

(Hudak specifically excludes macro from "_pure_ embedding", and explains why, but it's certainly an embedding)

view this post on Zulip Ali Caglayan (Feb 04 2022 at 11:56):

I thought shallow vs deep was all about definitional equality. In a deep embedding you use an identity type to model definitional equality whilst in a shallow one you use the ambient definitional equality. This means various properties can get inherited from the ambient theory, such as normalization.

view this post on Zulip Karl Palmskog (Feb 04 2022 at 11:56):

so here this part: dimindex (:'a) <= dimindex (:'b) <=> dimword (:'a) <= dimword (:'b) is parsed by a hacky parser, and what the user is supposed to understand that it means, is an instance of the term datatype, which one could write directly in the general-purpose language

view this post on Zulip Ali Caglayan (Feb 04 2022 at 11:57):

The caveat being "good luck doing a shallow embedding". Typically it is very difficult to find one and/or requires a very powerful datatype.

view this post on Zulip Ali Caglayan (Feb 04 2022 at 11:58):

The best "shallow embedding" example that I know of is McBride's Kipling: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.379.3169&rep=rep1&type=pdf

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 12:05):

@Ali Caglayan I'd say Kipling uses shallow embedding for definitional equality and deep embedding for the rest.

view this post on Zulip Ali Caglayan (Feb 04 2022 at 12:06):

What about the typechecker? That uses Agda's checking AFAIK. Also what is "the rest"?

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 12:07):

hm wait... "Figure 2. A shallow embedding of KIPLING in Agda" is indeed a fully shallow embedding.

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 12:08):

but later he does have:

The deep embedding of KIPLING takes the form of an indexed inductive-recursive definition, with this signature:

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 12:08):

I read this too long ago, but in the end doesn't he have a syntax for terms, but no syntax for definitional equality? That's what I was trying to say.

view this post on Zulip Paolo Giarrusso (Feb 04 2022 at 12:10):

re the typechecker, embedded languages reuse the host typechecker to some large extent — including typed syntax using indexed types, like here (or like the famous Expr T GADT examples in functional languages)

view this post on Zulip Ali Caglayan (Feb 04 2022 at 12:22):

Right, in a shallow embedding you really shouldn't have to specify a syntax for definitional equality. People above have also mentioned a similar story for binders, AFAIK HOAS is all about reusing the ambient binding mechanism, so you could argue that is in a sense shallow.

view this post on Zulip Ali Caglayan (Feb 04 2022 at 12:31):

What is interesting in dependently typed languages is that it becomes apparent that definitional equality and UIP equality are not the same in general. In order to have a definitionally reducing syntax of substitutions you need some extra rewriting in some places. I believe it is an open question how to treat those extra reductions syntactically. This is what type theory eating itself is about. Whether this is actually possible is anybodies guess.

For non-dependent languages, there is usually enough leeway to not have those extra reductions, hence why you can do GADT examples like you mentioned.

view this post on Zulip Ali Caglayan (Feb 04 2022 at 12:34):

But you can also go the other direction and rewrite along all equalities which is why you get extensional MLTT and really Set looking type theories


Last updated: Aug 19 2022 at 19:03 UTC