Shallow vs deep embedding · Miscellaneous

Hi.
Is there some place where we can read about the differences between shallow and deep embeddings in context of coq?
And is Higher Order Abstract Syntax (HOAS) and Parametric Higher Order Abstract Syntax (PHOAS) forms of deep embeddings or are they a different category altogether?

Théo Winterhalter (Feb 04 2022 at 06:31):

Guillaume Melquiond (Feb 04 2022 at 06:54):

I would rather classify (P)HOAS as shallow embedding, since it heavily reuses part of the metalogic semantics, namely binders.

Paolo Giarrusso (Feb 04 2022 at 11:15):

It's hard to find a canonical definition, but IME "deep embedding" = using syntax and "shallow embedding" = using a Church encoding (and it's almost always Church not Parigot, so it forces compositional interpretations).

Paolo Giarrusso (Feb 04 2022 at 11:17):

in principle that's separate from (P)HOAS, but I don't think all combinations are available in Coq. IIRC, at least Chlipala's PHOAS (and the one in the blog post) was a deep embedding.

Karl Palmskog (Feb 04 2022 at 11:17):

Paolo Giarrusso (Feb 04 2022 at 11:18):

on @Guillaume Melquiond 's point, people also distinguish "syntactic" interpreters from "meta" interpreters, depending on how much of the host language is reused — but that's not a binary since you can reuse or reimplement different pieces. Using HOAS is certainly more "meta", especially since (IIRC) it's easier to interpret using host-language closures.

Karl Palmskog (Feb 04 2022 at 11:19):

I think deep/shallow is a bit like the distinction between hills and mountains. You know what a mountain is when you are on it (a fully deep encoding with de Bruijn), but it's hard to pin down a particular height where a hill becomes a mountain

Paolo Giarrusso (Feb 04 2022 at 11:19):

@Karl Palmskog I'm afraid I disagree with both descriptions. What you call "shallow embedding" is a pure embedding (shallow or deep)...

Paolo Giarrusso (Feb 04 2022 at 11:19):

Karl Palmskog (Feb 04 2022 at 11:19):

Paolo Giarrusso (Feb 04 2022 at 11:20):

Paolo Giarrusso (Feb 04 2022 at 11:22):

@Karl Palmskog well, your example of deep embedding in the paper actually sounds correct, but it doesn't seem to agree with your description

Paolo Giarrusso (Feb 04 2022 at 11:23):

Paolo Giarrusso (Feb 04 2022 at 11:24):

It doesn't seem to match this: > An alternative approach of implementing and encoding programming languages is called deep embedding, and amounts to the implementation of a DSL from scratch, essentially, writing its parser, interpreter and type-checker in a general-purpose language

Karl Palmskog (Feb 04 2022 at 11:25):

well, that is to me just going further in the meaning of deep. From the Coq inductive type of regexps, you could obviously do a parser, interpreter, etc.

Karl Palmskog (Feb 04 2022 at 11:25):

Paolo Giarrusso (Feb 04 2022 at 11:26):

Karl Palmskog (Feb 04 2022 at 11:26):

a lot of things are "partially" embedded in a host language these days, (P)HOAS is just one very prominent example.

Karl Palmskog (Feb 04 2022 at 11:27):

I usually take the "has its own syntax datatype" as a criterion, but I know this will not work generally

Paolo Giarrusso (Feb 04 2022 at 11:28):

In fact, that's _the_ earliest reference I ever found to the distinction re DSLs, and it comes from logics.

Karl Palmskog (Feb 04 2022 at 11:28):

sure, you also have the work by Bas's team that allows switching between shallow and deep in Coq (to/from "DSL") using MetaCoq...

Paolo Giarrusso (Feb 04 2022 at 11:28):

Paolo Giarrusso (Feb 04 2022 at 11:29):

Karl Palmskog (Feb 04 2022 at 11:30):

if you want to be strict about HOL4 and HOL Light, they are themselves deeply embedded into SML and OCaml, respectively. So any language represented in their HOL is multi-deeply-embedded

Paolo Giarrusso (Feb 04 2022 at 11:31):

they might be deep, but they're not embedded: thankfully, "embedded DSL" is much more established as "we avoid having a separate parser".

Karl Palmskog (Feb 04 2022 at 11:32):

HOL4 has a parser, but a pretty terrible one... hence people write programmatic SML interfaces to their languages defined in HOL4

Paolo Giarrusso (Feb 04 2022 at 11:33):

Karl Palmskog (Feb 04 2022 at 11:33):

Karl Palmskog (Feb 04 2022 at 11:34):

Paolo Giarrusso (Feb 04 2022 at 11:34):

Paolo Giarrusso (Feb 04 2022 at 11:35):

I think it's well-established enough that the distinction between DSL and GPL (general-purpose language) is more aesthetic than mathematical

Karl Palmskog (Feb 04 2022 at 11:38):

if that's aesthetic, then I think the deep/shallow distinction is also partly aesthetic. But it can be narrowly and precisely defined, as done by Nipkow. But we don't all have to adopt that "narrowization". Can be good to do in papers, though.

Paolo Giarrusso (Feb 04 2022 at 11:38):

you're of course free to use "point" to mean "table", so we're only discussing good writing (or whatever you want to call this)

Paolo Giarrusso (Feb 04 2022 at 11:39):

using "embedded" to not mean the established sense of "embedded" seems a net negative.

Paolo Giarrusso (Feb 04 2022 at 11:40):

not that I necessarily expect an ITP person to care, it's not their research focus. But diluting terminology increases siloing...

Paolo Giarrusso (Feb 04 2022 at 11:41):

I'll grant that many system are in some intermediate state, and I'll happily believe HOL4 is there.

Paolo Giarrusso (Feb 04 2022 at 11:41):

Karl Palmskog (Feb 04 2022 at 11:47):

if the language "HOL" as defined in OCaml via a term datatype by HOL Light, is not considered a deep embedding and a DSL of a general-purpose language, that's really strange to me...

Paolo Giarrusso (Feb 04 2022 at 11:49):

when the user writes a OCaml program, then they're using the deep embedding, sure.

Karl Palmskog (Feb 04 2022 at 11:49):

but the usual way of interacting with HOL Light is via the OCaml toplevel, so effectively writing an OCaml program

Paolo Giarrusso (Feb 04 2022 at 11:50):

Karl Palmskog (Feb 04 2022 at 11:50):

then there are some convenience mechanisms like a low-effort parser that allow you to deceive yourself into thinking you are inside the HOL system

Paolo Giarrusso (Feb 04 2022 at 11:50):

The distinction is useful for the Software Language Engineering community (http://www.sleconf.org) to say things like "embedding lets you reuse tooling like editor support, syntax highlighting, etc"

Paolo Giarrusso (Feb 04 2022 at 11:52):

IOW, the researchers working on "how to implement and support languages" distinguish between embedded languages and non-embedded ones because it's pretty important for lots of their work.

Karl Palmskog (Feb 04 2022 at 11:53):

Theorem dimindex_dimword_le_iso:
  dimindex (:'a) <= dimindex (:'b) <=> dimword (:'a) <= dimword (:'b)
Proof
 SRW_TAC [] [logrootTheory.LE_EXP_ISO, fcpTheory.dimindex_def, dimword_def]
QED

All the stuff like Theorem, Proof, Qed which look like things that are actually parsed are in the end hacky macros, and it reduces to something like:

val _ = store_thm("dimindex_dimword_le_iso", ...);

Paolo Giarrusso (Feb 04 2022 at 11:54):

the only connection to FP is that the FP community REALLY prefer embedding and will point out all the hacks that come out from in non-embedded languages.

Paolo Giarrusso (Feb 04 2022 at 11:55):

OTOH, lots of work in embedding languages is exactly in more-or-less advanced tricks for making the embedded language not seem embedded.

Paolo Giarrusso (Feb 04 2022 at 11:56):

(Hudak specifically excludes macro from "_pure_ embedding", and explains why, but it's certainly an embedding)

Ali Caglayan (Feb 04 2022 at 11:56):

I thought shallow vs deep was all about definitional equality. In a deep embedding you use an identity type to model definitional equality whilst in a shallow one you use the ambient definitional equality. This means various properties can get inherited from the ambient theory, such as normalization.

Karl Palmskog (Feb 04 2022 at 11:56):

so here this part: dimindex (:'a) <= dimindex (:'b) <=> dimword (:'a) <= dimword (:'b) is parsed by a hacky parser, and what the user is supposed to understand that it means, is an instance of the term datatype, which one could write directly in the general-purpose language

Ali Caglayan (Feb 04 2022 at 11:57):

The caveat being "good luck doing a shallow embedding". Typically it is very difficult to find one and/or requires a very powerful datatype.

Ali Caglayan (Feb 04 2022 at 11:58):

Paolo Giarrusso (Feb 04 2022 at 12:05):

@Ali Caglayan I'd say Kipling uses shallow embedding for definitional equality and deep embedding for the rest.

Ali Caglayan (Feb 04 2022 at 12:06):

What about the typechecker? That uses Agda's checking AFAIK. Also what is "the rest"?

Paolo Giarrusso (Feb 04 2022 at 12:07):

hm wait... "Figure 2. A shallow embedding of KIPLING in Agda" is indeed a fully shallow embedding.

Paolo Giarrusso (Feb 04 2022 at 12:08):

I read this too long ago, but in the end doesn't he have a syntax for terms, but no syntax for definitional equality? That's what I was trying to say.

Paolo Giarrusso (Feb 04 2022 at 12:10):

re the typechecker, embedded languages reuse the host typechecker to some large extent — including typed syntax using indexed types, like here (or like the famous Expr T GADT examples in functional languages)

Ali Caglayan (Feb 04 2022 at 12:22):

Right, in a shallow embedding you really shouldn't have to specify a syntax for definitional equality. People above have also mentioned a similar story for binders, AFAIK HOAS is all about reusing the ambient binding mechanism, so you could argue that is in a sense shallow.

Ali Caglayan (Feb 04 2022 at 12:31):

What is interesting in dependently typed languages is that it becomes apparent that definitional equality and UIP equality are not the same in general. In order to have a definitionally reducing syntax of substitutions you need some extra rewriting in some places. I believe it is an open question how to treat those extra reductions syntactically. This is what type theory eating itself is about. Whether this is actually possible is anybodies guess.

For non-dependent languages, there is usually enough leeway to not have those extra reductions, hence why you can do GADT examples like you mentioned.

Ali Caglayan (Feb 04 2022 at 12:34):

But you can also go the other direction and rewrite along all equalities which is why you get extensional MLTT and really Set looking type theories

Stream: Miscellaneous

Topic: Shallow vs deep embedding

Julin S (Feb 04 2022 at 05:41):

Théo Winterhalter (Feb 04 2022 at 06:31):

Guillaume Melquiond (Feb 04 2022 at 06:54):

Paolo Giarrusso (Feb 04 2022 at 11:15):

Paolo Giarrusso (Feb 04 2022 at 11:17):

Karl Palmskog (Feb 04 2022 at 11:17):

Paolo Giarrusso (Feb 04 2022 at 11:18):

Karl Palmskog (Feb 04 2022 at 11:19):

Paolo Giarrusso (Feb 04 2022 at 11:19):

Paolo Giarrusso (Feb 04 2022 at 11:19):

Karl Palmskog (Feb 04 2022 at 11:19):

Paolo Giarrusso (Feb 04 2022 at 11:20):

Paolo Giarrusso (Feb 04 2022 at 11:22):

Paolo Giarrusso (Feb 04 2022 at 11:22):

Paolo Giarrusso (Feb 04 2022 at 11:23):

Paolo Giarrusso (Feb 04 2022 at 11:24):

Karl Palmskog (Feb 04 2022 at 11:25):

Karl Palmskog (Feb 04 2022 at 11:25):

Paolo Giarrusso (Feb 04 2022 at 11:26):

Karl Palmskog (Feb 04 2022 at 11:26):

Karl Palmskog (Feb 04 2022 at 11:27):

Paolo Giarrusso (Feb 04 2022 at 11:28):

Paolo Giarrusso (Feb 04 2022 at 11:28):

Karl Palmskog (Feb 04 2022 at 11:28):

Paolo Giarrusso (Feb 04 2022 at 11:28):

Paolo Giarrusso (Feb 04 2022 at 11:29):

Karl Palmskog (Feb 04 2022 at 11:30):

Paolo Giarrusso (Feb 04 2022 at 11:31):

Karl Palmskog (Feb 04 2022 at 11:32):

Paolo Giarrusso (Feb 04 2022 at 11:33):

Karl Palmskog (Feb 04 2022 at 11:33):

Karl Palmskog (Feb 04 2022 at 11:34):

Paolo Giarrusso (Feb 04 2022 at 11:34):

Paolo Giarrusso (Feb 04 2022 at 11:35):

Karl Palmskog (Feb 04 2022 at 11:38):

Paolo Giarrusso (Feb 04 2022 at 11:38):

Paolo Giarrusso (Feb 04 2022 at 11:39):

Paolo Giarrusso (Feb 04 2022 at 11:40):

Paolo Giarrusso (Feb 04 2022 at 11:41):

Paolo Giarrusso (Feb 04 2022 at 11:41):

Karl Palmskog (Feb 04 2022 at 11:47):

Paolo Giarrusso (Feb 04 2022 at 11:49):

Karl Palmskog (Feb 04 2022 at 11:49):

Paolo Giarrusso (Feb 04 2022 at 11:50):

Paolo Giarrusso (Feb 04 2022 at 11:50):

Karl Palmskog (Feb 04 2022 at 11:50):

Paolo Giarrusso (Feb 04 2022 at 11:50):

Paolo Giarrusso (Feb 04 2022 at 11:52):

Karl Palmskog (Feb 04 2022 at 11:53):

Paolo Giarrusso (Feb 04 2022 at 11:54):

Paolo Giarrusso (Feb 04 2022 at 11:55):

Paolo Giarrusso (Feb 04 2022 at 11:56):

Ali Caglayan (Feb 04 2022 at 11:56):

Karl Palmskog (Feb 04 2022 at 11:56):

Ali Caglayan (Feb 04 2022 at 11:57):

Ali Caglayan (Feb 04 2022 at 11:58):

Paolo Giarrusso (Feb 04 2022 at 12:05):

Ali Caglayan (Feb 04 2022 at 12:06):

Paolo Giarrusso (Feb 04 2022 at 12:07):

Paolo Giarrusso (Feb 04 2022 at 12:08):

Paolo Giarrusso (Feb 04 2022 at 12:08):

Paolo Giarrusso (Feb 04 2022 at 12:10):

Ali Caglayan (Feb 04 2022 at 12:22):

Ali Caglayan (Feb 04 2022 at 12:31):

Ali Caglayan (Feb 04 2022 at 12:34):