The 2pp certicoq abstract states that the GC is simple. Is any more information available?
this is related, I think: https://zoep.github.io/icfp2019.pdf
From what I can remember from a talk from Aquinas Hobor at gallium one or two years ago, their GC uses the same strategy as OCaml's GC for the minor heap: bump pointer allocation, then when it is full, do a linear pass and move live data to the next generation. Except that they have 12 generations (instead of 2 in the case of ocaml), all using the same simple strategy. The catch being: if you end up trying to garbage collect the last generation, then your program crashes.
https://github.com/PrincetonUniversity/certicoq/blob/master/theories/Runtime/gc.c#L351
If I understand correctly, because pointers can only point to older generations, I think that would mean that long running programs have a chance of eventually crashing? Whereas a more realistic GC would have a better allocator for generation number N.
hmm, that sounds worrying, so GC crash becomes the "replacement" for divergence for regular programming languages? ("program either GC-crashes or terminates")
I'm not even sure how GC correctness is phrased these days, I guess I should take a look at the CakeML GC paper again: http://www.cse.chalmers.se/~myreen/itp17.pdf
@Armaël Guéneau the CoqPL abstract states that "The certicoq garbage collector is simple, as Coq is purely functional (unlike haskell, ocaml)", which suggests that they are using a simpler GC than ocaml.
but that was 3 years ago now, right? Approach may have changed
Yes, that's why I'm asking...
ok, then it might be referring to the fact that they don't have to deal with mutation (e.g. ocaml has a write barrier; the GC needs to be aware that the graph of objects has changed when mutating objects).
so I guess that simplifies the design of the GC indeed (independently from the generations business)
This paper seems to contain the 400line GC for certicoq:
https://dl.acm.org/doi/pdf/10.1145/3360597
The talk by Aquinas that I saw was indeed about the work presented in that paper.
I haven't studied CertiCoq's GC, but what @Armaël Guéneau describes makes some sense. Using 2 generations for simplicity, without mutation, an "old" generation cannot point to a "newer" one, but mutating the old generation breaks this invariant. Write barriers instrument writes to track pointers from older generations; those pointers must be used as additional roots when doing GC on a new generation.
The paper has few details — they point to an extended version, but the link gives 404: https://www.comp.nus.edu.sg/~hobor/Publications/2019/autoquack_extended_oopsla19.pdf — while the homepage is still live (and still hosts the OOPSLA paper: https://www.comp.nus.edu.sg/~hobor/Publications/2019/Localize.pdf)
On Hobor's homepage I only found these slides for more details: https://www.comp.nus.edu.sg/~hobor/Publications/2019/gc_NIER_talk.pdf
Let me comment that, if what @Armaël Guéneau describes is true, I'm not convinced that this GC strategy is as good as they claim:
The 12-generation GC, written in the spirit of the OCaml GC, is realistic and sophisticated, though by no means industrial-strength.
Interestingly the ICFP paper and the Oopsla paper do not cite each other. Are any of the certicoq authors on Zulip?
@Zoe Paraskevopoulou (Gitter import) @Abhishek Anand @Randy Pollack @Matthieu Sozeau ??
I'm not very familiar with the GC implementation but I don't think it drastically changed from what Aquinas presented at Oopsla
Andrew Appel would be the best placed to answer this question, I'll point him here :)
What exactly is the question here?
I should also remark: The CertiCoq garbage collector was implemented by me, and proved correct by Shengyi Wang, the lead author of the OOPLSA paper, Aquinas Hobor's PhD student, and now Shengyi is a postdoc at Princeton (though he's working on other things than the garbage collector). Indeed, the garbage collector has 12 generations, each twice the size of the previous, and I dare you to run out of space! But it's true, if we wrote a few more lines of code, we could handle "bottoming out" at the biggest generation more comprehensively. Also it's true that this collector doesn't need to handle mutable references, which simplifies the finding of roots between generations. Aquinas has told me that he's interested in having a student extend this collector (and proof) to handle mutable references.
I think Bas's original question in this topic was on whether there was complete published description for the currently implemented GC in CertiCoq.
I'm reading Andrew's remark as that the OOPSLA paper (https://dl.acm.org/doi/pdf/10.1145/3360597) is the current best approximation.
@Andrew Appel Thanks! My question was mostly informative. Where is the GC documented? The Oopsla paper is a good source.
What is the relation with the ICFP paper (if any)?
@Andrew Appel is the extended version of the OOPSLA paper available somewhere? The paper seems to link to https://www.comp.nus.edu.sg/~hobor/Publications/2019/autoquack_extended_oopsla19.pdf, but that gives 404
Paolo Giarrusso said:
Andrew Appel is the extended version of the OOPSLA paper available somewhere? The paper seems to link to https://www.comp.nus.edu.sg/~hobor/Publications/2019/autoquack_extended_oopsla19.pdf, but that gives 404
The most straightforward way to find out would be to ask Shengyi or Aquinas.
https://www.cs.princeton.edu/~shengyiw/
https://www.comp.nus.edu.sg/~hobor/
Last updated: Feb 06 2023 at 05:03 UTC