I've been looking into the memory consumption of a large-ish proof (~10 minutes for the entire file) and it is making me acutely aware of the lack of hash consing. Two items in the memory profile caught my attention and I'd love to hear people's thoughts about these.

1) a large chunk (10% in my example) of the allocations that stick around until `Qed` come from `CClosure.to_constr` calling `Constr.mkConstructU`. This is probably due to the fact that our proofs rely heavily on `lazy` and `lazy head` to reduce telescopic terms. Now I am starting to wonder if this couldn't be a good opportunity to introduce a bit more hash consing. In our proofs the number of distinct pairs of constructors and universe instances is comparatively tiny. (We explicitly annotate almost all universes and the number of distinct choices for any given universe in a constructor is small.) This might be different for other proofs but I wouldn't be surprised if there is still a whole lot of potential for sharing in an average proof that uses `lazy`. AFAICT monomorphic developments would benefit even more since all constructors could be shared.
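Concretely, the kind of sharing I have in mind could be sketched like this (illustrative OCaml with placeholder types and names, not Coq's actual `Constr` API): cache the node per (constructor, universe instance) pair so that repeatedly building the same constructor application returns the same physical object instead of a fresh three-word allocation.

```ocaml
(* Toy sketch of hash-consing (constructor, universe instance) pairs.
   [constructor], [instance], and [mk_construct_u] are illustrative
   stand-ins, not Coq's real representation. *)
type constructor = string        (* stand-in for a constructor name *)
type instance = int list         (* stand-in for a universe instance *)
type constr = Construct of constructor * instance

let cache : (constructor * instance, constr) Hashtbl.t = Hashtbl.create 97

let mk_construct_u (c : constructor) (u : instance) : constr =
  let key = (c, u) in
  match Hashtbl.find_opt cache key with
  | Some t -> t                        (* shared: no new allocation *)
  | None ->
    let t = Construct (c, u) in
    Hashtbl.add cache key t;
    t

let () =
  let t1 = mk_construct_u "S" [0] in
  let t2 = mk_construct_u "S" [0] in
  assert (t1 == t2)                    (* physically shared *)
```

With few distinct (constructor, instance) pairs, the table stays small while every repeated occurrence in a normalized term points at the same node.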

2) A similarly sized chunk (maybe 12% but it's hard to tell exactly) comes from `CClosure.zip_term` calling `Array.map` in the `ZApp` case. I know that full normalization is bound to change most of the arguments of any term it recurses into but I still wonder if it would be worth using `Array.Smart.map` as is done in the `App (Ind.. | Construct..)` cases of `klt`, for example.

can't use smart map there, it's doing `fconstr -> constr` but smart map must be `'a -> 'a`
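(For reference, a minimal sketch of a "smart" map in the spirit of `Array.Smart.map` — the real implementation in the Coq sources differs in details. It returns the input array itself, physically unchanged, when the function maps every element to itself. This only type-checks when input and output element types coincide, which is exactly why it cannot serve an `fconstr -> constr` conversion.)

```ocaml
(* Sketch of a smart map: reuse the input array when nothing changed. *)
let smart_map (f : 'a -> 'a) (a : 'a array) : 'a array =
  let n = Array.length a in
  let rec scan i =
    if i >= n then a  (* no element changed: return the original array *)
    else
      let x = f a.(i) in
      if x == a.(i) then scan (i + 1)
      else begin
        (* first change found: copy once, then map the remainder *)
        let b = Array.copy a in
        b.(i) <- x;
        for j = i + 1 to n - 1 do b.(j) <- f a.(j) done;
        b
      end
  in
  scan 0

let () =
  let a = [| 1; 2; 3 |] in
  assert (smart_map (fun x -> x) a == a);          (* physically reused *)
  assert (smart_map (fun x -> x + 1) a = [| 2; 3; 4 |])
```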

also hashconsing is very expensive, I doubt it would be worth it

regarding mkConstructU, do you have that many constructors that the three words for the Construct node matter?

or are you talking about the size of the universe instances?

No, this is just the three words for the pair if memtrace is to be trusted. It's 500M allocations that stick around until the end of the proof. Unfortunately I cannot find exactly what the unit is in memtrace-viewer but it's either just allocations or bytes or words.

500M of constructors???? that looks very wrong

Why?

are you generating huge first-order terms?

normally in proof mode, the standard memory leak is that the evarmap keeps a lot of redundant data that could morally have been GCed

solving this doesn't need hashconsing, but a proper GC for the evar heap

if you have 500M of duplication of 3-word objects this looks extremely suspect to me

(that's about 21 million constructor nodes in memory)

We do have some bigger terms due to telescopes and telescope morphisms. Some of these terms grow cubically in the number of existentials in the goal and end up repeating the same telescope constructors over and over.

Since telescopes are such a big problem at Qed time we normalize all telescopic terms before we enter them into the proof term to make the kernel's life a little easier

On top of that there are also many, many small list terms in the final proof term.

To put the 500M number in perspective: the proof performs 725G allocations overall

Most of these can be GCd pretty much immediately so 500M ends up being roughly 10% of the stuff that sticks around even with aggressive GC settings

Some more numbers: 61M allocations from `mkVar`. ~~That probably speaks to the presence of a whole lot of telescopes.~~

Vars are context variables, not bound rels, so it's a common occurrence in any proof

Oh, of course.

have you tried Optimize Proof in your script?

I can say with some certainty that the proof does not have 61 million distinct variables :)

When should I call it? Just before the end of the proof?

anywhere inside your proof

it will collapse the evarmap and remove the dangling evars
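Roughly, the collapse amounts to a reachability pass over the evar map: starting from the evars still referenced by the partial proof, keep everything reachable through their bodies and drop the dangling rest. (A toy sketch with placeholder types, not Coq's actual `Evd` representation.)

```ocaml
(* Toy sketch of collapsing an evar map: ids and bodies are stand-ins. *)
module IntSet = Set.Make (Int)

type evar_body = { deps : int list }      (* evars mentioned in the body *)
type evar_map = (int * evar_body) list    (* evar id -> body *)

let collapse (roots : int list) (m : evar_map) : evar_map =
  (* depth-first reachability from the root evars *)
  let rec reach seen = function
    | [] -> seen
    | e :: rest ->
      if IntSet.mem e seen then reach seen rest
      else
        let seen = IntSet.add e seen in
        let deps =
          match List.assoc_opt e m with Some b -> b.deps | None -> []
        in
        reach seen (deps @ rest)
  in
  let live = reach IntSet.empty roots in
  List.filter (fun (e, _) -> IntSet.mem e live) m

let () =
  let m = [ (1, { deps = [2] }); (2, { deps = [] }); (3, { deps = [] }) ] in
  (* evar 3 is dangling and gets dropped *)
  assert (List.map fst (collapse [1] m) = [1; 2])
```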

I'm curious about the result on memory consumption

I'll put that in somewhere in the middle and build a new profile.

The results are a little difficult to interpret because including `Optimize Proof` changed when exactly GCs trigger. In any case, `Optimize Proof` prints `Evars: 41269 -> 1` but has no positive impact on allocations. In fact, `mkConstructU` now produces 570M allocations (instead of 470M before) that stick around until the end of the proof.

you can force the GC to trigger at precise points in the script with `Optimize Heap` to get additional reproducibility

maybe it would make sense to run hashconsing in optimize proof

or at least provide a variant of this command that does it

@Janno if I'm not mistaken, this means that you're generating huge proofs without hashconsing. Do you know the size of the Qed-proofs on disk?

I can check the .vo file when the next experiment is done

The file contains a few auxiliary definitions in addition to the proof. The `.vo` file is 243K in total. I am not sure what to look for in votour. I think proof terms are not stored in the place where the bodies of definitions end up, right?

In case it's helpful:

```
0: library, starting at byte 95557 (size 91730w)
1: opaques, starting at byte 246585 (size 1485w)
2: summary, starting at byte 16 (size 55663w)
3: tasks, starting at byte 246548 (size 0w)
4: universes, starting at byte 246511 (size 0w)
```

BTW I inserted a handful of `Optimize Heap` calls into the proof script to reduce the noise from smaller changes in my experiments. It has done nothing to the memory profile except add a bunch of GC pauses that add two minutes to the total runtime.

The `.vo` file is 243K in total

it's quite weird that you have such a huge memory consumption with a small vo file...

the proof you're running is Qed-d?

It is now but it actually wasn't until the very last experiment from which I got the vo size. Before it was `Admitted` but now it's closed by an axiom of type `forall A, A` and `Qed`'d.

what do you mean by "closed by an axiom"? you're throwing away the proof term generated by the script?

or you're just ignoring some subproof?

Just ignoring the last subproof. (Which I think of as the rest of the main proof that was admitted.)

ok.

given the inputs so far, I can say that you're doing something weird

I don't see how you can get a ~200KB vo file with such a memory consumption unless you were relying heavily on proofs by reflection

in particular, the Qed proof lives in the opaque segment which means that its size is ≤ 1485 words

We do indeed rely on that although our flavor of reflection is a little different because the presence of user terms in our reified terms means we have to reduce carefully with `lazy` and whitelists or `lazy head`.

this is *tiny*

if you `Print` with `Set Printing All` do you get something huge?

still, all the memory consumed by reflection should go away after an optimize proof

The technique we use for our proof terms does indeed generate tiny, tiny terms. If it weren't for the occasional forced break where we have to restart the reflective part from scratch, we wouldn't even have many telescopes in the term. That, and user-declared hints, which also contain telescopes. Of course sometimes we call `lia` and then the proof term is immediately too big to print...

For realistic proofs we cannot print the proof term. One reason is `lia` and other tactics we rely on, but also just the sheer size of the problem. We are talking about separation logic proofs of C++ code. The particular function being proven correct here is about 50 lines of code (excluding whitespace) and it relies on the specifications of 35 other functions. Some part of it is admitted so it might be much less than the full 50 lines. Even so, it takes a whole lot of proof steps to get through any amount of realistic code, and even with our proof technique there is no way the proof term is going to be nice enough to print, especially with `Set Printing All`.

(I wish I could just point to the actual code and a representative C++ example but we still have not managed to release the code of our automation.)

Pierre-Marie Pédrot said:

still, all the memory consumed by reflection should go away after an optimize proof

There are various cases where terms can still be alive after a proof by reflection. In particular, toplevel definitions can be kept around in reduced forms (and thus possibly huge). E.g., coinductive values. Or plain values in the case of the VM or native.

Example:

```
Require Import ZArith.
Definition foo := Pos.iter S O 100000000.
Eval vm_compute in (foo - foo)%nat.
Optimize Heap. (* memory is not freed *)
```

Okay, I had to go back and double check the vo file size. Turns out with enough experiments one is bound to go wrong. The compilation didn't actually finish successfully and I simply didn't notice. The .vo file is actually 5.2MB in size.

The votour index looks a little more realistic

```
0: library, starting at byte 95557 (size 97404w)
1: opaques, starting at byte 253936 (size 1962199w)
2: summary, starting at byte 16 (size 55663w)
3: tasks, starting at byte 253899 (size 0w)
4: universes, starting at byte 253862 (size 0w)
```

Last updated: Oct 13 2024 at 01:02 UTC