For just a day or two, I've seen many OOM failures with Perennial. cc @Tej Chajed Is there some recent change that would explain this?

Perennial has been growing a lot lately, as the big case study they are working on is nearing completion

on my 16GB RAM system I had to go down to `-j3`

to avoid OOM situations

most of the files towards the end of the compilation process need 3-4GB of RAM per `coqc`

so I guess the answer is -- yes, there is such a recent change, namely as they keep working up the stack of what they verify, coq needs more and more RAM

(using third person pronouns here as I am not directly involved with that verification, just helping with some of the infrastructure)

I don't think there was a *single* change that made a big difference though. just a general growth of the case study.

A good candidate for in-depth profiling of the memory use then...

would be interesting to know how memory use correlates with lines-per-file for the project. Could be multiple thousands-line files in there

My educated guess is rather that it's correlated to lines per proof.

The bigger your proof, the more stuff you'll get in there that's not garbage collected (evars, universes, etc.)

when it was added to CI (https://github.com/coq/coq/pull/10770) it was said

Perennial takes about 30 minutes to build, including all the verified examples (a replicated disk, a mail server, as well as some smaller ones).

ah, but those tend to be coupled (lines overall vs. proof script lines)

Yes, there is a correlation, but not always. Some developments have a lot of short lemmas, i.e. four-color

Gaëtan Gilbert said:

when it was added to CI (https://github.com/coq/coq/pull/10770) it was said

Perennial takes about 30 minutes to build, including all the verified examples (a replicated disk, a mail server, as well as some smaller ones).

depending on the machine people use, I think it currently takes between 20 and 40min

so factoring lemma statements/proofs is not only good for reuse, but also for avoiding garbage collector woes (and possibly also for improving machine learning)

Karl Palmskog said:

would be interesting to know how memory use correlates with lines-per-file for the project. Could be multiple thousands-line files in there

the case studies (`program_proof`

folder) sum up to 10k lines of specs and 20k lines of proofs (according to `coqwc`

)

that's a whole lot of case studies together

so, this does not seem completely out of the ordinary

right, but one would need to know distribution over files of those lines

```
spec proof comments
169 460 1 src/program_proof/addr/addr_proof.v
452 1071 23 src/program_proof/buf/buf_proof.v
57 51 0 src/program_proof/buf/defs.v
162 727 5 src/program_proof/buftxn/buftxn_proof.v
48 80 6 src/program_proof/buftxn_replication/buftxn_replication_proof.v
199 343 133 src/program_proof/buftxn/sep_buftxn_proof.v
10 0 0 src/program_proof/examples/all_examples.v
46 204 4 src/program_proof/examples/alloc_addrset.v
289 526 61 src/program_proof/examples/alloc_crash_proof.v
105 165 3 src/program_proof/examples/alloc_proof.v
320 1004 33 src/program_proof/examples/async_inode_proof.v
301 819 46 src/program_proof/examples/dir_proof.v
135 877 55 src/program_proof/examples/indirect_inode_append_proof.v
361 886 17 src/program_proof/examples/indirect_inode_proof.v
268 624 34 src/program_proof/examples/inode_proof.v
23 12 2 src/program_proof/examples/print_assumptions.v
135 308 54 src/program_proof/examples/replicated_block_proof.v
211 465 39 src/program_proof/examples/single_async_inode_proof.v
126 262 20 src/program_proof/examples/single_inode_proof.v
40 94 1 src/program_proof/examples/toy_proof.v
89 100 8 src/program_proof/kvs/specs.v
97 158 10 src/program_proof/lockservice/bank_proof.v
175 297 20 src/program_proof/lockservice/common_proof.v
36 40 2 src/program_proof/lockservice/fmcounter_map.v
86 34 3 src/program_proof/lockservice/kv_durable.v
115 152 5 src/program_proof/lockservice/kv_proof.v
353 0 32 src/program_proof/lockservice/lockservice_crash.v
139 283 31 src/program_proof/lockservice/lockservice_proof.v
313 0 30 src/program_proof/lockservice/lockservice.v
50 102 12 src/program_proof/lockservice/movers.v
5 0 1 src/program_proof/lockservice/nondet.v
147 181 31 src/program_proof/lockservice/rpc.v
62 104 1 src/program_proof/lockservice/scratch.v
316 1272 195 src/program_proof/simple/simple.v
13 53 0 src/program_proof/simple/spec.v
714 468 0 src/program_proof/txn/commit_proof.v
128 78 2 src/program_proof/txn/invariant.v
33 187 2 src/program_proof/txn/load_proof.v
15 51 0 src/program_proof/txn/map_helpers.v
17 2 5 src/program_proof/txn/recovery_proof.v
2 0 0 src/program_proof/txn/txn_proof.v
107 100 6 src/program_proof/wal/abstraction.v
71 285 56 src/program_proof/wal/circ_proof_crash.v
372 812 8 src/program_proof/wal/circ_proof.v
83 279 0 src/program_proof/wal/common_proof.v
56 262 4 src/program_proof/wal/flush_proof.v
338 952 17 src/program_proof/wal/heapspec_list.v
463 1284 12 src/program_proof/wal/heapspec.v
82 122 9 src/program_proof/wal/highest.v
643 1037 23 src/program_proof/wal/installer_proof.v
551 269 67 src/program_proof/wal/invariant.v
147 349 0 src/program_proof/wal/lib.v
88 568 6 src/program_proof/wal/logger_proof.v
8 0 0 src/program_proof/wal/proof.v
97 266 4 src/program_proof/wal/read_proof.v
137 396 191 src/program_proof/wal/recovery_proof.v
376 822 6 src/program_proof/wal/sliding_proof.v
77 116 4 src/program_proof/wal/sliding.v
15 0 0 src/program_proof/wal/specs.v
34 30 2 src/program_proof/wal/thread_owned.v
87 77 3 src/program_proof/wal/transitions.v
213 619 52 src/program_proof/wal/write_proof.v
10407 21185 1397 total
```

the new stuff is mostly in `wal`

,`txn`

, `simple`

, `buf*`

you have at least 5 files or so with 1000+ lines of proof scripts, I would look at memory use for those

Anyway, we absolutely need a lighter target to run in Coq's CI.

@Tej Chajed any proposal for how to best exclude some of these proofs for Coq's CI?

Perennial is now more like 100 minutes to build single-threaded on a fast machine, which is indeed way to much for every Coq CI run. More seriously the memory usage isn't something we track and it doesn't affect me locally so I don't notice.

I think we can fix this pretty easily while still testing the core of Perennial - I'll write a new .v file that depends on a subset of Perennial and we'll switch Coq CI to build that instead of the default target

Couple of comments:

- as discussed in Github, setting
`OCAMLRUNPARAM`

to some GC magic could help with mem comsuption - using simple compiler and compare could also be interesting, actually you can use my branch which does replace coqc with the simple compiler and do a bench for perennial only. But my investigations in other developments showed the real problem was the evar_map in the proofs, but indeed I'm unsure if the stm may have some leak.

opened a PR switching to a lite target: https://github.com/coq/coq/pull/13402

Just looking at memory allocations, the lia cache seems to have an important share of the memory consumption in perennial.

You could try `Unset Lia Cache`

to see what happens.

@Emilio Jesús Gallego Arias haven't I heard you complaining about the Lia cache recently as well?

It has a crazy implementation, that's essentially a database that we fully load in memory when we fetch a key-value assignment.

That is, *one single* key-value assignment can joyfully thrash your memory

Indeed @Pierre-Marie Pédrot , we found some weird stuff a while ago, but didn't investigate more.

I am currently writing a stupid patch that is slightly less stupid than the current key-value store implementation. Lia cache in perennial gobbles up several hundreds of megabytes in one go, it might help quite a bit to not clutter the memory with it.

Last updated: Oct 13 2024 at 01:02 UTC