@Emilio Jesús Gallego Arias I don't see anything suspicious in Byte.v with memtrace?
In particular, find_or.loop only allocates a bit more than 20M according to the tool.
My results gave me 2GiB
I dunno
very suspiciou, you are in 4.11.1 right
?
Yeah I am getting very weird results
4.11.1+fp
I don't think that the +fp changes anything
I'll upload a file
allocations in List.v look innocuous as well
can you upload the file for Strings/Byte.v
?
the monad does allocate quite a bit but these allocations never reach the major heap
let's see
Umm, what file are you compiling?
is that Strings/Byte.v?
Ah, sorry, I was compiling Init.Byte -_-
misread your post :p
Let's try again
For example why is or_find allocating
it should not allocate at all
now it does look better (or not depending on your point of view)
well it's a closure that captures variables
so if it's not inlined, you have to allocate those variables
in theory these shouldn't matter performancewise because they're very very short-lived
Major heap only accounts for 320M in my trace for instance
for a grand total of 16.6G
Yeah I'm looking at the major heap now
and going top down
will switch to the simple compiler tho as to get a better understanding
I confirm that repr_if_not_found never reaches major
tho the pattern of allocation for quite simple proofs like in List.v seems a bit worrying
yeah it doesn't
I think that looking at fiat-crypto or VST is a good way to really start worrying
Indeed let's see what happens there; but I'm a bit shocked that for example byte takes 17GiB of allocation
tho most of it in the minor heap so likely harmless
OCaml programs allocate a lot
Coq is not specific in that regard
The use of a two-generation GC is critical for perfs
It's quite well-documented, I think that the "unraveling OCaml" book mentions this fact
Sure, I mean more than it seems to me, that our proof engine is inherently leaky, in the sense that allocation grows with the number of tactics
unless you call optimize proof I guess
Heaps are inherently leaky :shrug:
?
Proof search is not
If you have a global state and no GC, you're essentially leaky
imagine prolog was implemeted this way
I am afraid I do not believe in Prolog
not sure what that means
Anyways, I'd be much more worried about the module system, which induces memory consumption by proxy
(see e.g. my last PR https://github.com/coq/coq/pull/13156)
can flambda remove allocations? I am missing context, but operations in the minor heap are _not_ free — they pollute the cache
The minor heap has a good cache locality though
But indeed flambda does remove allocations
The find_or above is probably a good example
if you inline the loop, you become allocation-free
which is a typical flambda optimization
sounds better
I mean, doing map (+1) (map (+1) xs)
without fusion for large xs means writing 2 copies of xs instead of 1
But relying on flambda is fragile, I believe
for tight loops we should prohibit closure creation by hand
(I am a bit surprised that in the vanilla compiler the aforementioned function is not already inlined)
you’re the expert, but is there any optimization in OCaml without flambda?
as usual relying on optimizations has tradeoffs, and I don’t know how good flambda is
Not much, but loop inlining is one of them when the compiler realizes that you wanted to write a pure FP while
I am not too fond of flambda
it does give a lot of optimizations, but it's not quite robust
OTOH, some manual optimizations can require destroying modularity. So pick your poison.
one advantage of the vanilla compiler is that it is quite predictable
yeah I know :/
Anyway: allocations that die immediately are only free at GC time (because copying GC only copies live objects), not for caches (worst-case, you must throw away useful data from caches, load memory, zero it, then initialize the objects, and then stop using them and throw away other useful data and restart)
Probably it’s not as bad, and maybe the young generation fits in L2, but the cache would help for other things as well :-P
Last updated: Sep 09 2024 at 05:02 UTC