I am looking at profiles of Require statements and I've got one file here that spends a full second (out of 3 total seconds in its Require lines) in Hints.add_transparency -> Hints.rebuild_db. It takes a little less than 1ms for a single call to Hint Opaque in typeclass_instances in that file, and there are about 1400 terms in the transparency (or, rather, opacity) state of typeclass_instances after importing all the dependencies. Now I wonder whether these terms are added in the same grouping in which they appear in their files of origin. I assume that would result in, say, a thousand individual calls to add_transparency instead of a single one. Is it necessary to keep the original grouping (assuming that's indeed what happens)? Or could the rebuilding at least be delayed, the same way it already happens for the dnet (I think)? Is there optimization potential here?
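To make the suspected cost model concrete, here is a minimal OCaml sketch (all types and names besides add_transparency and rebuild_db are hypothetical stand-ins, not Coq's actual hints.ml):

```ocaml
(* Hypothetical stand-in for the hint database. *)
type db = {
  transparency : string list;  (* the opacity state *)
  patterns : string list;      (* patterns compiled against that state *)
}

(* The expensive step: recompile every pattern against the new state. *)
let rebuild_db db =
  { db with patterns = List.map (fun p -> p ^ "'") db.patterns }

(* Eager version: one full rebuild per constant, so a file with 1400
   separate "Hint Opaque c." commands triggers 1400 rebuilds on import. *)
let add_transparency db c =
  rebuild_db { db with transparency = c :: db.transparency }

(* Grouped version: "Hint Opaque c1 ... c1400." pays for one rebuild. *)
let add_transparency_grouped db cs =
  rebuild_db { db with transparency = cs @ db.transparency }
```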
I should add that almost all the actual time here is spent in garbage collection. I assume it is constantly creating transparency states, patterns, and dnets that are immediately thrown away when the next change to the transparency state is processed
If you have a small reproducible example I can have a look. I'm not sure I fully understand your remark about grouping, but AFAICT a single Hint Transparent command is mapped to a single set of constants being added.
It's not a perfect reproduction because the hint patterns are too simplistic, so the difference is smaller than in my real example. But I fear it's already a little too complex as it is, so I won't tune it further: https://github.com/Janno/add_transparency_example
The repo includes four files: one with Hint Resolve commands (roughly matching my real example in count) and 1400 individual Hint Opaque commands, one with the same Hint Resolve commands but only a single Hint Opaque containing all 1400 items, and b1.v and b2.v, which are the ones being timed:
COQC b1.v
Finished transaction in 4292627480 instructions (successful)
COQC b2.v
Finished transaction in 2676239780 instructions (successful)
By grouping I meant that Hint Opaque a. Hint Opaque b. leads to two full rebuilds of the hint database (possibly skipping the dnet, since that might be done lazily), whereas Hint Opaque a b. leads to only a single rebuild. This can't be fixed in the file where these commands are issued, but it might be fixable in files that import that file, by delaying the rebuilding of the hint db until all transparency changes have been gathered.
Actually, if one could make all calls to rebuild_db lazy, that would probably be a huge win even in the file that contains the Hint Opaque statements. There is no point rebuilding all patterns until they are actually used, right?
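A minimal sketch of that idea, assuming the same simplified database as above (hypothetical names, not the real hints.ml): record each transparency change cheaply and suspend the rebuild in a thunk, so any run of consecutive changes costs a single rebuild at the first lookup.

```ocaml
type db = {
  transparency : string list;
  patterns : string list Lazy.t;  (* rebuilt on first use only *)
}

(* The expensive recompilation, now hidden behind a suspension. *)
let rebuild transparency = List.map (fun c -> c ^ "/pattern") transparency

let add_transparency db cs =
  let transparency = cs @ db.transparency in
  (* Allocates only a thunk; the previous unforced thunk becomes
     garbage without its rebuild ever having run. *)
  { transparency; patterns = lazy (rebuild transparency) }

(* Only a lookup forces (a single) rebuild, however many
   add_transparency calls preceded it. *)
let lookup db = Lazy.force db.patterns
```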
FWIW on your example the runtime is dominated by the check that a hint is already present or not (very bad algorithmics at play). The rebuild part accounts for only 1/4 of the profile, but indeed it's non-trivial.
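A guess at the kind of algorithmics meant here (illustrative only, not the code from hints.ml): if the duplicate check scans a list, adding n hints costs O(n^2) comparisons overall, whereas a set keeps it at O(n log n).

```ocaml
module S = Set.Make (String)

(* Linear scan: O(n) per duplicate check, O(n^2) for n insertions. *)
let add_hint_list hints h =
  if List.mem h hints then hints else h :: hints

(* Set membership: O(log n) per check, O(n log n) for n insertions. *)
let add_hint_set hints h =
  if S.mem h hints then hints else S.add h hints
```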
Yeah, as I mentioned, the patterns are way too simple. The patterns in the actual example contain much bigger terms. The fact that typeclass outputs appear in patterns is definitely working against us here.
I am a little surprised, though, that I am not seeing more time spent checking for duplicates in my real example. Hints.add_list and Hint.exists barely show up in the profile at 80ms (compared to 1s of hint rebuilding from add_transparency).
can you try with https://github.com/coq/coq/pull/18305 ?
This is not the cleverest implementation out there but it's the fastest to code. We still iterate over the set of head patterns instead of delaying it fully, but hopefully it shouldn't matter for your use case.
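One way to picture the shape described here (a guess at the approach, not the actual code of the PR): keep a lazily rebuilt bucket per head pattern, so a transparency change still walks the heads but only re-suspends each bucket instead of recompiling it immediately.

```ocaml
module M = Map.Make (String)

(* Hypothetical: one lazily rebuilt bucket of patterns per head. *)
type db = { buckets : string list Lazy.t M.t }

let on_transparency_change db =
  (* Still iterates over the set of head patterns ... *)
  { buckets =
      M.map
        (fun bucket ->
          (* ... but only to re-suspend each rebuild, which runs when
             (and if) that head is actually queried. *)
          lazy (List.map (fun p -> p ^ "'") (Lazy.force bucket)))
        db.buckets }

let lookup db head = Option.map Lazy.force (M.find_opt head db.buckets)
```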
Thanks! I'll try as soon as possible.
Hm, I am not sure that I can evaluate this easily. I need to apply it to 8.18.0, but there seem to be quite a few changes to hints.ml since that tag.
you can try with https://github.com/ppedrot/coq/tree/fast-hint-check-exists-8.18 which is a backport over 8.18.0
Awesome! Thanks. Rebuilding everything now :)
The profiles have shifted quite dramatically. There is a performance gain in the file I was looking at, but it's much smaller than expected. It seems that a lot of the time saved in add_transparency (1s) is now spent in Hints.force, specifically in Reductionops.strongrec, which went from 0.3s to 1.1s.
I can share the perf profiles. Do you prefer profiler.firefox.com or speedscope.app?
I'm not used to either so your call
Okay, let's go with speedscope.app since that has the best overview. Load these files in separate tabs, go to the sandwich view, and sort by total time. That's the least confusing way of browsing the profiles, I think.
outcome.speedscope.json
outcome_patched.speedscope.json
(You can click on items in the list and it will show callers and callees)
These profiles include the entire file, so there's elpi stuff in there that's unrelated. I can make new profiles for just the Require part tomorrow.
The important bits are that the overall time went down by about 0.5s (but my machine is definitely noisy), add_transparency disappeared entirely, and Reductionops.strongrec went from 0.3s to 1.1s, with all of that 0.8s difference coming from Hints.force. Unfortunately speedscope.app doesn't allow merging recursive calls or merging functions into callers entirely, so it's a little hard to say how much of strongrec was from Hints.force before. I think it might be about 150ms.
I suspect this is noise, I don't see how the new code would trigger more reductions...
Not impossible. I'll cut the file off after the Requires and regenerate the profiles.
Oh, I should remove the rest of the file. Now there's no more Hints.force :face_palm:
But now that I've already generated the data: pretty much exactly 1s of time saved in the Requires. So the delaying of work definitely seems to work :)
New profiles but they are pretty much identical to the previous ones. There's definitely almost an extra second of reduction work being done
outcome2.speedscope.json
outcome2_patched.speedscope.json
I don't see this effect if I just call typeclasses eauto with typeclass_instances once after all the Requires.
Maybe it really is noise. AFAICT almost all the actual time is (once again) spent in garbage collection. Maybe Reductionops.strongrec just got the short straw...
Switching to firefox profiler for a better timeline view: https://share.firefox.dev/3ubOeKT. Between 5.6s and 6.2s there's what seems to be a single rebuild of a database which is somehow much slower than all the others and spends all its time in GC. If there isn't a good reason to suspect that your patch accidentally creates much more garbage I'd say there's no point investigating this further.
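(For context: memtrace is the OCaml statistical allocation tracer. As far as I know it is wired in roughly like this, as generic memtrace usage rather than anything Coq-specific, and the resulting trace is inspected with memtrace-viewer:)

```ocaml
(* Start tracing if the MEMTRACE environment variable names an output
   file, e.g. MEMTRACE=coqc.ctf coqc b1.v. *)
let () = Memtrace.trace_if_requested ()
```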
memtrace has this to say about garbage: