Stream: Coq devs & plugin devs

Topic: Hints.add_transparency rebuilding database(s) a lot


view this post on Zulip Janno (Nov 12 2023 at 10:52):

I am looking at profiles of Require statements and I've got one file here that spends a full second (out of 3 total seconds in its Require lines) in Hints.add_transparency -> Hints.rebuild_db. It takes a little less than 1ms for a single call to Hint Opaque in typeclass_instances in that file and there are about 1400 terms in the transparency (or, rather, opacity) state of typeclass_instances after importing all the dependencies. Now I wonder if these terms are added in the same grouping that they appear in their files of origin. I assume that would result in, say, a thousand individual calls to add_transparency instead of in a single one. Is it necessary to keep the original grouping (assuming that's indeed what happens)? Or could the rebuilding at least be delayed the same way it already happens for the dnet (I think)? Is there optimization potential here?

view this post on Zulip Janno (Nov 12 2023 at 10:53):

I should add that almost all the actual time here is spent in garbage collection. I assume it is constantly creating transparency states, patterns, and dnets that are immediately thrown away when the next change to the transparency state is processed

view this post on Zulip Pierre-Marie Pédrot (Nov 12 2023 at 19:31):

If you have a small reproducible example I can have a look. I'm not sure I fully understand your remark about grouping, but AFAICT a single Hint Transparent command is mapped to a single set of constants being added.

view this post on Zulip Janno (Nov 13 2023 at 09:39):

It's not a perfect reproducing example because the hint patterns are too simplistic so the difference is smaller than in my real example. But I fear it's already a little to complex as it is so I won't tune it further: https://github.com/Janno/add_transparency_example
The repo includes four files:

COQC b1.v
Finished transaction in 4292627480 instructions (successful)
COQC b2.v
Finished transaction in 2676239780 instructions (successful)

view this post on Zulip Janno (Nov 13 2023 at 09:45):

By grouping I meant that Hint Opaque a. Hint Opaque b. leads to two full rebuilds of the hint database (possibly skipping the dnet since that might be done lazily) whereas Hint Opaque a b. only leads to a single rebuild. This can't be fixed in the file where these commands are issued but it might be fixable in files that import that file by delaying rebuilding of the hint db until all transparency changes have been gathered.

view this post on Zulip Janno (Nov 13 2023 at 10:53):

Actually, if one could make all calls to rebuild_db lazy that would probably be a huge win even in the file that contains the Hint Opaque statements. There is no point rebuilding all patterns until they are actually used, right?

view this post on Zulip Pierre-Marie Pédrot (Nov 13 2023 at 11:00):

FWIW on your example the runtime is dominated by the check that a hint is already present or not (very bad algorithmics at play). The rebuild part accounts for only 1/4 of the profile, but indeed it's non-trivial.

view this post on Zulip Janno (Nov 13 2023 at 14:42):

Yeah, as I mentioned, the patterns are way too simple. The patterns in the actual example contain much bigger terms. The fact that typeclass outputs appear in patterns is definitely working against us here.

view this post on Zulip Janno (Nov 13 2023 at 14:47):

I am a little surprised, though, that I am not seeing more time spent checking for duplicates in my real example. Hints.add_list and Hint.exists barely show up in the profile at 80ms (compared to 1s of hint rebuilding from add_transparency)

view this post on Zulip Pierre-Marie Pédrot (Nov 13 2023 at 15:37):

can you try with https://github.com/coq/coq/pull/18305 ?

view this post on Zulip Pierre-Marie Pédrot (Nov 13 2023 at 15:38):

This is not the cleverest implementation out there but it's the fastest to code. We still iterated over the set of head patterns instead of delaying it fully, but hopefully it shouldn't matter for your use case.

view this post on Zulip Janno (Nov 13 2023 at 16:01):

Thanks! I'll try as soon as possible.

view this post on Zulip Janno (Nov 13 2023 at 16:22):

Hm, I am not sure that I can evaluate this easily. I need to apply it to 8.18.0 but there seem to be quite a few changes to hints.ml that happened since that tag.

view this post on Zulip Pierre-Marie Pédrot (Nov 13 2023 at 17:06):

you can try with https://github.com/ppedrot/coq/tree/fast-hint-check-exists-8.18 which is a backport over 8.18.0

view this post on Zulip Janno (Nov 13 2023 at 17:17):

Awesome! Thanks. Rebuilding everything now :)

view this post on Zulip Janno (Nov 13 2023 at 18:40):

The profiles have shifted quite dramatically. There is a performance gain in the file I was looking at but it's much smaller than expected. It seems that a lot of the time saved in add_transparency (1s) is now spent in Hints.force, specifically in Reductionops.strongrec which went from 0.3s to 1.1.s

view this post on Zulip Janno (Nov 13 2023 at 18:41):

I can share the perf profiles. Do you prefer profiler.firefox.com or speedscope.app?

view this post on Zulip Pierre-Marie Pédrot (Nov 13 2023 at 18:41):

I'm not used to either so your call

view this post on Zulip Janno (Nov 13 2023 at 18:46):

Okay, let's go with speedscope.app since that has the best overview. Load these files here in separate tabs and go to the sandwich view and sort by total time. That's the least confusing way of browsing the profiles, I think
outcome.speedscope.json
outcome_patched.speedscope.json

view this post on Zulip Janno (Nov 13 2023 at 18:47):

(You can click on items in the list and it will show callers and callees)

view this post on Zulip Janno (Nov 13 2023 at 18:48):

These profiles include the entire file so there's elpi stuff in there that's unrelated. I can make new profiles for just the Require part tomorrow

view this post on Zulip Janno (Nov 13 2023 at 18:50):

The important bits are that the overall time went down by about 0.5s (but my machine is definitely noisy), add_transparency disappeared entirely, and Reductionops.strongrec went from 0.3s to 1.1s with all of that 0.8s difference from Hints.force

view this post on Zulip Janno (Nov 13 2023 at 18:52):

Unfortunately speedscope.app doesn't allow merging recursive calls or merging functions into callers entirely so it's a little hard to say how much strongrec was from Hints.force before. I think it might be about 150ms.

view this post on Zulip Pierre-Marie Pédrot (Nov 13 2023 at 19:00):

I suspect this is noise, I don't see how the new code would trigger more reductions...

view this post on Zulip Janno (Nov 13 2023 at 19:04):

Not impossible. I'll cut the file off after the Requires and regenerate the profiles.

view this post on Zulip Janno (Nov 13 2023 at 19:06):

Oh, I should remove the rest of the file. Now there's no more Hints.force :face_palm:

view this post on Zulip Janno (Nov 13 2023 at 19:08):

But now that I've already generated the data: pretty much exactly 1s time saved in the Requires. So the delaying of work definitely seems to work :)

view this post on Zulip Janno (Nov 13 2023 at 19:10):

New profiles but they are pretty much identical to the previous ones. There's definitely almost an extra second of reduction work being done
outcome2.speedscope.json
outcome2_patched.speedscope.json

view this post on Zulip Janno (Nov 13 2023 at 19:26):

I don't see this effect if I just call typeclasses eauto with typeclass_instances once after all the Requires

view this post on Zulip Janno (Nov 13 2023 at 19:34):

Maybe it really is noise. AFAICT almsot all the actual time is (once again) spent in garbage collection. Maybe Reductionops.strongrec just got the short straw..

view this post on Zulip Janno (Nov 13 2023 at 19:40):

Switching to firefox profiler for a better timeline view: https://share.firefox.dev/3ubOeKT. Between 5.6s and 6.2s there's what seems to be a single rebuild of a database which is somehow much slower than all the others and spends all its time in GC. If there isn't a good reason to suspect that your patch accidentally creates much more garbage I'd say there's no point investigating this further.

view this post on Zulip Janno (Nov 13 2023 at 19:52):

memtrace has this to say about garbage:


Last updated: Oct 13 2024 at 01:02 UTC