Stream: Coq devs & plugin devs

Topic: 8.16 regression in ssreflect apply


view this post on Zulip Janno (Oct 19 2022 at 15:10):

I am seeing a pretty significant performance regression in ssreflect's apply: tactic, with some files spending 300% as much time in the tactic as they do in 8.15. The maximum time out of ~3300 calls to apply: in the example I am looking at only goes from 9ms to 11ms (according to Ltac Profile, so take that with a grain of salt). So it seems the tactic just got uniformly slower somehow. I am preparing flame graphs and all that, but I wanted to ask first if anybody had a hunch about what might have changed. I couldn't find relevant changes to the tactic itself between 8.15 and 8.16.
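
For reference, a minimal sketch of how such per-tactic numbers are gathered with Ltac Profile (the proof script in the middle is a placeholder, not the actual development):

Set Ltac Profiling.
(* ... replay the proofs of interest here ... *)
Show Ltac Profile.           (* per-tactic table: total time, self time, number of calls *)
Show Ltac Profile "apply".   (* restrict the table to tactics whose name contains "apply" *)
Reset Ltac Profile.
(* the same data can be produced for a whole file with: coqc -profile-ltac file.v *)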

view this post on Zulip Janno (Oct 19 2022 at 15:22):

Initial perf profile results show that the profile of apply: on 8.16 is dominated by evarsolve functions related to aliases: [perf profile screenshot]

view this post on Zulip Janno (Oct 19 2022 at 15:22):

For comparison, on 8.15 it's mostly kernel reduction: [perf profile screenshot]

view this post on Zulip Janno (Oct 19 2022 at 15:23):

Is this possibly related to reversible coercions? I don't see benchmark results in https://github.com/coq/coq/pull/15693 but the discussion is huge so maybe they are hidden somewhere

view this post on Zulip Janno (Oct 19 2022 at 15:30):

Oh, there is at least one benchmark in there: https://gitlab.com/coq/coq/-/jobs/2100592270. To me it seems like a pretty clear regression in ssreflect-heavy projects.

view this post on Zulip Janno (Oct 19 2022 at 15:38):

Be that as it may, I am not actually convinced that MR is to blame for the profile I am seeing. I cannot even find the word alias in the diff.

view this post on Zulip Emilio Jesús Gallego Arias (Oct 19 2022 at 15:38):

It would be great if you could produce a small test case so we can bisect.

view this post on Zulip Janno (Oct 19 2022 at 15:42):

I'll try to extract something. I am looking at a closed-source development and it will take a while to remove everything I am not allowed to show.

view this post on Zulip Emilio Jesús Gallego Arias (Oct 19 2022 at 15:44):

Yes; alternatively you can bisect it yourself. If you use the dune build of Coq, that is usually pretty fast.

view this post on Zulip Emilio Jesús Gallego Arias (Oct 19 2022 at 15:44):

though I'm unsure how deeply the problem is buried in your development, of course

view this post on Zulip Janno (Oct 19 2022 at 15:45):

Since I am already looking at benchmarks of PRs that could be related: https://github.com/coq/coq/pull/15789 is another PR with a clear (albeit smaller) performance regression in ssreflect-heavy projects: https://github.com/coq/coq/pull/15789#issuecomment-1065881904

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 15:48):

Does #16450 change anything? (it's a master PR, but maybe you can try it there)

view this post on Zulip Janno (Oct 19 2022 at 15:49):

I don't think I can bisect this problem without minimizing it first. The 8.16 bump was not particularly easy for our development (for a host of reasons), so I expect exactly zero points between 8.15 and 8.16 to build smoothly. Also, even just the dependencies of the project take roughly 10 minutes to build.

view this post on Zulip Janno (Oct 19 2022 at 15:49):

The performance results of that PR certainly look promising :D

view this post on Zulip Janno (Oct 19 2022 at 15:50):

But it seems https://github.com/coq/coq/pull/16448 is the reason for the performance improvements?

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 15:51):

For the observable ones, yes. This PR was originally part of the other one.

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 15:51):

You can try to backport it, it's very self-contained.

view this post on Zulip Janno (Oct 19 2022 at 15:52):

I'll try that.

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 15:52):

(I'd still be curious to get small examples stressing the aliasing mechanism.)

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 15:53):

Honestly I'm not even sure why we have such a contraption in evarsolve.

view this post on Zulip Janno (Oct 19 2022 at 16:38):

I backported the two MRs and repeated the experiment. The regression has shrunk to 1.7x (from 3x).

view this post on Zulip Janno (Oct 19 2022 at 16:39):

Evarsolve functions essentially disappeared from the profile so that's great

view this post on Zulip Janno (Oct 19 2022 at 16:39):

There is still something going on with strings that wasn't as strongly represented in the profile in 8.15

view this post on Zulip Janno (Oct 19 2022 at 16:48):

Here's the profile on 8.15: https://share.firefox.dev/3VEAfWW
and here's the profile on 8.16 with the backported MRs: https://share.firefox.dev/3yVQ6qt

view this post on Zulip Janno (Oct 19 2022 at 16:50):

Those are inverted call stacks restricted only to apply: itself (by searching for ssrbwd). The percentages are not comparable between the two because the overall time varies. But the number of samples is comparable (99 samples per second in both cases).

view this post on Zulip Janno (Oct 19 2022 at 16:52):

Actually, now that I click around a bit, it seems aliasing computation is still the culprit. caml_string_compare (second in the profile on patched 8.16) contains 1.7% Evarsolve.compare_alias.

view this post on Zulip Janno (Oct 19 2022 at 16:59):

So it seems the two MRs removed a lot of the internal overhead of the alias computation, but there is still much more computation being done compared to 8.15; it's just that now all the time is spent in functions called by these Evarsolve functions.

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 17:00):

yes, it's not solving the macroscopic problem, just addressing the fact that the old code was especially slow

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 17:00):

if you could bisect to find the source of the evarsolve blowup it'd be great

view this post on Zulip Janno (Oct 19 2022 at 17:20):

I am not sure how to bisect this. I think I first need to read the code paths to get an idea of what might be a good reproduction case.

view this post on Zulip Emilio Jesús Gallego Arias (Oct 19 2022 at 17:21):

@Janno you are not sure in terms of build / git commands?

view this post on Zulip Janno (Oct 19 2022 at 17:22):

No, I am not sure about finding a good example. I have over 3000 calls to apply: and every single one takes only a few ms at most. I don't know how to find out which calls are promising candidates for demonstrating the regression.

view this post on Zulip Paolo Giarrusso (Oct 19 2022 at 17:24):

where does "invoke one of those 1000 times" fail?

view this post on Zulip Paolo Giarrusso (Oct 19 2022 at 17:25):

(not saying that's a good idea, but I hope it leads to one?)

view this post on Zulip Janno (Oct 19 2022 at 17:26):

Indeed. I guess I can start adding idtacs to all our uses of apply: and running each one a thousand times until I find one that clearly shows the regression.

view this post on Zulip Jason Gross (Oct 19 2022 at 18:03):

(deleted, moved to a new topic)

view this post on Zulip Janno (Oct 19 2022 at 20:53):

Okay, that's several hours of my life I will never get back. But I managed to finally reproduce the problem outside of my original test case:

From Coq Require Import ssreflect.

Axiom W : forall (P Q: Prop), Prop.
Axiom W_refl : forall P, W P P.

Axiom W_coerce : forall {P Q} (w : W P Q), forall x : unit, False.
Coercion W_coerce : W >-> Funclass.

Goal True.
  do 10 (let n := fresh in pose proof (n := I)).
  Time do 1000 (try (apply : W_refl; fail)).

The problem only becomes apparent in non-empty contexts. With 10 assumptions the failing apply: calls take almost twice as long on 8.16 as on 8.15, with 20 they take three times as long, and with 100 about 50 times as long.
The coercion is also necessary to trigger the problem.
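
The same snippet with the context size as an explicit knob, in case anyone wants to reproduce those numbers (nothing new compared to the code above, just the parameter pulled out):

From Coq Require Import ssreflect.

Axiom W : forall (P Q: Prop), Prop.
Axiom W_refl : forall P, W P P.

Axiom W_coerce : forall {P Q} (w : W P Q), forall x : unit, False.
Coercion W_coerce : W >-> Funclass.

Goal True.
  (* try 10, 20 and 100 here to observe the superlinear growth described above *)
  do 10 (let n := fresh in pose proof (n := I)).
  Time do 1000 (try (apply : W_refl; fail)).
Abort.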

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 20:53):

Thanks!

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 21:42):

A quick bisect seems to indicate that the culprit is 828bdda605ba7737364f06f8cc00a318d1ee585b.

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 21:43):

Author: Pierre Roux <pierre@roux01.fr>
Date:   Mon Mar 14 15:57:00 2022 +0100

    Remove uniform inheritance condition for coercions

    This removes the uniform inheritance condition by using
    inference (mostly unification, but also canonical structures or
    typeclasses) to obtain the parameters of coercions.

view this post on Zulip Janno (Oct 19 2022 at 21:43):

I just arrived at the same conclusion

view this post on Zulip Ali Caglayan (Oct 19 2022 at 21:43):

cc @Pierre Roux

view this post on Zulip Ali Caglayan (Oct 19 2022 at 21:44):

I remember when we benched this we didn't observe any performance overhead.

view this post on Zulip Janno (Oct 19 2022 at 21:44):

I am not surprised given that the regression was already indicated by the benchmarks.

view this post on Zulip Ali Caglayan (Oct 19 2022 at 21:44):

Clearly that means we are benching the wrong things.

view this post on Zulip Janno (Oct 19 2022 at 21:44):

Oh, I think we read the results differently

view this post on Zulip Janno (Oct 19 2022 at 21:44):

https://github.com/coq/coq/pull/15789#issuecomment-1065881904 clearly shows significant slowdowns in projects that use ssreflect

view this post on Zulip Gaëtan Gilbert (Oct 19 2022 at 21:45):

0.70% is well within noise range

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 21:45):

does it?

view this post on Zulip Janno (Oct 19 2022 at 21:46):

One day I will convince all of you to stop looking at the noise generator that is the timing column

view this post on Zulip Janno (Oct 19 2022 at 21:46):

Until that day I'll spend my evenings after Coq bumps bisecting performance regressions ;)

view this post on Zulip Ali Caglayan (Oct 19 2022 at 21:46):

Where should we be looking?

view this post on Zulip Janno (Oct 19 2022 at 21:46):

cpu instructions

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 21:47):

that also seems well within noise range

view this post on Zulip Janno (Oct 19 2022 at 21:47):

No way

view this post on Zulip Janno (Oct 19 2022 at 21:47):

The variance for cpu instructions is much much smaller than that for timing results

view this post on Zulip Ali Caglayan (Oct 19 2022 at 21:47):

How is it measured?

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 21:47):

yes but changing code can have bigger effects at that level

view this post on Zulip Ali Caglayan (Oct 19 2022 at 21:47):

if the OS does other stuff in between, does that get included?

view this post on Zulip Gaëtan Gilbert (Oct 19 2022 at 21:47):

https://github.com/coq/coq/pull/16609 has a bunch of benches with no changes if you want to calibrate your understanding of bench noise

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 21:48):

I'd rather look at the user time per command

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 21:48):

@Janno your example seems to indicate some bad algorithmics, it's superlinear in the size of the context

view this post on Zulip Janno (Oct 19 2022 at 21:49):

@Gaëtan Gilbert great examples! https://github.com/coq/coq/pull/16609#issuecomment-1272981665 this is what cpu instruction noise looks like

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 21:49):

still, < 1% additional CPU instructions is not very concerning even if it is real

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 21:50):

even assuming 0% noise this is a very small effect

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 21:50):

there are many reasons why we could have such a difference

view this post on Zulip Janno (Oct 19 2022 at 21:50):

Well, that totally depends. Is it a new feature that comes with an inherent cost? Great, 1% could be fine. But what if it's a change that ought to be free (maybe because nobody is even using the new feature)? Then 1% is very concerning.

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 21:50):

well, everybody uses coercions, so...

view this post on Zulip Janno (Oct 19 2022 at 21:51):

Sure, in this case it's not clear to me that this change can be made without a performance regression. But I don't think there was any discussion about the impact. (Which makes sense because people thought it was just noise.)

view this post on Zulip Ali Caglayan (Oct 19 2022 at 21:53):

@Pierre-Marie Pédrot can you perf the example and see what is busy?

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 21:55):

it has the same kind of profile as the fast one actually

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 21:55):

i.e. it's Evarsolve taking time in both cases

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 21:55):

most likely this is bad algorithmics, i.e. we're quadratically calling the unification or something

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 21:56):

I fear exceptions and the control flow they bring, amongst others

view this post on Zulip Janno (Oct 19 2022 at 21:56):

I have to go sleep. Thanks for looking into this!

view this post on Zulip Pierre-Marie Pédrot (Oct 19 2022 at 21:58):

same here, will have a look tomorrow, thanks again all

view this post on Zulip Pierre-Marie Pédrot (Oct 20 2022 at 07:22):

I think I have an intuition about what's going wrong.

view this post on Zulip Pierre-Marie Pédrot (Oct 20 2022 at 07:23):

Before we would synthesize the arguments of the coercion from static data, but now we're generating a bunch of evars that are solved by unification.
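
A small illustration of that point, reusing the coercion from Janno's repro (hedged: this only shows the elaborated term, not the internal unification steps):

Axiom W : forall (P Q: Prop), Prop.
Axiom W_coerce : forall {P Q} (w : W P Q), forall x : unit, False.
Coercion W_coerce : W >-> Funclass.

Axiom w : W True True.
Set Printing Coercions.
Set Printing Implicit.
(* Applying w as a function inserts the coercion with its parameters explicit:
     @W_coerce True True w tt : False
   Per the discussion above, in 8.16 these parameters start out as fresh evars
   solved by unification against the type of w, whereas before commit 828bdda
   they were taken directly from that type (uniform inheritance). *)
Check (w tt).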

view this post on Zulip Pierre-Marie Pédrot (Oct 20 2022 at 07:23):

I think that we could simply reinstate the old behaviour for uniform coercions, since we already know the arguments.

view this post on Zulip Janno (Oct 20 2022 at 07:56):

So the evars are what makes it scale so badly in the size of the context?

view this post on Zulip Jason Gross (Oct 20 2022 at 08:16):

Is evar allocation still ... linear? in the size of the context?

view this post on Zulip Pierre Roux (Oct 20 2022 at 08:33):

Pierre-Marie Pédrot said:

Before we would synthesize the arguments of the coercion from static data, but now we're generating a bunch of evars that are solved by unification.

Or, said otherwise: before we had an ad hoc piece of code working only in specific cases (likely for historical reasons), now we use the generic mechanism used everywhere else in the elaborator.

view this post on Zulip Pierre-Marie Pédrot (Oct 20 2022 at 08:47):

Yes, but this doesn't prevent us from being more clever in simpler cases.

view this post on Zulip Pierre Roux (Oct 20 2022 at 08:48):

Sure, if we really need a fast path, let's do it.

view this post on Zulip Enrico Tassi (Oct 20 2022 at 08:54):

Who is working on the issue? (just to avoid duplicate work)

view this post on Zulip Enrico Tassi (Oct 20 2022 at 08:56):

@Pierre-Marie Pédrot did you implement a compact representation for evar instances? Is it the time to allocate the evar info that makes the difference?

view this post on Zulip Pierre-Marie Pédrot (Oct 20 2022 at 09:02):

no, the problem is really that we're solving unification problems of the form ?e[Γ] ~ M and this is just expensive

view this post on Zulip Pierre-Marie Pédrot (Oct 20 2022 at 09:03):

the fastest work is the one you don't perform

view this post on Zulip Pierre-Marie Pédrot (Oct 20 2022 at 09:03):

I'm working on the issue but this is trickier than partially reverting the code, because it seems there were other changes on top of that

view this post on Zulip Pierre-Marie Pédrot (Oct 20 2022 at 09:30):

my naive attempt to pass the inferred arguments didn't solve the issue

view this post on Zulip Enrico Tassi (Oct 20 2022 at 09:41):

Pierre-Marie Pédrot said:

no, the problem is really that we're solving unification problems of the form ?e[Γ] ~ M and this is just expensive

I think the optimization you want to pull should be a last resort.
Imo Γ |- ?e[Γ] ~ M should not be expensive, especially if ?e is fresh, hence cannot occur in M.

view this post on Zulip Gaëtan Gilbert (Oct 20 2022 at 09:42):

I don't think we have a way to tell that some evar is fresher than an arbitrary term

view this post on Zulip Enrico Tassi (Oct 20 2022 at 09:42):

Not in general, but in this case we know it.

view this post on Zulip Pierre-Marie Pédrot (Oct 20 2022 at 09:42):

I'm releasing the lock to try to find a fix, and I can share my attempt if you want to have a look

view this post on Zulip Gaëtan Gilbert (Oct 20 2022 at 09:43):

but we would need some special api to unification to use that knowledge

view this post on Zulip Enrico Tassi (Oct 20 2022 at 09:43):

I have other things on my plate, I was just suggesting a more general fix

view this post on Zulip Pierre-Marie Pédrot (Oct 20 2022 at 09:45):

we're doing clever stuff with evar-term problems though

view this post on Zulip Enrico Tassi (Oct 20 2022 at 09:45):

Yes, I imagine it is not trivial, since substitution can duplicate

view this post on Zulip Pierre-Marie Pédrot (Oct 20 2022 at 09:46):

the function solving evar-term problems is literally the one that dominates every profile

view this post on Zulip Enrico Tassi (Oct 20 2022 at 09:47):

Anyway, since I don't have the time to try my approach, I guess I should cave ;-)

view this post on Zulip Enrico Tassi (Oct 20 2022 at 09:48):

Here I see 2 sources of optimization:

view this post on Zulip Enrico Tassi (Oct 20 2022 at 09:48):

at least the former should be easy to exploit, since all the info to check for the fast code path is passed to solve_simple....

view this post on Zulip Enrico Tassi (Oct 20 2022 at 09:50):

maybe it is already the case, I don't recall. But if so, I don't understand the slowdown.

view this post on Zulip Pierre-Marie Pédrot (Oct 20 2022 at 09:50):

what is invert_definition supposed to produce when given an identity evar?

view this post on Zulip Enrico Tassi (Oct 20 2022 at 09:50):

a no-op?

view this post on Zulip Pierre-Marie Pédrot (Oct 20 2022 at 09:50):

so this needs to be hardcoded because it's not the case today

view this post on Zulip Pierre-Marie Pédrot (Oct 20 2022 at 09:51):

it computes context data in O(n) and more

view this post on Zulip Enrico Tassi (Oct 20 2022 at 09:51):

wait, maybe it still needs to do named -> de Bruijn in Coq

view this post on Zulip Pierre-Marie Pédrot (Oct 20 2022 at 09:51):

(disclaimer: unification is a black box to me)

view this post on Zulip Enrico Tassi (Oct 20 2022 at 09:52):

I never recall if the solutions in the evar map are rel- or var-based

view this post on Zulip Pierre-Marie Pédrot (Oct 20 2022 at 09:52):

wdym?

view this post on Zulip Pierre-Marie Pédrot (Oct 20 2022 at 09:52):

evar instances are var based if this is what you mean

view this post on Zulip Enrico Tassi (Oct 20 2022 at 09:53):

ok, then the optimization should be doable

view this post on Zulip Pierre-Marie Pédrot (Oct 20 2022 at 09:54):

I think we can make it way faster easily as of today though

view this post on Zulip Enrico Tassi (Oct 20 2022 at 09:54):

if the local substitution is the identity, then it is a FO problem: you only need to do the occurs check and, if the evar does not occur in the term, you assign it in the evar map
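
Schematically, writing Σ for the evar map (a hedged restatement of the fast path just described, not the actual Evarsolve code):

\[
\frac{\Gamma \vdash\; ?e[\mathrm{id}_\Gamma] \approx M \qquad ?e \notin \mathrm{evars}(M)}
     {\Sigma \;\longrightarrow\; \Sigma \cup \{\, ?e := M \,\}}
\]

That is, when the instance is the identity substitution on the evar's own context, the only real work is the occurs check; no alias expansion or instance inversion should be needed.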

view this post on Zulip Pierre-Marie Pédrot (Oct 20 2022 at 09:54):

there is a fast-path check but it occurs after some costly precomputation has been performed

view this post on Zulip Pierre-Marie Pédrot (Oct 20 2022 at 10:01):

putting the fast path in the right place doesn't seem to help on this example though

view this post on Zulip Pierre-Marie Pédrot (Oct 20 2022 at 10:01):

maybe we're not unifying identity evars

view this post on Zulip Janno (Oct 21 2022 at 07:31):

It seems this avenue has been very fruitful! https://github.com/coq/coq/pull/16703#issuecomment-1286288827
How does this change relate to the problem of creating unnecessary evars?

view this post on Zulip Enrico Tassi (Oct 21 2022 at 07:41):

It optimizes a common case where HO unif boils down to FO unif (and there is still room for improvement).

view this post on Zulip Enrico Tassi (Oct 21 2022 at 07:44):

So it can add to what was discussed here, i.e. restoring the old code path in some cases. As I said before, I'd prefer making trivial evars costless (in all projects) rather than making the coercion code more complex. But we will see.

view this post on Zulip Enrico Tassi (Oct 21 2022 at 07:45):

@Pierre-Marie Pédrot how does the PR affect @Janno's micro benchmark?

view this post on Zulip Janno (Oct 21 2022 at 07:45):

putting the fast path in the right place doesn't seem to help on this example though

This seems to indicate that it doesn't

view this post on Zulip Enrico Tassi (Oct 21 2022 at 07:45):

oops, missed that.

view this post on Zulip Janno (Oct 21 2022 at 07:47):

BTW there is also a performance regression in https://github.com/coq/coq/pull/15693 that affected unification-heavy projects. I have no idea what it is and how to address it but I thought I'd mention it again. Maybe it will lead to another discovery of a late fast path :)

view this post on Zulip Enrico Tassi (Oct 21 2022 at 07:54):

sorry about that, our bench system did not discover it. are the projects public?

view this post on Zulip Janno (Oct 21 2022 at 07:55):

Oh, sorry, I didn't mean I had found this to impact our code. I just mean that the bench results show a regression.

view this post on Zulip Janno (Oct 21 2022 at 07:57):

It could be that it impacts our code, but if it does, the impact might be masked by other improvements. Removing a bunch of (fortunately) unused coercions to Funclass has basically entirely undone the regressions that I could identify. At least the ones in proofs. I still have a 10% performance regression in parsing somewhere but I haven't gotten around to minimizing that.

view this post on Zulip Pierre Roux (Oct 21 2022 at 07:58):

Janno said:

I just mean that the bench results show a regression.

What do you mean precisely? Benches were run before merging the PR and no regression appeared.

view this post on Zulip Enrico Tassi (Oct 21 2022 at 07:59):

It is a new feature, so it does run more code. The problem is quantifying it. I know you expressed concerns about the wall-clock measure; I can add that we have no unit benchmarks, so our observations are very macroscopic.

view this post on Zulip Janno (Oct 21 2022 at 08:09):

Yeah, I am definitely going to sound like a broken record, but https://gitlab.com/coq/coq/-/jobs/2100592270, even with the most generous reading, shows at least a very clear potential for a slowdown in all of mathcomp. If we look at the usual noise range for mathcomp wall-clock time, I think it is even justified to say that there is a regression, albeit only a small one in wall-clock time. It's not entirely unusual to see changes that increase instruction count without increasing wall-clock time as much, but I almost never see increases like that that do not also lead to at least some actual slowdown. I don't have as much experience with changes to Coq internals, so the picture might be a bit different here. In my work (even in Coq plugins) I basically see a 1-to-1 correspondence between the two measurements. The only discontinuity I have seen in recent years was when we switched from OCaml 4.07 to OCaml 4.14, where the new GC strategy really changed the picture drastically. But after that the two measurements were once again aligned pretty much perfectly.

view this post on Zulip Janno (Oct 21 2022 at 08:14):

A different perspective, although one that certainly needs more data to support it, is that it took me only half an hour to identify two candidates for the regression I was seeing just by looking at the perf profile, finding MRs that sounded vaguely related to the functions that showed up in the profile, and then filtering for MRs that had regressions in CPU instruction counts in their benchmark results.

view this post on Zulip Pierre Roux (Oct 21 2022 at 08:16):

So let's be precise: what exactly in those results do you think should have worried us?

view this post on Zulip Janno (Oct 21 2022 at 08:19):

These are the results that would worry me: [benchmark table screenshot]

view this post on Zulip Pierre Roux (Oct 21 2022 at 08:22):

Sure, I mean which figure(s) and why?

view this post on Zulip Janno (Oct 21 2022 at 08:23):

I tried to highlight the relevant entries in the cpu instruction column.

view this post on Zulip Pierre Roux (Oct 21 2022 at 08:23):

Sorry, did not see them at first.

view this post on Zulip Janno (Oct 21 2022 at 08:24):

I probably should have included coq-corn as well

view this post on Zulip Janno (Oct 21 2022 at 08:24):

It's easy to miss entries with that many numbers that all look very similar

view this post on Zulip Pierre Roux (Oct 21 2022 at 08:28):

Note that some of the figures you highlighted for instructions are negative in cycles. That seems pretty hard to distinguish from noise.

view this post on Zulip Janno (Oct 21 2022 at 08:29):

Cycles are just wall clock time in disguise and they are equally noisy

view this post on Zulip Janno (Oct 21 2022 at 08:30):

Have a look at the (accidental) noise measurements in https://github.com/coq/coq/pull/16609. All the benchmarks that are X against itself produced very good data for estimating noise.

view this post on Zulip Janno (Oct 21 2022 at 08:30):

coq-iris-examples, for example, easily varies by about +/- 1% in wall clock time. So -0.7% in wall clock time but +0.3% in instruction count is a perfectly valid measurement for a 0.3% overall performance regression. It doesn't have to be, but it is possible.

view this post on Zulip Enrico Tassi (Oct 21 2022 at 08:36):

Sure, but again, it is a new feature; we can't expect it to have zero cost. Of course we need to weigh the benefits vs. the extra cost. This is hard since the extra cost has only a very rough measure. To me a perf regression is "this used to work, now it does not". The one you reported is real, no doubt. But we can't possibly extrapolate that 0.3% means "someone will be stuck".

view this post on Zulip Pierre-Marie Pédrot (Oct 21 2022 at 08:36):

I'm still unconvinced that global CPU instruction counts, even with close to 0% noise, are a good predictor of potential regressions. It is often the case that algorithmic PRs add a small constant overhead while drastically improving the complexity class of some subcomponents.

view this post on Zulip Pierre-Marie Pédrot (Oct 21 2022 at 08:37):

(I concur with @Enrico Tassi above, basically.)

view this post on Zulip Janno (Oct 21 2022 at 08:39):

I think we are mostly in agreement about all of that. New features cost instructions and that cost, outside of catastrophic complexity problems, is probably not going to block anyone. My worry is solely that regressions are missed because they are misdiagnosed as noise.

view this post on Zulip Pierre Roux (Oct 21 2022 at 08:42):

Thanks for the explanation, I think I get your point (noise on CPU instr. indeed seems well below 0.5 %).
To supplement Enrico's comment, note that there is indeed currently a small regression on mathcomp, but other experiments seem to indicate that once the feature can actually be used (i.e. when mathcomp requires Coq >= 8.16), we can expect a small speedup.

view this post on Zulip Janno (Oct 21 2022 at 08:43):

I am 100% not advocating for blocking MRs because they introduce a regression in CPU instructions. I just want those regressions to be noticed. Maybe the changelog could even say "Feature X has been added. Note that we expect a very small performance regression in projects that use ssreflect because of this new feature".

view this post on Zulip Janno (Oct 21 2022 at 08:44):

(I know it's not ssreflect, it's evarconv, so that message wouldn't be correct.)

view this post on Zulip Janno (Oct 21 2022 at 08:49):

I realize that my perspective on this is a bit unique. I don't think other people track the performance of their proofs as closely. But I have found this to be the only way to stay (somewhat) sane when dealing with a code base in which the proofs change, the automation changes, the Coq version changes, and even the OCaml version changes every once in a while. It is very easy to lose track of performance regressions if I don't benchmark defensively. Finding out about regressions after the fact can lead to almost impossible debugging problems. Coq itself suffers from this on a much smaller scale with the ongoing changes to build system infrastructure. I ended up bisecting the regression somewhat manually (i.e. not with my usual git bisect run approach) because of the dune issues that were also discussed here.

view this post on Zulip Janno (Oct 21 2022 at 08:49):

So knowing what might introduce regressions is at least half the battle

view this post on Zulip Pierre-Marie Pédrot (Oct 21 2022 at 08:59):

regardless of the discussion about noise measurement, there are clear missing optimizations in Evarsolve with the new evar representation

view this post on Zulip Pierre-Marie Pédrot (Oct 21 2022 at 08:59):

I believe (wrongly?) that solving evar-evar problems with trivial instances should always be immediate

view this post on Zulip Janno (Oct 21 2022 at 09:02):

I had the same thought about trivial instantiations a long time ago, when I noticed that Mtac2 was suffering immensely from trivial unification problems because it uses evars for pattern variables and unification for matching patterns against terms. Knowing nothing about unification, I opted to work around the problem by coming up with non-unification-based, yet still sound, pattern matching alternatives. But the complexity of those is only barely worth the performance benefit, so I would very much welcome faster evar instantiation.

view this post on Zulip Pierre-Marie Pédrot (Oct 21 2022 at 09:04):

I'm looking at the code right now, trying to grasp the behaviour of the algorithm on trivial instances

view this post on Zulip Pierre-Marie Pédrot (Oct 21 2022 at 09:07):

due to aliasing (ahem) we may be doing non-trivial things on trivial instances it seems

view this post on Zulip Enrico Tassi (Oct 21 2022 at 09:09):

Pierre-Marie Pédrot said:

I believe (wrongly?) that solving evar-evar problems with trivial instances should always be immediate

yes (rightly)

view this post on Zulip Pierre-Marie Pédrot (Oct 21 2022 at 09:10):

well then it's not

view this post on Zulip Pierre-Marie Pédrot (Oct 21 2022 at 09:10):

we're leisurely doing a little tour of random aliasing computation and stuff in probably worse than O(n)

view this post on Zulip Pierre-Marie Pédrot (Oct 21 2022 at 09:11):

also stare at candidates

view this post on Zulip Pierre-Marie Pédrot (Oct 21 2022 at 09:11):

Wanted: a specification

view this post on Zulip Enrico Tassi (Oct 21 2022 at 09:11):

I don't think it is necessary, since once you unify ?1 with ?2 (trivially) then you do ?2 = t and there you do all the crazy aliasing stuff

view this post on Zulip Enrico Tassi (Oct 21 2022 at 09:12):

?1[x,z] = ?2[x,y] (for y := z) is not a trivial evar-evar problem

view this post on Zulip Enrico Tassi (Oct 21 2022 at 09:13):

I mean, I would not consider it so, hence the aliasing business is not needed in the fast code path
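
In other words (a hedged paraphrase of the two cases):

\[
?1[\mathrm{id}_\Gamma] \approx\; ?2[\mathrm{id}_\Gamma] \qquad \text{trivial: solvable by linking } ?1 := ?2 \text{, no alias expansion involved}
\]
\[
?1[x,z] \approx\; ?2[x,y] \;\text{ with } y := z \text{ in } \Gamma \qquad \text{not trivial: the instances only agree after expanding the alias } y
\]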

view this post on Zulip Pierre-Marie Pédrot (Oct 21 2022 at 09:14):

that's not obvious just reading the code

view this post on Zulip Pierre-Marie Pédrot (Oct 21 2022 at 09:14):

I'd be happy to have this code written in Coq just to evaluate it partially

view this post on Zulip Enrico Tassi (Oct 21 2022 at 09:16):

coq_of_ocaml ;-)

view this post on Zulip Pierre-Marie Pédrot (Oct 21 2022 at 09:17):

that would be too easy, this stuff is using mutability internally

view this post on Zulip Pierre-Marie Pédrot (Oct 21 2022 at 09:36):

I confirm that hacking the evar-evar case to return immediately on trivial instances solves the issue at hand

view this post on Zulip Pierre-Marie Pédrot (Oct 21 2022 at 09:37):

my code is probably slightly unsound though, because we could have the same default instance coding for two expanded instances when filters are involved

view this post on Zulip Pierre-Marie Pédrot (Oct 21 2022 at 09:48):

https://github.com/coq/coq/pull/16713

view this post on Zulip Pierre-Marie Pédrot (Oct 21 2022 at 09:49):

I think this variant is sound

