Stream: Coq devs & plugin devs

Topic: Benchmark noise model


view this post on Zulip Janno (Mar 16 2022 at 09:40):

Does the Coq project have noise measurements for the current benchmark setup? Maybe it would make sense to fit statistical models to repeated benchmarks of master vs. master to see what the noise distribution looks like. If this data does not exist, would it be possible to maybe run one benchmark of master against master (i.e. one where we expect zero difference) every night for a month or so? That should give us plenty of data and it might help with determining if small differences are actually statistically relevant.

view this post on Zulip Ali Caglayan (Mar 16 2022 at 11:14):

We do occasionally bench master against master. Here is an example: https://github.com/coq/coq/pull/15735#issuecomment-1068122486. AFAIK remembering the most recent case is typically the best noise indication we have. I think that it would be a good project to do some statistical analysis of master on master benches, but I don't personally know how that can be done.

view this post on Zulip Ali Caglayan (Mar 16 2022 at 11:15):

If I were to run the bench again when the workers are busy or free the noise might change significantly. But it seems to be a good rule of thumb to take +-1% on most projects.

view this post on Zulip Ali Caglayan (Mar 16 2022 at 11:17):

The bench log I linked btw has the line by line timings in it (that was the point of the PR). You can see how noisy some lines in the CI can be (scroll up). https://gitlab.com/coq/coq/-/jobs/2202936719

view this post on Zulip Ali Caglayan (Mar 16 2022 at 11:18):

      old      new      diff      pdiff   ln      file
  34.6800   33.7690  -0.9110  -2.63%    10  coq-fourcolor/theories/job323to383.v.html                                           
 53.9250    55.1930  1.2680    2.35%   587  coq-unimath/UniMath/SubstitutionSystems/SignatureExamples.v.html                             

view this post on Zulip Ali Caglayan (Mar 16 2022 at 11:24):

I think it would be entirely possible to run the bench on a weekly basis (maybe a few more times if nobody is using it) to collect some data. How useful that data will actually be is a question I don't know the answer to.

view this post on Zulip Janno (Mar 16 2022 at 12:39):

The per line summary looks super useful!


Last updated: Apr 20 2024 at 12:02 UTC