As of writing, you should be able to do @coqbot bench
, please try it out and let me know how you found it.
It just starts the bench for the moment. On my todo list are:
When you start a bench, the job should appear in the checks tab at the bottom. The link there will take you straight to the bench tab.
As of yesterday you should be able to do:
@coqbot bench
to start a bench.Hopefully soon we can get coqbot to post a comment too, but I need to work out a few more things before this is possible.
Here is what this looks like at the moment: https://github.com/coq/coq/pull/14748/checks?check_run_id=5659421664
Is it normal that github displays the bench as waiting as long as it has not been launched?
it's annoying for most PRs since we don't want to run a bench in general
(this also mean we can't have a green CI anymore without running the bench)
No that shouldn't happen, I will look into it
@Pierre-Marie Pédrot Have you got an example that you saw?
I see that GitHub displays the bench as waiting, but it doesn't change the overall tick mark for me
https://github.com/coq/coq/pull/15849
it's yellow in the list of checks so it can't be green
(the green tick is about rebasability, not the CI)
I can completely remove it from the checks tab or I can give it a neutral status
I think it would still be useful to keep in the checks tab
neutral is fine I think
(as long as it doesn't appear as waiting somehow)
Alright, I've changed it to be neutral, that should appear in around 5 min
As I've commented on one of your PRs (but you seem to have missed), you could also try using the action_required
status (that we've never used so far, so I don't know how it looks).
The main issue that I can see with neutral is that so far it has been used only for failures in allow failure mode, and if we start putting it on every PRs, people may stop noticing such failures.
Hmm action_required was way more aggressive than I thought.
It straight up fails the check
https://github.com/coq/coq/pull/15657/checks?check_run_id=5661027689
I believe neutral is better in this case
can't we have no check when the bench isn't started
We can, but is it better to have less info?
yes
in this case "bench isn't started" isn't valuable enough to use a check field imo
Clearly, for experienced users that know about the command, it's better not to show anything.
To make the feature discoverable, we should make sure it is documented anywhere the bench is mentioned.
can we not make coqbot into some spammy bot? minimization spam is already pretty bad
And newcomers also learn by imitation (by seeing other people use the command).
@coqbot bench
should now post a summary of the bench in the PR like this: https://github.com/coq/coq/pull/14748#issuecomment-1081062517
Let me know if you have any suggestions about how the output should be formatted.
Since the bench summary is also displayed in the Checks tab, I would instead suggest to not use any folding in the comment: to display the main table there since it's what people usually look at first, and to put a link from the comment to the check run summary. It would also provide a trace that would mean never losing the links to these check runs. I can give advice on how to achieve this.
For the top speed ups/ slow downs I chose 25 for both. It can be tweaked here:
https://github.com/coq/coq/blob/1f6328bba6596e2cfd35cee6a786feb795cc1adf/dev/bench/render_line_results.ml#L100
Is everybody happy with this? Anybody want more or less?
Sounds good (although considring its folded by default in the result, it could maybe be a bit more without being too annoying)
@Ali Caglayan I think that we should consider speed-ups in terms of relative time diff, not absolute diff.
@Pierre-Marie Pédrot How do we calculate that?
it's the %DIFF column
Right, but that can have a lot of garbage when I checked it
I can submit a PR that prints both to demonstrate if you like
Anybody have a commit with some real speed ups or slow downs however?
The main issue is that noise is not relative, so an extra 0.01 second can become a lot of pdiff for one line but not so much for another. Hence why I settled on absolute diff.
I agree that for small durations it's just noise, but maybe we should consider relative diff above some threshold
but maybe it's not that useful anyways
OK here is a PR: https://github.com/coq/coq/pull/15889
When the bench finishes it will display pdiff tables in the log and also generate pdiff artifacts.
I've manually triggered it again just to bench bignums
@Pierre-Marie Pédrot Here are the pdiff tables: https://gitlab.com/coq/coq/-/jobs/2280435968
I suppose I can try to filter by a minimum absolute difference.
What would be a good value? 0.05?
maybe a bit higher, like 0.1
Alright, I've filtered those out lets see how we do: https://gitlab.com/coq/coq/-/jobs/2281363213
I've also made the artifacts in _build/timings .txt files so we should be able to view them without downloading
(hopefully)
Of course now coqbot doesn't know that they are txt files so it will not have a good time reporting this bench
Let's try again, I screwed up which values I was filtering https://gitlab.com/coq/coq/-/jobs/2281425991
what's this?
Error while fetching checks for #15890 for running bench job: Got more than one checkSuite.
https://github.com/coq/coq/pull/15890
it can take a bit of time to push to gitlab / get the pipeline started, this error happens if you talk to the bot before that's done
We should handle this gracefully, for instance by retrying with a delay.
Last updated: Dec 07 2023 at 06:38 UTC