Stream: coqbot devs & users

Topic: coqbot down?


view this post on Zulip Jason Gross (Mar 19 2021 at 20:12):

coqbot doesn't seem to be reponding to my messages (https://github.com/coq-community/run-coq-bug-minimizer/issues/5#issuecomment-803089971), also the curl at https://github.com/coq-community/run-coq-bug-minimizer/runs/2148646589?check_suite_focus=true#step:5:16513 . (I hope I didn't accidentally OOM it; I may have somewhat accidentally sent it a request to post a couple hundred MB-large comment... (Sorry, I wasn't thinking too carefully before asking it to run minimization on very very very verbose mode.))

view this post on Zulip Jason Gross (Mar 19 2021 at 20:21):

(Where by "a couple hundred MB" I actually apparently mean 508 MB, and I got a response from coqbot with an iframe pointing to www.herokucdn.com/error-pages/application-error.html)

view this post on Zulip Jason Gross (Mar 19 2021 at 20:25):

(Apparently my log file was ~14 million lines >.> )

view this post on Zulip Théo Zimmermann (Mar 19 2021 at 21:31):

Yes, earlier today I got some automatic reports about coqbot's memory quota being vastly exceeded, which made the bot crash.

view this post on Zulip Théo Zimmermann (Mar 19 2021 at 21:32):

I looked at the log at the time and couldn't figure out what was going on that crashed it.

view this post on Zulip Théo Zimmermann (Mar 19 2021 at 21:34):

I don't understand how your comment could be as big as you said. Could you explain it further? Do you have a suggestion on how to fix coqbot to be robust to this kind of situations?

view this post on Zulip Théo Zimmermann (Mar 19 2021 at 21:34):

Théo Zimmermann said:

I looked at the log at the time and couldn't figure out what was going on that crashed it.

But after auto-restarting it seemed to work fine.

view this post on Zulip Théo Zimmermann (Mar 19 2021 at 21:36):

A last remark is that today most installations of coqbot are using the GitHub App (see https://github.com/coq/bot/#how-to-use-the-coqbot-instance). This gives me the possibility to see the list of GitHub webhooks that were sent and to retry any that failed when there was an issue. That being said, I don't have the same thing for the GitLab webhooks.

view this post on Zulip Jason Gross (Mar 20 2021 at 01:18):

Sorry about that. The issue is that I have the bug minimizer send the minimization log as part of the comment (see, for example, https://github.com/coq-community/run-coq-bug-minimizer/issues/5#issuecomment-802230668 ). This is so that it's easier to debug the issue. I somewhat unthinkingly asked the minimizer to run with at it's most verbose seetting, to try to figure out why it was minimizing incorrectly. The most verbose setting, among other things, will print out the entire file before every run of Coq, and print out the glob files every time it reads it. Even for files that are only a couple hundred lines long, after running for 5 hours straight, running Coq every second or so, that's 1.8 million lines in the log. I guess there's another factor of 10 hiding somewhere. So at the end of the run, the minimizer sends a post request to coqbot with this very long comment, 508MB big, according to curl stats.

view this post on Zulip Jason Gross (Mar 20 2021 at 01:20):

I expect that coqbot is not robust against getting 500 MB post requests (either because it duplicates the request in many places or has inefficient regex handling or whatever). I would suggest picking some upper limit (perhaps 20 MB?), and truncating post requests longer than that, adding something like "POST request size limit exceeded" to the end or whatever.

view this post on Zulip Jason Gross (Mar 20 2021 at 01:26):

Also, hmm, I was wrong, it's just that coqbot is failing to respond to some of my comments. I've reported this at https://github.com/coq/bot/issues/132

view this post on Zulip Théo Zimmermann (Mar 20 2021 at 11:41):

There might be two different issues. Because coqbot did git hit a memory issue yesterday.

view this post on Zulip Jason Gross (Mar 20 2021 at 13:27):

Yes, I think there are two different issues. Maybe even three, because it seems to have also ignored this curl request that was 330kb, and was pretty similar to other curl requests it was happy to respond to: https://github.com/coq-community/run-coq-bug-minimizer/runs/2153333631?check_suite_focus=true#step:5:16938

view this post on Zulip Emilio Jesús Gallego Arias (Mar 20 2021 at 16:52):

Limitation of input size does make sense tho, as this is an easy DOS, I would use way lower limits than 20MB tho, somethink like 50k

view this post on Zulip Jason Gross (Mar 20 2021 at 23:52):

I think the bug minimizer routinely makes requests that are a couple hundred kilobytes, since it includes minimization logs (and it really is useful to not have to go digging for the logs elsewhere when things go wrong, though I would be okay dropping the logs)

view this post on Zulip Théo Zimmermann (Mar 21 2021 at 08:23):

If the logs can be downloaded from somewhere, coqbot could also do that, trim them and post them. This is what happens for GitLab CI logs.

view this post on Zulip Théo Zimmermann (Mar 21 2021 at 08:25):

Though what would be even better would be for the bot minimizer itself to generate logs at different levels of verbosity in different files, so that coqbot can retrieve the shortest one to display directly and link to the other ones.

view this post on Zulip Emilio Jesús Gallego Arias (Mar 21 2021 at 17:55):

Indeed we could store the logs elsewhere, IMHO it makes sense for a protocol such as the one in coqbot not to carry "heavy" data

view this post on Zulip Jason Gross (Apr 07 2021 at 16:39):

I've currently hardcoded a limit of about 3MiB to the comments that the bug minimizer sends to coqbot. I've run into the following issue, though, with having coqbot decide what log to post: GitHub actions artifacts links are only available after the run has completed, whereas the request to coqbot is generated during the run. So there is no way, it seems, to retrieve a link to the relevant artifact file. (We could move the runner from GH Actions to GitLab, perhaps, and use artifacts there, which perhaps have more stable links?)

view this post on Zulip Théo Zimmermann (Apr 07 2021 at 20:08):

We can wait with an exponential back off strategy. We already do this for some other things.

view this post on Zulip Jason Gross (Apr 21 2021 at 00:20):

I've just discovered that github comments have a maximum character limit of 65536. I've updated the run-coq-bug-minimizer script to only give coqbot comments that stay within this length (hopefully, my math is a bit approximate).

view this post on Zulip Jason Gross (May 23 2021 at 13:33):

We can wait with an exponential back off strategy. We already do this for some other things.

Belatedly, this is not going to work. Currently the minimizer uploads, in addition to log artifacts, and artifact of the whole build directory (which I plan to use soon to track down the OCaml native-compiler ci-elpi issue). This artifact is about 2 GB and takes about 3 hours to upload, and I don't think we want to rely on coqbot doing a 3-hour-long exponential-backoff...


Last updated: Jan 31 2023 at 10:01 UTC