
Lots of "EventBus: Unable to deliver all events: 504: Gateway Timeout"
Closed, Duplicate (Public)

Description

2020-03-26 16:09:09 [XnzTmwpAAEQAAJUGnSwAAADI] mw1273 cswiki 1.35.0-wmf.24 exception ERROR: [XnzTmwpAAEQAAJUGnSwAAADI] /w/index.php?title=Wikipedistka:Zorka_Sojka/D%C3%ADlna/Jan_Kotek&action=delete   JobQueueError from line 106 of /srv/mediawiki/php-1.35.0-wmf.24/extensions/EventBus/includes/JobQueueEventBus.php: Could not enqueue jobs: Unable to deliver all events: 504: Gateway Timeout {"exception_id":"XnzTmwpAAEQAAJUGnSwAAADI","exception_url":"/w/index.php?title=Wikipedistka:Zorka_Sojka/D%C3%ADlna/Jan_Kotek&action=delete","caught_by":"mwe_handler"}

This happened when trying to delete a page on Czech Wikipedia. The request first hung for ~10 seconds and then threw this fatal error. Refreshing the page fixed it.

Event Timeline

So what happened there is that eventgate-main was struggling to respond, and we have a rather aggressive timeout now (at 10 seconds). I'll relax that and add some more retry logic.
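
To illustrate the "longer timeout plus more retries" idea, here is a minimal Python sketch. It is not the actual change, which was made in the services proxy configuration. The names `deliver_with_retries`, `send`, and `DeliveryError`, the set of retryable statuses, and the default timeout, attempt count, and backoff are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional, Sequence

# Assumption: gateway-style errors are treated as transient and worth retrying.
RETRYABLE_STATUSES = {502, 503, 504}


class DeliveryError(Exception):
    """Raised when events could not be delivered after all attempts."""


def deliver_with_retries(
    send: Callable[[Sequence[dict], float], int],
    events: Sequence[dict],
    timeout: float = 25.0,      # hypothetical value, more relaxed than 10 seconds
    max_attempts: int = 3,      # hypothetical value
    backoff: float = 0.5,       # base delay in seconds, doubled after each failure
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send(events, timeout) until it returns a non-retryable status.

    `send` is a hypothetical callable that posts the events and returns the
    HTTP status code. Returns the first non-retryable status, or raises
    DeliveryError once max_attempts is exhausted.
    """
    last_status: Optional[int] = None
    for attempt in range(1, max_attempts + 1):
        last_status = send(events, timeout)
        if last_status not in RETRYABLE_STATUSES:
            return last_status
        if attempt < max_attempts:
            # Exponential backoff: backoff, 2*backoff, 4*backoff, ...
            sleep(backoff * 2 ** (attempt - 1))
    raise DeliveryError(f"Unable to deliver all events: {last_status}")
```

With a sender that fails twice with 504 and then succeeds, the call returns the successful status after two backoff pauses; if every attempt returns a retryable status, the caller gets a `DeliveryError` instead of a silent failure.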

Joe triaged this task as Medium priority. Mar 26 2020, 4:59 PM

Change 583688 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] services_proxy: higher timeout for eventgate-main, more retries

https://gerrit.wikimedia.org/r/583688

Change 583688 merged by Giuseppe Lavagetto:
[operations/puppet@production] services_proxy: higher timeout for eventgate-main, more retries

https://gerrit.wikimedia.org/r/583688

I am no longer seeing 504s; we're seeing 503s instead. I will merge this task into T249745, since the solution will most likely be the same.