
Verify Proton can handle Queue timeouts properly
Closed, Resolved, Public

Description

It looks like QueueSystem can stop processing queued jobs when many tasks time out (see https://gerrit.wikimedia.org/r/#/c/mediawiki/services/chromium-render/+/480150/ for more information).

Please verify that the Queue works properly and, if there is a flaw in its timeout handling, fix it.
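One common way a queue stalls under timeouts is that a job which times out never gives its worker slot back. After enough timeouts no slot is free, so queued jobs never start. The sketch below is a hypothetical Python model, not Proton's actual QueueSystem; `TimeoutQueue`, `concurrency` and `job_timeout` are illustrative names. It takes the slot with `async with`, so the slot is released whether the job finishes or is cancelled by `asyncio.wait_for`.

```python
import asyncio


class TimeoutQueue:
    """Hypothetical sketch: runs at most `concurrency` jobs at once and
    gives up on any job that runs longer than `job_timeout` seconds."""

    def __init__(self, concurrency: int, job_timeout: float) -> None:
        self._slots = asyncio.Semaphore(concurrency)
        self._job_timeout = job_timeout

    async def run(self, job):
        # `async with` releases the slot on success, error, or timeout alike,
        # so timed-out jobs cannot leak slots and stall the queue.
        async with self._slots:
            try:
                return await asyncio.wait_for(job(), self._job_timeout)
            except asyncio.TimeoutError:
                return "timeout"


async def slow_job():
    await asyncio.sleep(1)  # always exceeds the timeout used below
    return "slow done"


async def fast_job():
    return "fast done"


async def main():
    # Create the queue inside the running event loop.
    queue = TimeoutQueue(concurrency=2, job_timeout=0.05)
    # Flood the queue with jobs that all time out...
    timed_out = await asyncio.gather(*(queue.run(slow_job) for _ in range(4)))
    # ...then confirm a normal job still gets a slot afterwards.
    after = await queue.run(fast_job)
    return timed_out, after


timed_out, after = asyncio.run(main())
print(timed_out, after)
```

If the slot were released only on the success path, the first two timeouts would leak both slots and the remaining jobs in this example would wait forever.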

Event Timeline

The Queue works properly; the test was invalid because it assigns the same ID to multiple tasks.
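The invalid test is easy to reproduce in miniature. If a queue tracks jobs in a map keyed by ID, several tasks sharing one ID overwrite each other, so the test sees fewer jobs than it submitted. This is a hypothetical illustration that assumes jobs are keyed by ID; `JobRegistry` is an invented name and is not necessarily how Proton's queue stores jobs.

```python
import uuid


class JobRegistry:
    """Hypothetical sketch: pending jobs stored in a dict keyed by job ID."""

    def __init__(self) -> None:
        self._jobs = {}  # job_id -> payload

    def add(self, job_id: str, payload: str) -> None:
        self._jobs[job_id] = payload  # a duplicate ID silently overwrites

    def pending(self) -> int:
        return len(self._jobs)


# Reusing one ID for every task: only the last entry survives.
shared = JobRegistry()
for i in range(5):
    shared.add("test-job", f"task {i}")

# Giving every task its own random UUID keeps all entries distinct.
unique = JobRegistry()
for i in range(5):
    unique.add(str(uuid.uuid4()), f"task {i}")

print(shared.pending(), unique.pending())  # 1 5
```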

Change 480150 had a related patch set uploaded (by Pmiazga; owner: Jhernandez):
[mediawiki/services/chromium-render@master] WIP: Add bulk test, and fix JobTimeout stalling the queue

https://gerrit.wikimedia.org/r/480150

Sorry, I was so focused on tracking down the bigger issue that I missed a small error in the queue, which is fixed by https://gerrit.wikimedia.org/r/480150. The bigger issue, that tasks were still failing even after the fix, was caused by the ID conflict.

The change still requires code review: it already has a +1 from Petr and needs a +2 from someone.

Change 480150 merged by jenkins-bot:
[mediawiki/services/chromium-render@master] Add bulk test, and fix JobTimeout stalling the queue

https://gerrit.wikimedia.org/r/480150

Niedzielski closed this task as Resolved. (Edited Jan 14 2019, 6:39 PM)

I smoke-tested this change by reducing all the timeouts by an order of magnitude and launching about 30 requests. About half of the requests completed successfully; the rest returned a 503 with the message "Queue full. Please try again later".

Edit: The service was also ready to receive additional requests after the flood subsided.
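The 503 responses in the smoke test are what a bounded queue is expected to do. Once the waiting list is full, new requests are rejected immediately instead of piling up, and capacity returns as jobs finish or time out. Below is a minimal hypothetical sketch of that admission rule. `BoundedQueue`, `max_size` and the 202 status for accepted jobs are assumptions for illustration; only the 503 message text comes from the test above.

```python
from collections import deque

QUEUE_FULL = (503, "Queue full. Please try again later")


class BoundedQueue:
    """Hypothetical sketch of admission control: reject new work with a 503
    once `max_size` jobs are waiting, and accept again once it drains."""

    def __init__(self, max_size: int) -> None:
        self._max_size = max_size
        self._waiting = deque()

    def submit(self, job_id: str):
        if len(self._waiting) >= self._max_size:
            return QUEUE_FULL
        self._waiting.append(job_id)
        return (202, "Accepted")  # assumed status for a queued job

    def finish_one(self) -> None:
        # A worker removes the oldest job (it completed or timed out).
        if self._waiting:
            self._waiting.popleft()


bounded = BoundedQueue(max_size=3)
flood = [bounded.submit(f"req-{i}") for i in range(6)]
statuses = [status for status, _ in flood]  # [202, 202, 202, 503, 503, 503]

# After the flood subsides and the queue drains, requests are accepted again.
for _ in range(3):
    bounded.finish_one()
after = bounded.submit("req-late")  # (202, "Accepted")
```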