
Verify Proton can handle Queue timeouts properly
Closed, Resolved · Public

Description

It looks like QueueSystem can stop processing queued jobs when many tasks time out (please see https://gerrit.wikimedia.org/r/#/c/mediawiki/services/chromium-render/+/480150/ for more information).

Please verify that the queue works properly and, if there is a flaw in the timeout handling, fix it.

Event Timeline

pmiazga created this task. Dec 19 2018, 5:45 PM
Restricted Application added a subscriber: Aklapper. Dec 19 2018, 5:45 PM
pmiazga closed this task as Declined. Dec 20 2018, 12:34 AM

The queue works properly. The test was invalid because it assigns the same ID to multiple tasks.

Change 480150 had a related patch set uploaded (by Pmiazga; owner: Jhernandez):
[mediawiki/services/chromium-render@master] WIP: Add bulk test, and fix JobTimeout stalling the queue

https://gerrit.wikimedia.org/r/480150

pmiazga reopened this task as Open. Dec 20 2018, 12:37 AM

Sorry, I focused so much on tracking the bigger issue that I missed a small error in the queue, which is fixed by https://gerrit.wikimedia.org/r/480150. The bigger issue, tasks still failing even after the fix, was related to the ID conflict.
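To make the two problems above concrete, here is a minimal Python sketch. It is not the chromium-render code: `TinyQueue`, its method names, and the use of plain numbers in place of real timers are all assumptions made for illustration. It shows a queue that keeps going after a job times out, and that rejects a duplicate ID instead of silently overwriting the earlier task.

```python
from collections import OrderedDict


class TinyQueue:
    """Hypothetical single-worker queue keyed by job ID (illustration only)."""

    def __init__(self):
        self.waiting = OrderedDict()  # job_id -> simulated duration in seconds
        self.finished = []            # (job_id, outcome) in completion order

    def add(self, job_id, duration):
        if job_id in self.waiting:
            # Overwriting here would be an ID conflict: two tasks share one
            # slot, so one of them never appears to complete.
            raise ValueError(f"duplicate job id: {job_id}")
        self.waiting[job_id] = duration

    def run_all(self, job_timeout):
        while self.waiting:
            # Take the oldest waiting job (first in, first out).
            job_id, duration = self.waiting.popitem(last=False)
            outcome = "timeout" if duration > job_timeout else "done"
            # A timed-out job is recorded and the loop moves on, so the
            # timeout cannot stall the jobs queued behind it.
            self.finished.append((job_id, outcome))
        return self.finished
```

With this sketch, adding jobs `"a"` (5 s) and `"b"` (1 s) and running with a 2 s timeout reports `"a"` as timed out and still processes `"b"`; adding a second job with an ID that is already waiting raises `ValueError`.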

ovasileva triaged this task as High priority. Jan 7 2019, 3:39 PM

The task still requires code review. It already has +1 from Petr, and it needs +2 from someone.

Change 480150 merged by jenkins-bot:
[mediawiki/services/chromium-render@master] Add bulk test, and fix JobTimeout stalling the queue

https://gerrit.wikimedia.org/r/480150

Niedzielski closed this task as Resolved. Jan 14 2019, 6:39 PM (edited)

I did a smoke test of this change by reducing all the timeouts by an order of magnitude and launching about 30 requests. About half the requests completed successfully and the rest returned a 503 with the message "Queue full. Please try again later".

Edit: Also, the service was ready to receive additional requests after the flood subsided.
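The behaviour seen in the smoke test can be sketched as a bounded queue. This is a hedged Python illustration, not Proton's implementation: `BoundedQueue`, `flood`, the capacity of 15, and the 200 status for an accepted request are assumptions chosen only to mirror the roughly half-and-half split reported above.

```python
class BoundedQueue:
    """Hypothetical queue that rejects new work once it is full."""

    def __init__(self, max_waiting):
        self.max_waiting = max_waiting  # assumed capacity, not Proton's value
        self.waiting = []

    def submit(self, job_id):
        if len(self.waiting) >= self.max_waiting:
            # Same message as the one observed in the smoke test.
            return 503, "Queue full. Please try again later"
        self.waiting.append(job_id)
        return 200, "accepted"

    def drain(self):
        # Stand-in for the worker finishing (or timing out) every job.
        done, self.waiting = self.waiting, []
        return done


def flood(queue, count):
    """Submit `count` requests at once and collect the status codes."""
    return [queue.submit(f"req-{i}")[0] for i in range(count)]
```

Flooding a `BoundedQueue(15)` with 30 requests accepts the first 15 and rejects the rest with 503. After `drain()` the queue accepts requests again, which corresponds to the service being ready once the flood subsided.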