Lower automatic query killing threshold to 55 seconds
Closed, Declined | Public

Description

Currently the query killer looks for queries that have been running for longer than 60 seconds. However, the MediaWiki wall clock timeout is also 60 seconds, so if a query takes >= 60 seconds to run, MediaWiki will give up before the user gets the response.
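The overlap between the two limits can be illustrated with a minimal Python sketch. It is not the actual query killer: the `RunningQuery` type and both helper functions are hypothetical names made up for this example, and only the 60-second and 55-second values come from this task.

```python
from dataclasses import dataclass

# Wall clock timeout stated in this task; everything else here is illustrative.
MEDIAWIKI_TIMEOUT_S = 60


@dataclass
class RunningQuery:
    """Hypothetical stand-in for one row of a process list."""
    query_id: int
    elapsed_s: float


def queries_to_kill(running, threshold_s):
    """Return the queries that have been running longer than threshold_s."""
    return [q for q in running if q.elapsed_s > threshold_s]


def killed_before_timeout(threshold_s, timeout_s=MEDIAWIKI_TIMEOUT_S):
    """True if the killer can act before the caller's own timeout fires."""
    return threshold_s < timeout_s


running = [RunningQuery(1, 50), RunningQuery(2, 57), RunningQuery(3, 61)]

# With a 60s threshold only query 3 is selected, and by then the
# caller has already hit its own 60s timeout.
print([q.query_id for q in queries_to_kill(running, 60)])

# With a 55s threshold query 2 is also selected, while the caller
# is still waiting for a response.
print([q.query_id for q in queries_to_kill(running, 55)])
```

With the threshold equal to the timeout, `killed_before_timeout(60)` is false, which is the situation described above; any threshold below 60 makes it true.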

Lowering it to 55 seconds is a conservative proposal, but it should be seen as just the first step. If 55s goes well, we can wait a bit and then lower it again, and repeat, until we're at a level we're comfortable with, both on the user side and the SRE side.

Additionally, having metrics about this will help gauge the impact of this change (T293531: Monitor/dashboard number of queries killed by the automatic query killer).

Event Timeline

I have nothing against this, but it is quite a massive task, as we need to drop and recreate the query killer across all hosts (without replication, so one by one). Before we commit to this, I would like it to be signed off by someone else.

I don't want us to deploy this and then have to revert (or go for a different value after two days) because it is killing more queries than expected.

Having said that, I don't think it will make much difference from a user-impact point of view, as a query that already takes 45-50 seconds can easily take more than 60, depending on the workload at the time. So I'm OK with this change.

Marostegui triaged this task as Medium priority. Oct 18 2021, 7:43 AM
Marostegui moved this task from Triage to Refine on the DBA board.

I am going to close this as declined, as I don't see much interest in doing this for now. If someone feels this needs to be worked on, let's reopen and discuss T293533#7435037