
configure pt-kill for wikiuser on coredbs
Closed, Resolved (Public)

Description

Author: afeldman

Description:

pt-kill is already installed on all DBs via the percona-toolkit package.
Configure a cron job to kill queries that have been running as wikiuser for
more than 10 (?) minutes. Match wikiuser only: long-running queries from
scripts, jobs, and crons run as wikiadmin and should be left alone.
i.e. pt-kill --busy-time 60 --match-user wikiuser --kill

Details

Reference
rt5593

Event Timeline

rtimport raised the priority of this task from to Medium. (Dec 18 2014, 1:40 AM)
rtimport added a project: ops-core.
rtimport set Reference to rt5593.

On 2013-08-12 17:33:05, afeldman wrote:

i.e. pt-kill --busy-time 60 --match-user wikiuser --kill

to clarify, `--busy-time 60` is secs. so to do 10 mins it would be `--busy-time 600`
(I just read a few parts of the man page)
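
The unit conversion above can be sketched in shell. This is a minimal, hypothetical helper: `busy_time_secs` and `pt_kill_cmd` are names invented for illustration and are not part of percona-toolkit. It only prints a command line rather than running pt-kill, and it assumes `--print` is used in place of `--kill` so that the resulting command is a dry run.

```shell
#!/bin/sh
# Hypothetical helper: convert a threshold in minutes to the seconds
# value that --busy-time takes when no unit suffix is given.
busy_time_secs() {
    echo $(( $1 * 60 ))
}

# Hypothetical helper: print (not run) a pt-kill command line for wikiuser.
# --print lists matching queries instead of killing them; swap it for
# --kill once the listed queries look right.
pt_kill_cmd() {
    echo "pt-kill --busy-time $(busy_time_secs "$1") --match-user wikiuser --print"
}

pt_kill_cmd 10
```

Building the command from a minutes value keeps the 60-versus-600 mix-up from recurring if the threshold is changed later.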

Status changed from 'new' to 'open' by RT_System

afeldman wrote:

On Mon Aug 12 20:50:50 2013, jeremyb wrote:

On 2013-08-12 17:33:05, afeldman wrote:

i.e. pt-kill --busy-time 60 --match-user wikiuser --kill

to clarify, --busy-time 60 is secs. so to do 10 mins it would be `--busy-time 600`

(I just read a few parts of the man page)

Thank you very much for clarifying that 10 minutes is 600 seconds and not 60!

https://bugzilla.wikimedia.org/show_bug.cgi?id=52979
According to MaxSem, the query (presumably running as wikiuser) had been
running for...
[Time] => 140255
so this is starting to get a bit more urgent...
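
To put the Time value quoted above in perspective, here is a small sketch that formats it as days, hours, minutes, and seconds. It assumes the value is in seconds, as in the Time column of SHOW PROCESSLIST; `fmt_secs` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: format a duration in seconds as "Nd Nh Nm Ns".
fmt_secs() {
    s=$1
    printf '%dd %dh %dm %ds\n' \
        $(( s / 86400 )) \
        $(( s % 86400 / 3600 )) \
        $(( s % 3600 / 60 )) \
        $(( s % 60 ))
}

fmt_secs 140255   # prints "1d 14h 57m 35s"
```

Under that assumption the query had been running for well over a day, far beyond the 10-minute threshold proposed in the description.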

pt-kill is running manually on s5 slaves today, from tin, with verbose logging
to catch info for bug 52979.
As discussed with binasher on IRC, will make it a general module rather than
just a cron job. Plus it needs to report somewhere (probably to the
query_digests host, I guess).

pt-kill proved to be too blunt an instrument. Currently testing
gerrit:operations/software/dbtools/arbiter.pl

Dzahn changed the visibility from "WMF-NDA (Project)" to "Public (No Login Required)". (Feb 25 2015, 12:21 AM)
Dzahn changed the edit policy from "WMF-NDA (Project)" to "All Users".
Dzahn set Security to None.

The solution in place is [1]. It still isn't perfect, but it is good enough for now, and it doesn't require an external service that can be blocked by max_connections.

[1] https://git.wikimedia.org/blob/operations%2Fsoftware/c5d2a8edd61bf964c7809c660d736a6d064c4b9c/dbtools%2Fevents_coredb_slave.sql