Analyse PAWS query killer
Closed, Resolved, Public

Description

PAWS has a query killer component that is currently not working. We need to analyse whether we should fix it or leave the job to the general query killer on the Wiki Replicas.

Current error:

DBI connect(';host=enwiki.analytics.db.svc.eqiad.wmflabs;mysql_read_default_group=client','****',...) failed: SSL connection error: unknown error number at /usr/bin/pt-kill line 2087.

By the way, the pt-kill used is the one shipped with percona-toolkit in Ubuntu Artful Aardvark, which comes out as version 2.2.0. The fix may be as simple as using a more current pt-kill or a patched version.

Event Timeline

@Marostegui sorry for tagging without more context. I wanted an opinion from the DBA if we should have a query killer specific for PAWS' queries.

It is my understanding this was set up when PAWS was pointing to labsdb1001 (up to a few months ago). Now that it is pointing to the analytics Wiki Replica, should we keep a query killer specific to PAWS queries?

It is set up with a low max query time, fwiw:

maxQueryTime: 1800
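
For illustration only, here is a minimal Python sketch of the selection step a query killer performs with a limit like this one. It is not the PAWS query-killer code or pt-kill itself. It assumes that maxQueryTime is expressed in seconds and that the input rows mirror the Id, User, Command and Time columns of MySQL's SHOW PROCESSLIST; the names ProcessRow, queries_to_kill, kill_statements and the user names in the usage note are hypothetical.

```python
from dataclasses import dataclass

# Assumption: the maxQueryTime value quoted above is in seconds.
MAX_QUERY_TIME = 1800


@dataclass
class ProcessRow:
    """Hypothetical stand-in for one SHOW PROCESSLIST row (subset of columns)."""
    id: int
    user: str
    command: str  # e.g. "Query" or "Sleep"
    time: int     # seconds spent in the current state


def queries_to_kill(rows, max_query_time=MAX_QUERY_TIME, user=None):
    """Return ids of running queries that have exceeded max_query_time.

    If user is given, only that user's queries are considered.
    """
    return [
        row.id
        for row in rows
        if row.command == "Query"
        and row.time > max_query_time
        and (user is None or row.user == user)
    ]


def kill_statements(ids):
    """Build the KILL QUERY statements a killer would send to the server."""
    return [f"KILL QUERY {thread_id}" for thread_id in ids]
```

With rows such as `ProcessRow(1, "paws", "Query", 120)`, `ProcessRow(2, "paws", "Query", 2400)` and `ProcessRow(3, "paws", "Sleep", 5000)`, only thread 2 is selected: thread 1 is under the limit and thread 3 is idle rather than running a query.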

I am inclined to remove the query-killer image entirely from PAWS, but if it is a needed or helpful component I can see to it that it starts working again.

Do you know for how long the PAWS query killer hasn't been working? Maybe it has been like that for quite a while.
I would say leave it stopped, and let the specific query killers on the wiki replicas do their job. We are fine-tuning them, but they are looking good so far, and your specific maxQueryTime should be fine as well.
Let's leave it like that and monitor it to see whether anything weird shows up.

Mentioned in SAL (#wikimedia-cloud) [2018-02-21T17:02:59Z] <chicocvenancio> deleted query-killer k8s deployment T187818

Chicocvenancio changed the task status from Open to Stalled. Feb 21 2018, 5:08 PM
Chicocvenancio triaged this task as Medium priority.

Do you know for how long the PAWS query killer hasn't been working? Maybe it has been like that for quite a while.

Not sure, probably at least since November, maybe longer.

Thanks for the input. I'll leave this open for a while, but will send a pull request to remove the query-killer from PAWS entirely.

I'm manually deleting the deployment to keep the query-killer image from being recreated every 5 minutes.

yuvipanda@tools-paws-master-01:~$ kubectl --namespace=prod delete deploy query-killer
deployment "query-killer" deleted

To keep it from being recreated after every new commit, I'll merge a change that comments out the image definition before the next change goes in.

Chicocvenancio claimed this task.

I'm declaring the PAWS query-killer analysed and unneeded. I will leave it disabled unless it is specifically needed in the future.