
Provide wmf-pt-kill on Debian Bullseye
Closed, Resolved (Public)

Description

@razzi is rebuilding some DB servers with Bullseye and is thwarted by this package not being present.

I can certainly do a blind copy of the package from Buster to Bullseye, but I'm hoping someone (@Marostegui) will chime in about whether it needs a rebuild or not.

Event Timeline

The package is really just the binary plus the systemd units, so it will most likely work fine. I certainly do not have the bandwidth to look into rebuilding this package.

The binary works out of the box:

root@clouddb1013:~# pt-kill
Usage: pt-kill [OPTIONS] [DSN]

Errors in command-line arguments:
  * Specify at least one of --kill, --kill-query, --print, --execute-command or --stop
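A quick smoke test of the copied binary could look like the following (a hedged sketch; the DSN and the 60-second threshold are illustrative placeholders, not the values used on the replicas):

```shell
# Dry run: print, but do not kill, any query busier than 60 seconds,
# checking every 30 seconds for one minute. h=localhost is a placeholder DSN.
pt-kill --print --busy-time 60 --interval 30 --run-time 1m h=localhost
```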

Andrew claimed this task.

root@apt1001:~# reprepro copy buster-wikimedia bullseye-wikimedia wmf-pt-kill
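If useful, the copy can be verified on the apt host with reprepro's standard listing subcommand (the distribution name is taken from the command above):

```shell
# Confirm the package is now present in the bullseye suite.
reprepro list bullseye-wikimedia wmf-pt-kill
```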

Clouddb1013 seems happy now, so we may be all set.

Thank you both! Looks good 👍 👍

Would it be possible to use the upstream package at this point? pt-kill is in Debian. I'm not sure what patches are being applied beyond the Debian version and couldn't find anything beyond: https://phabricator.wikimedia.org/T183983#3983899. Can someone help explain some context for this package? Thanks!

The problem is that we are using a patched version for wmf-pt-kill because the original one had the bug you linked in your comment. I am going to give the new version a try to see if it works better. If it does, I don't think we really need to keep packaging it ourselves, as long as all the systemd units are in puppet (I believe Brooke had to modify all of that, but I would need to check). I will get back to you once I have tested the original binary.

So from the first tests, it looks like it still has the same problem it used to have:

# 2022-05-03T08:17:05 KILL 11999086 (Execute 0 sec) SELECT DISTINCT p.page_id,p.page_title,p.page_namespace,(SELECT rev_timestamp FROM revision WHERE rev_id=p.page_latest LIMIT 1) AS page_touched,p.page_len,0 AS link_count FROM ( SELECT * FROM categorylinks WHERE cl_to IN (?)) cl0 INNER JOIN (page p) ON p.page_id=cl0.cl_from AND p.page_namespace=?

There's a new blog post talking about it; I am going to give it a read and see if I can modify the startup options to avoid that with the fixes.

Mentioned in SAL (#wikimedia-operations) [2022-05-03T09:14:29Z] <marostegui> Disable puppet on clouddb1013 clouddb1016 clouddb1020 T305974

@nskaggs I have tested different options and it looks good with some other parameters. I am going to leave it running until tomorrow EU morning to confirm. If it all looks good, I can create a task for your team so you can modify puppet accordingly (we do not use pt-kill in production; it is just for the wiki replicas).
Taking a quick look, I don't think we need to change many things in puppet, just the package name and the startup options. I think the directories, log files and all that can be left as they are (but up to you all!).
Let me know if you want specific tags on the task (assuming everything goes fine in the next few hours).
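As a sketch of what "other parameters" might look like with the stock pt-kill (the matchers and thresholds below are illustrative assumptions, not the actual wiki-replica settings, which would live in puppet):

```shell
# Kill any client query that has been running longer than 3 hours (10800 s).
# --victims all and --match-command Query are illustrative choices;
# the real matchers and DSN are assumptions, not the production config.
pt-kill \
  --busy-time 10800 \
  --match-command Query \
  --victims all \
  --kill \
  --daemonize \
  --interval 30 \
  h=localhost
```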

Some of the queries aren't being killed:

11768742	s52788	10.64.37.27:35008	wikidatawiki_p	Query	16126	Sending data	/*{"qrun": 632602, "user": "MArostegui (WMF)"}*/ SELECT\n        page_id,page_title,wbx_text AS term_	0.000
11781157	s52788	10.64.37.27:56186	wikidatawiki_p	Query	12517	Sending data	/*{"qrun": 632602, "user": "MArostegui (WMF)"}*/ SELECT\n        page_id,page_title,wbx_text AS term_	0.000

They should've been killed after 10800 seconds so...that's bad news.
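Such stragglers can be spotted directly on the server (an illustrative check; the 10800-second threshold matches the intended kill time above):

```shell
# List client queries that have outlived the intended 3-hour limit.
mysql -e "SELECT id, user, time, LEFT(info, 60) AS query
          FROM information_schema.processlist
          WHERE command = 'Query' AND time > 10800"
```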

The suggested workaround didn't really work, as the above queries weren't killed. I am trying a different one now, but I don't have high hopes. Will report back.

So I haven't found a way to make it work without our patched version. I have posted a comment on https://jira.percona.com/browse/PT-1492 (where I commented years ago about this behaviour).
For now we should either keep using our package, or install the Debian one and distribute our patched binary via puppet to replace the default one (I wouldn't like that solution, but it is up to you all, as this isn't used in production).
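If the replace-the-binary route were ever chosen, `dpkg-divert` is the usual way to shadow a packaged file without fighting dpkg on upgrades (a hedged sketch; the install path of pt-kill and the source path of the patched binary are assumptions):

```shell
# Move the stock binary aside, keeping dpkg aware of the diversion,
# then drop the patched binary in its place. Both paths are assumptions.
dpkg-divert --add --rename \
  --divert /usr/bin/pt-kill.distrib /usr/bin/pt-kill
install -m 0755 /srv/patched/pt-kill /usr/bin/pt-kill
```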

@Marostegui Thank you for following up with upstream here! Perhaps they can ultimately produce a fix or resolution. Until then, yes, let's keep using our existing version. Fingers crossed we can one day remove our custom package.