Needs more info on the description from the parent task.
|operations/software : master||Consider as busy all queries that are not in Sleep state|
|Resolved||Lucas_Werkmeister_WMDE||T173695 Enable constraint checks by default for users|
|Open||None||T103228 Improve performance of constraint check|
|Resolved||Lydia_Pintscher||T179839 Cache constraint check results|
|Resolved||Lydia_Pintscher||T179849 Cache all constraint check results per-entity|
|Resolved||Lucas_Werkmeister_WMDE||T181060 Cache constraint check results per-entity in ObjectCache (L) (days: 2)|
|Resolved||Lucas_Werkmeister_WMDE||T184812 Enable constraint result caching on Wikidata|
|Resolved||jcrespo||T188505 Investigate why query killer didn't kill 1-hour long queries|
There is a strange gap in all killing activity between November and March:
171966558 | 2017-11-20 06:29:35 | wmf_slave_wikiuser_sleep | kill 1761131953
171978924 | 2018-03-01 16:41:05 | wmf_slave_wikiuser_sleep | kill 1633679517
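The gap between the two log rows above can be measured mechanically. This is only a rough sketch: it assumes the log is available as plain text with `|`-separated fields in the order shown (an identifier, a timestamp, the event name, the action). The `parse_row` and `find_gaps` helpers and the one-day threshold are made up for illustration and are not part of the query killer.

```python
from datetime import datetime, timedelta

# The two rows quoted above, one per line.
LOG = """\
171966558 | 2017-11-20 06:29:35 | wmf_slave_wikiuser_sleep | kill 1761131953
171978924 | 2018-03-01 16:41:05 | wmf_slave_wikiuser_sleep | kill 1633679517
"""


def parse_row(line):
    """Split one log row into (timestamp, event, action).

    Assumption: four fields separated by '|'; the first field is treated
    as an opaque identifier and ignored.
    """
    _ident, stamp, event, action = [f.strip() for f in line.split("|", 3)]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), event, action


def find_gaps(text, threshold=timedelta(days=1)):
    """Return (start, end, gap) for consecutive kills further apart than threshold."""
    times = sorted(parse_row(l)[0] for l in text.splitlines() if l.strip())
    return [(a, b, b - a) for a, b in zip(times, times[1:]) if b - a > threshold]


gaps = find_gaps(LOG)
for start, end, gap in gaps:
    print(f"no kills between {start} and {end} ({gap.days} days)")
```

For the two rows quoted here this reports a single gap of 101 days.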
Even if that was the cause, the new query killer did not take care of the long-running queries anyway; those had to be killed independently.
Running SELECT sleep(70); as wikiuser to check that it is working at least to some degree.
This also shows up in the logs:
171978924 | 2018-03-02 17:29:03 | wmf_slave_wikiuser_slow (>60) | kill 1731917342; SELECT sleep(100)
The same query was killed:
MariaDB [wikidatawiki]> SELECT /* Wikibase\Lib\Store\Sql\WikiPageEntityMetaDataLookup::selectRevisionInformationMultiple */ rev_id, rev_content_format, rev_timestamp, page_latest, page_is_redirect, old_id, old_text, old_flags, page_title FROM `page` INNER JOIN `revision` ON ((page_latest=rev_id)) INNER JOIN `text` ON ((old_id=rev_text_id));
ERROR 2013 (HY000): Lost connection to MySQL server during query
171978924 | 2018-03-02 17:38:03 | wmf_slave_wikiuser_slow (>60) | kill 1732579668; SELECT rev_id, rev_content_format, rev_tim
The working theory is that, for some reason, the query killer was disabled, had crashed, or was otherwise not working on that specific host.
I will check that the query killer is updated to the latest version and active on all production hosts, then consider this resolved.
@Lucas_Werkmeister_WMDE I am going to resolve this ticket once it has been deployed to all of s8 (wikidata database section). I will deploy on the other sections more slowly. This is not a risk-free deploy, so please be vigilant if there is something weird happening regarding queries failing or similar issues. This will, however, unblock at least the deployments you wanted to do.