from the logs
slow query: UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ wbc_entity_usage SET eu_touched = '20120708225934' WHERE eu_page_id = '8924721' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q4142279'))
aude | |
Jul 29 2015, 5:20 PM |
F1511025: Screenshot from 2015-08-15 11:30:15.png | |
Aug 15 2015, 9:31 AM |
from the logs
slow query: UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ wbc_entity_usage SET eu_touched = '20120708225934' WHERE eu_page_id = '8924721' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q4142279'))
This query, which has to be executed in a serialized way due to replication, is causing a lag problem on some of the less powerful database slaves (db1047), and extra load on the rest of the servers. This is specially true for s3, where due to it containing hundreds of wikis, even with low traffic, it makes for most of its write load.
Please note that even if this query is fast, it can create a huge bottleneck on the replication topology. Please consider executing it on a separate shard or storage system to minimize its impact to production.
@jcrespo how would we have it execute on a separate shard?
do you mean like how we have watchlist and some other stuff on a separate db host?
https://noc.wikimedia.org/conf/highlight.php?file=db-eqiad.php
if so, where do we put it?
we should also look into why the query is slow and how we might improve it.
@aude If you are increasing the priority based on my comment, don't. Regular databases can handle the extra load (for now), the main issue is in research databases or labs, where they contain all dbs, or when there is maintenance and we are in a degraded state. This currently is not a user-notice, but it is important enough to discuss a way to solve it.
So the issue is that extra writes are not a problem for our masters -they are done concurrently-, but when we try to scale out, most of the writes are serialized on the logs, and there has been many times when I have detected lag problems and these where probably the offenders. I do not think the queries are slow- but I may be wrong; do not have statistics. You are right: let's get numbers first: how many are executed and how much time it takes (and for myself, I need to fully understand the model) before rushing into solutions (my suggestion was a bit premature).
Sorry, reverting back to Unbreak now, because this is breaking almost all wikis and the problems have been spoted now on production machines, not only on support machines:
Executing things like this:
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150619102700' WHERE eu_page_id = '1256862' AND ((eu_aspect = 'L.hu' AND eu_entity_id = 'Q1017251') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1017251') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1017287') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1017287') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1019531') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1019531') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1022151') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1022151') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1076731') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1076731') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1272631') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1272631') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1632441') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1632441') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1689740') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1689740') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1734119') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1734119') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1756917') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1756917') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1762083') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1762083') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1767147') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1767147') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q176874') OR (eu_aspect = 'T' AND eu_entity_id = 'Q176874') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1784043') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1784043') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1785657') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1785657') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1812906') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1812906') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1823175') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1823175') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1846197') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1846197') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1937519') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1937519') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1938332') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1938332') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1957359') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1957359') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1969100') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1969100') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1970044') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1970044') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1996076') OR (eu_aspect = 'T' AND eu_entity_id = 'Q1996076') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2016409') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2016409') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2020661') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2020661') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2034240') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2034240') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2036571') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2036571') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2050445') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2050445') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2065507') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2065507') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2067264') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2067264') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2099112') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2099112') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2100308') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2100308') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2109649') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2109649') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2129186') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2129186') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2153515') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2153515') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2160020') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2160020') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2175955') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2175955') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2193693') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2193693') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2197198') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2197198') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2220097') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2220097') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2243697') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2243697') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2244200') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2244200') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2263864') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2263864') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2264335') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2264335') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2287034') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2287034') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2290410') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2290410') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2304891') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2304891') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2322979') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2322979') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2324192') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2324192') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2335877') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2335877') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2344334') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2344334') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2366179') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2366179') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2392305') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2392305') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2420887') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2420887') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2442130') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2442130') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2444902') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2444902') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2464264') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2464264') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2472043') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2472043') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2473413') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2473413') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2479323') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2479323') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2493498') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2493498') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2522268') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2522268') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2524669') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2524669') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2532729') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2532729') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2547458') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2547458') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2648623') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2648623') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2684713') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2684713') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2687667') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2687667') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2697539') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2697539') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2700513') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2700513') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2700529') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2700529') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2700756') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2700756') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2700807') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2700807') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2700827') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2700827') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2700855') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2700855') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2700967') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2700967') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2702116') OR (eu_aspect = 'X' AND eu_entity_id = 'Q2702116') OR (eu_aspect = 'O' AND eu_entity_id = 'Q2702116') OR (eu_aspect = 'S' AND eu_entity_id = 'Q2702116') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2702116') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2702155') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2702155') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2702170') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2702170') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2702187') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2702187') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2702196') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2702196') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2702237') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2702237') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2861721') OR (eu_aspect = 'T' AND eu_entity_id = 'Q2861721') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q375047') OR (eu_aspect = 'T' AND eu_entity_id = 'Q375047') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q388235') OR (eu_aspect = 'T' AND eu_entity_id = 'Q388235') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q388310') OR (eu_aspect = 'T' AND eu_entity_id = 'Q388310') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q393519') OR (eu_aspect = 'T' AND eu_entity_id = 'Q393519') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q393969') OR (eu_aspect = 'T' AND eu_entity_id = 'Q393969') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q520730') OR (eu_aspect = 'T' AND eu_entity_id = 'Q520730') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q521971') OR (eu_aspect = 'T' AND eu_entity_id = 'Q521971') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q534709') OR (eu_aspect = 'T' AND eu_entity_id = 'Q534709') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q582531') OR (eu_aspect = 'T' AND eu_entity_id = 'Q582531') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q585171') OR (eu_aspect = 'T' AND eu_entity_id = 'Q585171') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q593386') OR (eu_aspect = 'T' AND eu_entity_id = 'Q593386') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q593451') OR (eu_aspect = 'T' AND eu_entity_id = 'Q593451') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q601536') OR (eu_aspect = 'T' AND eu_entity_id = 'Q601536') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q604863') OR (eu_aspect = 'T' AND eu_entity_id = 'Q604863') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q608564') OR (eu_aspect = 'T' AND eu_entity_id = 'Q608564') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q628047') OR (eu_aspect = 'T' AND eu_entity_id = 'Q628047') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q630183') OR (eu_aspect = 'T' AND eu_entity_id = 'Q630183') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q647401') OR (eu_aspect = 'T' AND eu_entity_id = 'Q647401') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q656603') OR (eu_aspect = 'T' AND eu_entity_id = 'Q656603') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q729166') OR (eu_aspect = 'T' AND eu_entity_id = 'Q729166') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q735118') OR (eu_aspect = 'T' AND eu_entity_id = 'Q735118') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q739608') OR (eu_aspect = 'T' AND eu_entity_id = 'Q739608') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q770877') OR (eu_aspect = 'T' AND eu_entity_id = 'Q770877') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q812766') OR (eu_aspect = 'T' AND eu_entity_id = 'Q812766') OR (eu_aspect = 'T' AND eu_entity_id = 'Q838333') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q838333') OR (eu_aspect = 'X' AND eu_entity_id = 'Q838333') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q896888') OR (eu_aspect = 'T' AND eu_entity_id = 'Q896888') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q927541') OR (eu_aspect = 'T' AND eu_entity_id = 'Q927541') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q932034') OR (eu_aspect = 'T' AND eu_entity_id = 'Q932034') OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q962599') OR (eu_aspect = 'T' AND eu_entity_id = 'Q962599'))\G
creates a 1 minute lag on slaves (not only on s3), and when lag occurs, slaves are automatically depooled, creating connection spikes. I woud like the feature disabled until we can agree on a different way to do this.
You seem to be implementing a queue/message system. A relational database, that is replicated geographically is probably not the right place for this kind of data, as it probably doesn't need the availability and reliability of MySQL (it could be stored once, in memory Redis?). MySQL follow a OLTP model, so less suitable for this kind of queries. Even if we need them to be on the same schema, the model could be greatly improved- right now this is affecting other slave's reads.
ccing @ori as he told me he could help for query performance reasons.
This is the lag report of db1028.
Things seemed to have calmed for now.
And here it is actual data of an example spike:
Host User Schema Client Source Thread Transaction Runtime Stamp db1038 wikiuser simplewiki mw1001 - 5096510924 30A18B0D6 15s 2015-08-15 10:04:05 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20140130031809' WHERE eu_page_id = '232335' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser mwlwiki mw1006 - 5096510302 30A18AA42 14s 2015-08-15 10:03:55 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20130201200841' WHERE eu_page_id = '1' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser skwiki mw1009 - 5096509493 30A18A10F 13s 2015-08-15 10:03:35 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150218224016' WHERE eu_page_id = '1' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser elwiki mw1016 - 5096509480 30A18A0F7 13s 2015-08-15 10:03:35 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150427204519' WHERE eu_page_id = '260239' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser kawiki mw1015 - 5096509531 30A18A156 12s 2015-08-15 10:03:35 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150803200253' WHERE eu_page_id = '16312' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser guwiki mw1005 - 5096509757 30A18A40B 12s 2015-08-15 10:03:55 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150815100353' WHERE eu_page_id = '1' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser tlwiki mw1007 - 5096509558 30A18A19E 11s 2015-08-15 10:03:35 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20141113113613' WHERE eu_page_id = '1' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser simplewiki mw1012 - 5096509479 30A18A0F6 11s 2015-08-15 10:03:35 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150815100333' WHERE eu_page_id = '232335' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser bewiki mw1013 localhost 5096509524 30A18A14A 11s 2015-08-15 10:03:45 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages 127.0.0.1 */ `wbc_entity_usage` SET eu_touched = '20150815100343' WHERE eu_page_id = '50792' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser bnwiki mw1008 - 5096509501 30A18A11E 11s 2015-08-15 10:03:45 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150815100343' WHERE eu_page_id = '796' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser srwiki mw1008 localhost 5096509510 30A18A12D 9s 2015-08-15 10:03:45 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages 127.0.0.1 */ `wbc_entity_usage` SET eu_touched = '20150815100335' WHERE eu_page_id = '196554' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser sdwiki mw1014 - 5096509431 30A18A000 9s 2015-08-15 10:03:45 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150815100335' WHERE eu_page_id = '2238' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser hiwiki mw1013 - 5096510664 30A18AE18 9s 2015-08-15 10:04:05 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150520065035' WHERE eu_page_id = '220108' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser dawiki mw1008 - 5096510405 30A18AB5C 9s 2015-08-15 10:04:05 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150815100355' WHERE eu_page_id = '77' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser bnwiki mw1004 - 5096511317 30A18B4E0 9s 2015-08-15 10:04:15 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150811141738' WHERE eu_page_id = '796' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser sdwiki mw1011 - 5096509434 30A18A009 8s 2015-08-15 10:03:35 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150616152312' WHERE eu_page_id = '2238' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser hrwiki mw1002 - 5096509689 30A18A33B 8s 2015-08-15 10:03:45 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150328211552' WHERE eu_page_id = '4' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser liwiki mw1003 - 5096510879 30A18B05D 8s 2015-08-15 10:04:15 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150815100406' WHERE eu_page_id = '1' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser guwiki mw1007 - 5096509750 30A18A3E7 7s 2015-08-15 10:03:45 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150719055033' WHERE eu_page_id = '1' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser tlwiki mw1001 - 5096510816 30A18AFB4 7s 2015-08-15 10:04:05 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20141113113613' WHERE eu_page_id = '1' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser mrwiki mw1003 - 5096510809 30A18AFA7 7s 2015-08-15 10:04:05 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150815100357' WHERE eu_page_id = '2' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser tnwiki mw1006 - 5096510113 30A18A829 7s 2015-08-15 10:04:05 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150815100357' WHERE eu_page_id = '596' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser liwiki mw1003 - 5096510854 30A18B01D 6s 2015-08-15 10:04:05 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150302091823' WHERE eu_page_id = '1' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser hiwiki mw1015 - 5096524966 30A196449 6s 2015-08-15 10:14:45 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150520065035' WHERE eu_page_id = '220108' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser tnwiki mw1009 - 5096510155 30A18A889 5s 2015-08-15 10:03:55 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20140228020245' WHERE eu_page_id = '596' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser elwiki mw1015 - 5096511444 30A18B670 5s 2015-08-15 10:04:15 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150427204519' WHERE eu_page_id = '260239' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser hiwiki mw1016 - 5096510724 30A18AEBC 5s 2015-08-15 10:04:15 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150815100410' WHERE eu_page_id = '220108' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser dsbwiki mw1013 - 5096511261 30A18B44D 5s 2015-08-15 10:04:25 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150815100419' WHERE eu_page_id = '1' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser bewiki mw1002 - 5096509508 30A18A12B 3s 2015-08-15 10:03:35 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150124160624' WHERE eu_page_id = '50792' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser gawiki mw1002 - 5096509507 30A18A12A 3s 2015-08-15 10:03:35 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150803043836' WHERE eu_page_id = '19594' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser bnwiki mw1004 - 5096509488 30A18A106 3s 2015-08-15 10:03:35 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150811141738' WHERE eu_page_id = '796' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser srwiki mw1002 - 5096509468 30A18A0C7 3s 2015-08-15 10:03:35 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150715004222' WHERE eu_page_id = '196554' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser dawiki mw1010 - 5096510349 30A18AAB7 3s 2015-08-15 10:03:55 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20081227200149' WHERE eu_page_id = '77' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296')) db1038 wikiuser bewiki mw1011 localhost 5096511142 30A18B320 2s 2015-08-15 10:04:05 UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages 127.0.0.1 */ `wbc_entity_usage` SET eu_touched = '20150815100402' WHERE eu_page_id = '50792' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
I looked into this some and some ways I think we can improve:
only do selective updates (for specific aspects) for wikis with the allowDataAccessInUserLanguage setting = true (e.g. multilingual wikis where we split parser cache by language, such as commons). As far as I can tell for other wikis, it would be sufficient to update touched for all with given eu_page_id w/o all the where conditions.
for multilingual wikis, maybe the touched updates for a page can be done in chunks / batches, in cases where a page uses a lot of entities and has a lot of aspects.
another idea... let's try to deduplicate the jobs more
we can set a root job (title + parser language code) so that the jobs can be further de-duplicated.
for example, in cases where an item or a client page is edited again when we still have older jobs in the queue, then the older jobs can become no-ops and we execute only the most recent job for a page (+parser language).
see T109448
T93885 will also help (have a specific limit for number of entities used in a page). we need to add a setting and then enforce the limit.
Proposal for a solution that does not need knowledge about how the parser cache is split:
Split the update into two steps:
This needs more resources overall, but hopefully reduces lock retention time and overall load on the master db.
Change 232917 had a related patch set uploaded (by Daniel Kinzler):
Use eu_row_id when touching usage entries.
Could this job have caused:
/* JobRunner::commitMasterChanges 127.0.0.1 */ GET_LOCK('jobrunner-serial-commit', 30) AS lockstatus /* 31bdf25cc42f95f4f4a9b493cafe0299 db1024 plwiki 11s */ /* localhost */
we had another large lag spike around the same time than other touchusages, but it could have just been a coincidence? This other "query" caused 7000 hits with an avg waiting time of 15 seconds, causing lag on s2. If not, I will open another ticket for this other issue.
Another proof that touchUsages may be too aaggressive is that sometimes it conflicts with itself. Updating by primary key, as suggested, should solve this.
Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages 10.64.16.13 1205 Lock wait timeout exceeded; try restarting transaction (10.64.16.13) UPDATE `wbc_entity_usage` SET eu_touched = '20150822101128' WHERE eu_page_id = '616905' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q14484217'))
BTW, thank you for looking at this- you have been a really responsive team, and this is really appreciated.
I am almost sure that the first part of my previous comment was not caused by this job, but a very large LinksUpdate::incrTableUpdate, sorry about that.
We are now doing the update in batches of 100. But I think Mediawiki wraps all of them in an implicit transaction. To fix this bug, do we need to split that up into individual transactions, one for each update?
@JanZerebecki not sure if you are asking me: Yes, IF that is the case (I cannot say)- batching will actually be worse than no batching at all- as it will be slower than a single query and more open to clashing with other queries without any of the advantages (reduced lag).
@jcrespo: As far as I know, the DBO_TRX flag is set on the cluster, causing MediaWiki to automatically start a transaction on the first insert/update/delete. Such automatic transactions are committed when an explicit transaction is started, or the request terminates.
I will add explicit transactions wrapping each insert batch. This should improve performance at the cost of consistency, which is accepted practice across the MediaWiki codebase.
Change 233724 had a related patch set uploaded (by Daniel Kinzler):
EntityUsageTable: Wrap insert batches in transactions
Change 233743 had a related patch set uploaded (by JanZerebecki):
EntityUsageTable: Wrap batches in transactions
Patches merged. Let's keep this open for a while and observe whether the problem is indeed gone.
I now see some slow queries in the logs for touch usages:
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsageBatch */ wbc_entity_usage SET eu_touched = '20150907064933' WHERE eu_row_id IN ('2430363','2430361','2430345','2430360','2430348','61591','61590','107383','2430347','2430358','2430343','2430362','2430365','2430351','2430353','2430354','2430344','rEVED243035664a3b','2430350','2430366','2430355','2430359','2430352','2430357','2430346','2430349','2430364')
Created a new task for this: T111769