Page MenuHomePhabricator

[Bug] EntityUsageTable::touchUsages slow query
Closed, ResolvedPublic

Description

from the logs

slow query: UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ wbc_entity_usage SET eu_touched = '20120708225934' WHERE eu_page_id = '8924721' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q4142279'))

Event Timeline

aude raised the priority of this task from to Needs Triage.
aude updated the task description. (Show Details)
aude updated the task description. (Show Details)
aude set Security to None.
aude added subscribers: aude, Aklapper.

This query, which has to be executed in a serialized way due to replication, is causing a lag problem on some of the less powerful database slaves (db1047), and extra load on the rest of the servers. This is specially true for s3, where due to it containing hundreds of wikis, even with low traffic, it makes for most of its write load.

Please note that even if this query is fast, it can create a huge bottleneck on the replication topology. Please consider executing it on a separate shard or storage system to minimize its impact to production.

aude triaged this task as High priority.Aug 13 2015, 3:57 AM

@jcrespo how would we have it execute on a separate shard?

do you mean like how we have watchlist and some other stuff on a separate db host?

https://noc.wikimedia.org/conf/highlight.php?file=db-eqiad.php

if so, where do we put it?

we should also look into why the query is slow and how we might improve it.

aude renamed this task from EntityUsageTable::touchUsages slow query to [Bug] EntityUsageTable::touchUsages slow query.Aug 13 2015, 1:04 PM
aude raised the priority of this task from High to Unbreak Now!.

@aude If you are increasing the priority based on my comment, don't. Regular databases can handle the extra load (for now), the main issue is in research databases or labs, where they contain all dbs, or when there is maintenance and we are in a degraded state. This currently is not a user-notice, but it is important enough to discuss a way to solve it.

So the issue is that extra writes are not a problem for our masters -they are done concurrently-, but when we try to scale out, most of the writes are serialized on the logs, and there has been many times when I have detected lag problems and these where probably the offenders. I do not think the queries are slow- but I may be wrong; do not have statistics. You are right: let's get numbers first: how many are executed and how much time it takes (and for myself, I need to fully understand the model) before rushing into solutions (my suggestion was a bit premature).

@jcrespo thanks for helping look into this.

aude lowered the priority of this task from Unbreak Now! to High.Aug 14 2015, 12:39 PM
jcrespo raised the priority of this task from High to Unbreak Now!.Aug 15 2015, 8:41 AM
jcrespo added a subscriber: ori.

Sorry, reverting back to Unbreak now, because this is breaking almost all wikis and the problems have been spoted now on production machines, not only on support machines:

Executing things like this:

UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages  */  `wbc_entity_usage` SET eu_touched = '20150619102700' 
WHERE eu_page_id = '1256862' AND ((eu_aspect = 'L.hu' AND eu_entity_id = 'Q1017251') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1017251') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1017287') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1017287') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1019531') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1019531') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1022151') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1022151') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1076731') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1076731') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1272631') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1272631') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1632441') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1632441') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1689740') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1689740') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1734119') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1734119') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1756917') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1756917') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1762083') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1762083') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1767147') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1767147') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q176874') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q176874') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1784043') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1784043') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1785657') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1785657') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1812906') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1812906') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1823175') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1823175') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1846197') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1846197') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1937519') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1937519') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1938332') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1938332') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1957359') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1957359') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1969100') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1969100') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1970044') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1970044') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q1996076') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q1996076') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2016409') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2016409') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2020661') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2020661') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2034240') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2034240') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2036571') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2036571') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2050445') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2050445') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2065507') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2065507') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2067264') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2067264') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2099112') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2099112') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2100308') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2100308') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2109649') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2109649') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2129186') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2129186') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2153515') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2153515') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2160020') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2160020') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2175955') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2175955') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2193693') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2193693') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2197198') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2197198') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2220097') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2220097') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2243697') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2243697') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2244200') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2244200') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2263864') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2263864') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2264335') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2264335') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2287034') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2287034') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2290410') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2290410') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2304891') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2304891') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2322979') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2322979') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2324192') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2324192') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2335877') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2335877') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2344334') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2344334') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2366179') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2366179') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2392305') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2392305') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2420887') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2420887') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2442130') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2442130') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2444902') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2444902') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2464264') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2464264') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2472043') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2472043') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2473413') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2473413') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2479323') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2479323') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2493498') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2493498') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2522268') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2522268') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2524669') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2524669') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2532729') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2532729') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2547458') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2547458') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2648623') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2648623') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2684713') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2684713') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2687667') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2687667') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2697539') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2697539') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2700513') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2700513') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2700529') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2700529') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2700756') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2700756') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2700807') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2700807') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2700827') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2700827') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2700855') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2700855') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2700967') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2700967') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2702116') 
OR (eu_aspect = 'X' AND eu_entity_id = 'Q2702116') 
OR (eu_aspect = 'O' AND eu_entity_id = 'Q2702116') 
OR (eu_aspect = 'S' AND eu_entity_id = 'Q2702116') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2702116') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2702155') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2702155') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2702170') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2702170') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2702187') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2702187') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2702196') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2702196') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2702237') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2702237') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q2861721') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q2861721') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q375047') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q375047') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q388235') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q388235') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q388310') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q388310') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q393519') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q393519') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q393969') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q393969') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q520730') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q520730') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q521971') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q521971') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q534709') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q534709') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q582531') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q582531') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q585171') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q585171') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q593386') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q593386') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q593451') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q593451') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q601536') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q601536') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q604863') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q604863') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q608564') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q608564') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q628047') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q628047') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q630183') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q630183') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q647401') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q647401') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q656603') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q656603') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q729166') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q729166') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q735118') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q735118') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q739608') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q739608') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q770877') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q770877') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q812766') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q812766') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q838333') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q838333') 
OR (eu_aspect = 'X' AND eu_entity_id = 'Q838333') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q896888') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q896888') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q927541') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q927541') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q932034') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q932034') 
OR (eu_aspect = 'L.hu' AND eu_entity_id = 'Q962599') 
OR (eu_aspect = 'T' AND eu_entity_id = 'Q962599'))\G

creates a 1 minute lag on slaves (not only on s3), and when lag occurs, slaves are automatically depooled, creating connection spikes. I woud like the feature disabled until we can agree on a different way to do this.

You seem to be implementing a queue/message system. A relational database, that is replicated geographically is probably not the right place for this kind of data, as it probably doesn't need the availability and reliability of MySQL (it could be stored once, in memory Redis?). MySQL follow a OLTP model, so less suitable for this kind of queries. Even if we need them to be on the same schema, the model could be greatly improved- right now this is affecting other slave's reads.

ccing @ori as he told me he could help for query performance reasons.

jcrespo lowered the priority of this task from Unbreak Now! to High.EditedAug 15 2015, 9:31 AM

This is the lag report of db1028.

Screenshot from 2015-08-15 11:30:15.png (300×646 px, 19 KB)

All slaves at s7had the same 6-10 seconds lag spike at 8:01.

Things seemed to have calmed for now.

And here it is actual data of an example spike:

Host 	User 	Schema 	Client 	Source 	Thread 	Transaction 	Runtime 	Stamp
db1038	wikiuser	simplewiki	mw1001	-	5096510924	30A18B0D6	15s	2015-08-15 10:04:05
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20140130031809' WHERE eu_page_id = '232335' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	mwlwiki	mw1006	-	5096510302	30A18AA42	14s	2015-08-15 10:03:55
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20130201200841' WHERE eu_page_id = '1' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	skwiki	mw1009	-	5096509493	30A18A10F	13s	2015-08-15 10:03:35
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150218224016' WHERE eu_page_id = '1' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	elwiki	mw1016	-	5096509480	30A18A0F7	13s	2015-08-15 10:03:35
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150427204519' WHERE eu_page_id = '260239' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	kawiki	mw1015	-	5096509531	30A18A156	12s	2015-08-15 10:03:35
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150803200253' WHERE eu_page_id = '16312' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	guwiki	mw1005	-	5096509757	30A18A40B	12s	2015-08-15 10:03:55
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150815100353' WHERE eu_page_id = '1' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	tlwiki	mw1007	-	5096509558	30A18A19E	11s	2015-08-15 10:03:35
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20141113113613' WHERE eu_page_id = '1' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	simplewiki	mw1012	-	5096509479	30A18A0F6	11s	2015-08-15 10:03:35
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150815100333' WHERE eu_page_id = '232335' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	bewiki	mw1013	localhost	5096509524	30A18A14A	11s	2015-08-15 10:03:45
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages 127.0.0.1 */ `wbc_entity_usage` SET eu_touched = '20150815100343' WHERE eu_page_id = '50792' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	bnwiki	mw1008	-	5096509501	30A18A11E	11s	2015-08-15 10:03:45
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150815100343' WHERE eu_page_id = '796' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	srwiki	mw1008	localhost	5096509510	30A18A12D	9s	2015-08-15 10:03:45
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages 127.0.0.1 */ `wbc_entity_usage` SET eu_touched = '20150815100335' WHERE eu_page_id = '196554' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	sdwiki	mw1014	-	5096509431	30A18A000	9s	2015-08-15 10:03:45
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150815100335' WHERE eu_page_id = '2238' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	hiwiki	mw1013	-	5096510664	30A18AE18	9s	2015-08-15 10:04:05
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150520065035' WHERE eu_page_id = '220108' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	dawiki	mw1008	-	5096510405	30A18AB5C	9s	2015-08-15 10:04:05
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150815100355' WHERE eu_page_id = '77' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	bnwiki	mw1004	-	5096511317	30A18B4E0	9s	2015-08-15 10:04:15
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150811141738' WHERE eu_page_id = '796' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	sdwiki	mw1011	-	5096509434	30A18A009	8s	2015-08-15 10:03:35
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150616152312' WHERE eu_page_id = '2238' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	hrwiki	mw1002	-	5096509689	30A18A33B	8s	2015-08-15 10:03:45
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150328211552' WHERE eu_page_id = '4' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	liwiki	mw1003	-	5096510879	30A18B05D	8s	2015-08-15 10:04:15
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150815100406' WHERE eu_page_id = '1' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	guwiki	mw1007	-	5096509750	30A18A3E7	7s	2015-08-15 10:03:45
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150719055033' WHERE eu_page_id = '1' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	tlwiki	mw1001	-	5096510816	30A18AFB4	7s	2015-08-15 10:04:05
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20141113113613' WHERE eu_page_id = '1' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	mrwiki	mw1003	-	5096510809	30A18AFA7	7s	2015-08-15 10:04:05
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150815100357' WHERE eu_page_id = '2' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	tnwiki	mw1006	-	5096510113	30A18A829	7s	2015-08-15 10:04:05
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150815100357' WHERE eu_page_id = '596' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	liwiki	mw1003	-	5096510854	30A18B01D	6s	2015-08-15 10:04:05
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150302091823' WHERE eu_page_id = '1' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	hiwiki	mw1015	-	5096524966	30A196449	6s	2015-08-15 10:14:45
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150520065035' WHERE eu_page_id = '220108' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	tnwiki	mw1009	-	5096510155	30A18A889	5s	2015-08-15 10:03:55
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20140228020245' WHERE eu_page_id = '596' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	elwiki	mw1015	-	5096511444	30A18B670	5s	2015-08-15 10:04:15
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150427204519' WHERE eu_page_id = '260239' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	hiwiki	mw1016	-	5096510724	30A18AEBC	5s	2015-08-15 10:04:15
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150815100410' WHERE eu_page_id = '220108' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	dsbwiki	mw1013	-	5096511261	30A18B44D	5s	2015-08-15 10:04:25
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150815100419' WHERE eu_page_id = '1' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	bewiki	mw1002	-	5096509508	30A18A12B	3s	2015-08-15 10:03:35
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150124160624' WHERE eu_page_id = '50792' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	gawiki	mw1002	-	5096509507	30A18A12A	3s	2015-08-15 10:03:35
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150803043836' WHERE eu_page_id = '19594' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	bnwiki	mw1004	-	5096509488	30A18A106	3s	2015-08-15 10:03:35
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150811141738' WHERE eu_page_id = '796' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	srwiki	mw1002	-	5096509468	30A18A0C7	3s	2015-08-15 10:03:35
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20150715004222' WHERE eu_page_id = '196554' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	dawiki	mw1010	-	5096510349	30A18AAB7	3s	2015-08-15 10:03:55
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages */ `wbc_entity_usage` SET eu_touched = '20081227200149' WHERE eu_page_id = '77' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))
db1038	wikiuser	bewiki	mw1011	localhost	5096511142	30A18B320	2s	2015-08-15 10:04:05
UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages 127.0.0.1 */ `wbc_entity_usage` SET eu_touched = '20150815100402' WHERE eu_page_id = '50792' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q5296'))

I looked into this some and some ways I think we can improve:

only do selective updates (for specific aspects) for wikis with the allowDataAccessInUserLanguage setting = true (e.g. multilingual wikis where we split parser cache by language, such as commons). As far as I can tell for other wikis, it would be sufficient to update touched for all with given eu_page_id w/o all the where conditions.

for multilingual wikis, maybe the touched updates for a page can be done in chunks / batches, in cases where a page uses a lot of entities and has a lot of aspects.

another idea... let's try to deduplicate the jobs more

we can set a root job (title + parser language code) so that the jobs can be further de-duplicated.

for example, in cases where an item or a client page is edited again when we still have older jobs in the queue, then the older jobs can become no-ops and we execute only the most recent job for a page (+parser language).

see T109448

T93885 will also help (have a specific limit for number of entities used in a page). we need to add a setting and then enforce the limit.

Proposal for a solution that does not need knowledge about how the parser cache is split:

Split the update into two steps:

  1. a SELECT query using the WHERE clause we currently see in the UPDATE query, collecting all the matching eu_row_ids.
  2. an UPDATE query using the row ids to identify the rows to be updated.

This needs more resources overall, but hopefully reduces lock retention time and overall load on the master db.

Change 232917 had a related patch set uploaded (by Daniel Kinzler):
Use eu_row_id when touching usage entries.

https://gerrit.wikimedia.org/r/232917

Could this job have caused:

/* JobRunner::commitMasterChanges 127.0.0.1 */ GET_LOCK('jobrunner-serial-commit', 30) AS lockstatus /* 31bdf25cc42f95f4f4a9b493cafe0299 db1024 plwiki 11s */ /* localhost */

we had another large lag spike around the same time than other touchusages, but it could have just been a coincidence? This other "query" caused 7000 hits with an avg waiting time of 15 seconds, causing lag on s2. If not, I will open another ticket for this other issue.

Another proof that touchUsages may be too aaggressive is that sometimes it conflicts with itself. Updating by primary key, as suggested, should solve this.

Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsages 10.64.16.13 1205 Lock wait timeout exceeded; try restarting transaction (10.64.16.13) UPDATE `wbc_entity_usage` SET eu_touched = '20150822101128' WHERE eu_page_id = '616905' AND ((eu_aspect = 'S' AND eu_entity_id = 'Q14484217'))

BTW, thank you for looking at this- you have been a really responsive team, and this is really appreciated.

I am almost sure that the first part of my previous comment was not caused by this job, but a very large LinksUpdate::incrTableUpdate, sorry about that.

Change 232917 merged by jenkins-bot:
Use eu_row_id when touching usage entries.

https://gerrit.wikimedia.org/r/232917

We are now doing the update in batches of 100. But I think Mediawiki wraps all of them in an implicit transaction. To fix this bug, do we need to split that up into individual transactions, one for each update?

@JanZerebecki not sure if you are asking me: Yes, IF that is the case (I cannot say)- batching will actually be worse than no batching at all- as it will be slower than a single query and more open to clashing with other queries without any of the advantages (reduced lag).

@jcrespo: As far as I know, the DBO_TRX flag is set on the cluster, causing MediaWiki to automatically start a transaction on the first insert/update/delete. Such automatic transactions are committed when an explicit transaction is started, or the request terminates.

I will add explicit transactions wrapping each insert batch. This should improve performance at the cost of consistency, which is accepted practice across the MediaWiki codebase.

Change 233724 had a related patch set uploaded (by Daniel Kinzler):
EntityUsageTable: Wrap insert batches in transactions

https://gerrit.wikimedia.org/r/233724

Change 233743 had a related patch set uploaded (by JanZerebecki):
EntityUsageTable: Wrap batches in transactions

https://gerrit.wikimedia.org/r/233743

Change 233724 merged by jenkins-bot:
EntityUsageTable: Wrap batches in transactions

https://gerrit.wikimedia.org/r/233724

Patches merged. Let's keep this open for a while and observe whether the problem is indeed gone.

Change 233743 merged by jenkins-bot:
EntityUsageTable: Wrap batches in transactions

https://gerrit.wikimedia.org/r/233743

I now see some slow queries in the logs for touch usages:

UPDATE /* Wikibase\Client\Usage\Sql\EntityUsageTable::touchUsageBatch */ wbc_entity_usage SET eu_touched = '20150907064933' WHERE eu_row_id IN ('2430363','2430361','2430345','2430360','2430348','61591','61590','107383','2430347','2430358','2430343','2430362','2430365','2430351','2430353','2430354','2430344','rEVED243035664a3b','2430350','2430366','2430355','2430359','2430352','2430357','2430346','2430349','2430364')

Created a new task for this: T111769