Hi, today i tryed loading https://meta.wikimedia.org/wiki/Privacy_policy which was stuck in loading state.
Wikipedia en was loading very slowly too.
Paladox | |
May 22 2018, 1:37 PM |
F18512790: Screenshot_٢٠١٨٠٥٢٤-٢٢٤٧٥٨.jpg | |
May 24 2018, 7:50 PM |
F18473907: metawiki_overload.png | |
May 22 2018, 3:20 PM |
Hi, today i tryed loading https://meta.wikimedia.org/wiki/Privacy_policy which was stuck in loading state.
Wikipedia en was loading very slowly too.
Project | Branch | Lines +/- | Subject | |
---|---|---|---|---|
mediawiki/extensions/Translate | master | +17 -4 | Optimize index on translate_reviews |
Status | Subtype | Assigned | Task | ||
---|---|---|---|---|---|
Resolved | PRODUCTION ERROR | Pginer-WMF | T195293 503 error attempting to open multiple projects (Wikipedia and meta wiki are loading very slowly) | ||
Resolved | Marostegui | T201011 Apply schema change to translate_reviews in WMF |
Same in Russian Wikipedia (from Russia):
Request from […] via cp1054 cp1054, Varnish XID 382637319 Error: 503, Backend fetch failed at Tue, 22 May 2018 13:43:18 GMT
I guess this affects all Wikimedia projects right now.
@Xaosflux be aware that editing the comment doesn't erase the IP info. It's still visible in "View edit history", on the comment's contextual menu.
Things have recovered - the DBs errors, so far, look like a consequence and not the cause.
We are investigating
The issue seems to point to
Hits Tmax Tavg Tsum Hosts Users Schemas 33533 15 5 173,069 db1079, db1086, db1094 wikiuser metawiki
That is the equivalent to 10-100 database hours of work in 5 minutes.
The query is very slow, normally from memory, it takes 0.5 seconds, but that is quite bad, and I can see it running with no index and scanning 200K rows.
EXPLAIN SELECT /* MessageCollection::loadReviewInfo <IP> */ page_namespace, page_title, trr_user FROM `page` JOIN `translate_reviews` ON ((page_id=trr_page) AND (page_latest=trr_revision)) WHERE (page_namespace = '1198' AND page_title IN ('Privacy_policy/Page_display_title/en', 'Privacy_policy/275/en', 'Privacy_policy/244/en', 'Privacy_policy/245/en', 'Privacy_policy/22/en', 'Privacy_policy/23/en', 'Privacy_policy/24/en', 'Privacy_policy/25/en', 'Privacy_policy/26/en', 'Privacy_policy/246/en', 'Privacy_policy/247/en', 'Privacy_policy/243/en', 'Privacy_policy/242/en', 'Privacy_policy/87/en', 'Privacy_policy/29/en', 'Privacy_policy/186/en', 'Privacy_policy/187/en', 'Privacy_policy/188/en', 'Privacy_policy/189/en', 'Privacy_policy/190/en', 'Privacy_policy/30/en', 'Privacy_policy/31/en', 'Privacy_policy/32/en', 'Privacy_policy/33/en', 'Privacy_policy/34/en', 'Privacy_policy/35/en', 'Privacy_policy/248/en', 'Privacy_policy/36/en', 'Privacy_policy/58/en', 'Privacy_policy/59/en', 'Privacy_policy/60/en', 'Privacy_policy/61/en', 'Privacy_policy/62/en', 'Privacy_policy/63/en', 'Privacy_policy/64/en', 'Privacy_policy/65/en', 'Privacy_policy/66/en', 'Privacy_policy/185/en', 'Privacy_policy/89/en', 'Privacy_policy/90/en', 'Privacy_policy/91/en', 'Privacy_policy/191/en', 'Privacy_policy/93/en', 'Privacy_policy/94/en', 'Privacy_policy/239/en', 'Privacy_policy/249/en', 'Privacy_policy/95/en', 'Privacy_policy/193/en', 'Privacy_policy/194/en', 'Privacy_policy/195/en', 'Privacy_policy/96/en', 'Privacy_policy/97/en', 'Privacy_policy/98/en', 'Privacy_policy/250/en', 'Privacy_policy/99/en', 'Privacy_policy/100/en', 'Privacy_policy/251/en', 'Privacy_policy/125/en', 'Privacy_policy/126/en', 'Privacy_policy/210/en', 'Privacy_policy/127/en', 'Privacy_policy/211/en', 'Privacy_policy/128/en', 'Privacy_policy/129/en', 'Privacy_policy/213/en', 'Privacy_policy/130/en', 'Privacy_policy/131/en', 'Privacy_policy/132/en', 'Privacy_policy/101/en', 'Privacy_policy/197/en', 'Privacy_policy/200/en', 'Privacy_policy/252/en', 'Privacy_policy/102/en', 'Privacy_policy/199/en', 'Privacy_policy/103/en', 'Privacy_policy/104/en', 'Privacy_policy/105/en', 'Privacy_policy/201/en', 'Privacy_policy/106/en', 'Privacy_policy/202/en', 'Privacy_policy/107/en', 'Privacy_policy/108/en', 'Privacy_policy/109/en', 'Privacy_policy/110/en', 'Privacy_policy/111/en', 'Privacy_policy/112/en', 'Privacy_policy/113/en', 'Privacy_policy/114/en', 'Privacy_policy/115/en', 'Privacy_policy/116/en', 'Privacy_policy/117/en', 'Privacy_policy/240/en', 'Privacy_policy/203/en', 'Privacy_policy/118/en', 'Privacy_policy/119/en', 'Privacy_policy/120/en', 'Privacy_policy/253/en', 'Privacy_policy/254/en', 'Privacy_policy/255/en', 'Privacy_policy/256/en', 'Privacy_policy/257/en', 'Privacy_policy/258/en', 'Privacy_policy/259/en', 'Privacy_policy/260/en', 'Privacy_policy/261/en', 'Privacy_policy/262/en', 'Privacy_policy/123/en', 'Privacy_policy/263/en', 'Privacy_policy/264/en', 'Privacy_policy/265/en', 'Privacy_policy/266/en', 'Privacy_policy/267/en', 'Privacy_policy/268/en', 'Privacy_policy/269/en', 'Privacy_policy/270/en', 'Privacy_policy/271/en', 'Privacy_policy/272/en', 'Privacy_policy/133/en', 'Privacy_policy/134/en', 'Privacy_policy/135/en', 'Privacy_policy/216/en', 'Privacy_policy/136/en', 'Privacy_policy/217/en', 'Privacy_policy/137/en', 'Privacy_policy/218/en', 'Privacy_policy/138/en', 'Privacy_policy/139/en', 'Privacy_policy/140/en', 'Privacy_policy/219/en', 'Privacy_policy/141/en', 'Privacy_policy/220/en', 'Privacy_policy/142/en', 'Privacy_policy/221/en', 'Privacy_policy/143/en', 'Privacy_policy/236/en', 'Privacy_policy/144/en', 'Privacy_policy/145/en', 'Privacy_policy/146/en', 'Privacy_policy/222/en', 'Privacy_policy/147/en', 'Privacy_policy/223/en', 'Privacy_policy/148/en', 'Privacy_policy/273/en', 'Privacy_policy/178/en', 'Privacy_policy/149/en', 'Privacy_policy/224/en', 'Privacy_policy/150/en', 'Privacy_policy/151/en', 'Privacy_policy/152/en', 'Privacy_policy/227/en', 'Privacy_policy/153/en', 'Privacy_policy/228/en', 'Privacy_policy/154/en', 'Privacy_policy/155/en', 'Privacy_policy/156/en', 'Privacy_policy/157/en', 'Privacy_policy/237/en', 'Privacy_policy/158/en', 'Privacy_policy/274/en', 'Privacy_policy/229/en', 'Privacy_policy/159/en', 'Privacy_policy/230/en', 'Privacy_policy/160/en', 'Privacy_policy/161/en', 'Privacy_policy/162/en', 'Privacy_policy/163/en', 'Privacy_policy/166/en', 'Privacy_policy/164/en', 'Privacy_policy/165/en', 'Privacy_policy/231/en', 'Privacy_policy/180/en', 'Privacy_policy/232/en', 'Privacy_policy/233/en', 'Privacy_policy/181/en', 'Privacy_policy/182/en', 'Privacy_policy/183/en', 'Privacy_policy/234/en', 'Privacy_policy/167/en', 'Privacy_policy/235/en', 'Privacy_policy/168/en', 'Privacy_policy/169/en', 'Privacy_policy/170/en', 'Privacy_policy/171/en', 'Privacy_policy/172/en', 'Privacy_policy/173/en', 'Privacy_policy/174/en', 'Privacy_policy/175/en', 'Privacy_policy/176/en', 'Privacy_policy/177/en') )\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: translate_reviews type: index possible_keys: NULL key: PRIMARY key_len: 12 ref: NULL rows: 194094 Extra: Using index *************************** 2. row *************************** id: 1 select_type: SIMPLE table: page type: eq_ref possible_keys: PRIMARY,name_title key: PRIMARY key_len: 4 ref: metawiki.translate_reviews.trr_page rows: 1 Extra: Using where 2 rows in set (0.03 sec)
Finally leading to: https://grafana.wikimedia.org/dashboard/db/mysql-aggregated?orgId=1&from=1526994927108&to=1526998716796
With the comment above mentioning metawiki and Privacy_policy I can't help but speculate that the edit at 13:28 or translation log entries at 13:26 and 13:32 might have had something to do with it...
13:32, 22 May 2018 Kaganer (talk | contribs) changed the state of Russian translations of Privacy policy from In progress to Needs updating
13:28, 22 May 2018 Kaganer (talk | contribs) m . . (52,758 bytes) (+3) . . (thank) - https://meta.wikimedia.org/w/index.php?title=Privacy_policy&diff=18066512&oldid=18063747
13:26, 22 May 2018 FuzzyBot (talk | contribs) changed the state of Russian translations of Privacy policy from Needs updating to In progress
Also:
https://meta.wikimedia.org/w/index.php?title=Special:Log&page=Privacy+policy&uselang=en
13:29, 22 May 2018 Kaganer (talk | contribs) marked Privacy policy for translation
It is not clear to me what could have caused such a high number of simultaneous queries for MessageCollection::loadReviewInfo, but it is easy to understand that if they start piling up it goes very fast downhill from there.
TranslationsUpdateJob does:
MessageGroupStats::clearGroup( $id ); MessageGroupStats::forGroup( $id );
I am not 100% sure how transactions work in the job queue, but if these two are not executed inside the same transaction, there could be a brief moment where stats are missing and many simultaneous page views to Privacy_policy could trigger a bunch of MessageCollection::loadReviewInfo before any one of them completes and the result is cached for subsequent edits.
Other possibility is that someone would have been spamming Special:Translate that does not use a cache, or someone would have been explicitly purging the translation stats cache for the page (possibly many times).
Looking at logstash, I see these queries from a handful of IPs and some without IP/name (JobQueue I assume). It doesn't have stacktraces or request URLs in those slowquery log entries, but the DBPerformance has and it shows a lot of direct page views or renderings of the page through API. Querying for a particular ip I see A LOT of varnishslowlog entries... it seems possible that some kind of bot (or even multiple of them) have been hitting that page hard.
Anyway, Jaime has correctly identified that the query is not using a index. translate_reviews has an index on (trr_user, trr_page, trr_revision) but that particular query is only specifying trr_page and trr_revision. Doing a grep inside Translate, I only found one other place where this table is accessed: ApiTranslationReview, which does an insert with all values provided. This leads me to think that we should swap the order of columns in the unique index to (trr_page, trr_revision, trr_user) to make it usable for the query in MessageCollection.
@jcrespo @Marostegui Do you agree with this conclusion?
Also, does anyone think that fixing the index is not sufficient, and that MessageGroupStats should use, for example, PoolCounter to limit the number of concurrent queries?
It appears to me in the desktop view put not in mobile version although in mobile version i just can explore pages but can't edit!
@1339861mzb Are you sure this is related? Please read https://www.mediawiki.org/wiki/How_to_report_a_bug and fill a new task. This seems to be unrelated.
I am ok with any index change you want, just create the ticket. Independently of that, you should make sure to avoid any cache stampede issues, maybe poolcounter could help with that?
You responded shortly before I left for vacation, and it seems people have been waiting for me to come back to fix this. I will start working on this by adding the missing index and exploring the other measures discussed.
Change 450032 had a related patch set uploaded (by Nikerabbit; owner: Nikerabbit):
[mediawiki/extensions/Translate@master] Optimize index on translate_reviews
While testing this index patch, I noticed I had locally created a custom index for translatewiki.net which I had forgotten. *scolds the past himself*
Change 450032 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Optimize index on translate_reviews
I consider that sufficient work has been done to close this task. I propose the follow-up work to avoid stampedes to be done in a separate task (T134252: Avoid "Lock wait timeout exceeded" from MessageGroupStats::clear seems suitable) and to be scheduled independently.