Page MenuHomePhabricator

Some edits to MediaWiki namespace time out on translatewiki.net
Closed, ResolvedPublic1 Estimated Story Points

Description

All edits to MediaWIki namespaces are timing out on translatewiki.net. From some manual testing it seems that https://gerrit.wikimedia.org/r/#/c/230967/ is the cause, though I do not know why. Reverting to 3a252efa12253e77a07a75da25bcf558c5bcd209 stops the timeouts.

In addition, I am also seeing constant queries for re-loading message cache and this message in the debug log:

[MessageCache] MessageCache::load: Loading en... global cache is expired/volatile, loading from database

Event Timeline

Nikerabbit raised the priority of this task from to Unbreak Now!.
Nikerabbit updated the task description. (Show Details)
Nikerabbit added subscribers: Nikerabbit, ori, aaron.

Change 231856 had a related patch set uploaded (by Aaron Schulz):
Always call setValidationHash() in MessageCache::saveToCaches()

https://gerrit.wikimedia.org/r/231856

Change 231856 merged by jenkins-bot:
Always call setValidationHash() in MessageCache::saveToCaches()

https://gerrit.wikimedia.org/r/231856

Even with the patch there are still occasional reports of saves timing out, such as T109233 and one case experienced my self. I wonder if the recent addition of sorting the list of translations is a problem.

Even with the patch there are still occasional reports of saves timing out, such as T109233 and one case experienced my self. I wonder if the recent addition of sorting the list of translations is a problem.

How many MW: message pages are on TWN?

Way too many, but I have the option set to only cache the ones that are customized for wiki content language and their translations. In any case I will do some profiling with xhprof tomorrow to see if it gives any hints.

According to profiling MessageCache::replace takes 9 seconds and MessageCache::lock almost 3. Perhaps there is queue for the lock (is sleep time counted in xhprof?) because something else in replace is slow now.

It seems lock is called three times as often as replace, and if each lock takes about three seconds, that would explain why replace takes about 9 seconds.

The translations are actually getting saved, nginx will just stop waiting for the response after 15 seconds.

Will continue debugging tomorrow or (more likely) Wednesday.

Nikerabbit renamed this task from All edits to MediaWiki namespace time out on translatewiki.net to Some edits to MediaWiki namespace time out on translatewiki.net.Aug 17 2015, 7:24 PM
Nikerabbit set Security to None.

Possibly related:

[Tue Aug 18 16:22:52 2015] [hphp] [2901:7fd1323ff700:14093:000001] [] SlowTimer [51201ms] at runtime/ext_mysql: slow query: UPDATE /* MessageBlobStore::updateMessage Amire80 */  `bw_msg_resource` SET mr_blob = '{\\"twnmp-signup-error-invalidemail\\":\\"This does not look like a valid email address. Please, provide a valid one.\\",\\"twnmp-signup-error-invalidpassword\\":\\"The password is invalid.\\",\\"twnmp-signup-error-invalidusername\\":\\"This username is not available. Please, pick a different one.\\",\\"twnmp-signup-error-nonfreeusername\\":\\"This username already exists. Please, pick a different one.\\",\\"twnmp-signup-error-other\\":\\"Signing up failed. The server returned an error: $1.\\",\\"twnmp-signup-error-unknown\\":\\"Account creation failed for an unknown reason.\\"}',mr_timestamp = '20150818142201' WHERE mr_resource = 'ext.translate.mainpage.signup' AND mr_lang = 'ar' AND mr_timestamp = '20150818142130'
[Tue Aug 18 16:22:52 2015] [hphp] [2901:7fd0c43ff700:8211:000001] [] SlowTimer [51095ms] at runtime/ext_mysql: slow query: UPDATE /* MessageBlobStore::updateMessage Amire80 */  `bw_msg_resource` SET mr_blob = '{\\"twnmp-signup-error-invalidemail\\":\\"This does not look like a valid email address. Please, provide a valid one.\\",\\"twnmp-signup-error-invalidpassword\\":\\"The password is invalid.\\",\\"twnmp-signup-error-invalidusername\\":\\"This username is not available. Please, pick a different one.\\",\\"twnmp-signup-error-nonfreeusername\\":\\"This username already exists. Please, pick a different one.\\",\\"twnmp-signup-error-other\\":\\"Signing up failed. The server returned an error: $1.\\",\\"twnmp-signup-error-unknown\\":\\"Account creation failed for an unknown reason.\\"}',mr_timestamp = '20150818142201' WHERE mr_resource = 'ext.translate.mainpage.signup' AND mr_lang = 'ar' AND mr_timestamp = '20150818142130'
[Tue Aug 18 16:22:53 2015] [hphp] [2901:7fd0f33ff700:13808:000001] [] SlowTimer [51447ms] at runtime/ext_mysql: slow query: UPDATE /* MessageBlobStore::updateMessage Amire80 */  `bw_msg_resource` SET mr_blob = '{\\"twnmp-signup-error-invalidemail\\":\\"This does not look like a valid email address. Please, provide a valid one.\\",\\"twnmp-signup-error-invalidpassword\\":\\"The password is invalid.\\",\\"twnmp-signup-error-invalidusername\\":\\"This username is not available. Please, pick a different one.\\",\\"twnmp-signup-error-nonfreeusername\\":\\"This username already exists. Please, pick a different one.\\",\\"twnmp-signup-error-other\\":\\"Signing up failed. The server returned an error: $1.\\",\\"twnmp-signup-error-unknown\\":\\"Account creation failed for an unknown reason.\\"}',mr_timestamp = '20150818142202' WHERE mr_resource = 'ext.translate.mainpage.signup' AND mr_lang = 'ar' AND mr_timestamp = '20150816144158'
[Tue Aug 18 16:23:26 2015] [hphp] [2901:7fd1197ff700:14125:000001] [] SlowTimer [51847ms] at runtime/ext_mysql: slow query: UPDATE /* MessageBlobStore::updateMessage Amire80 */  `bw_msg_resource` SET mr_blob = '{\\"twnmp-signup-error-invalidemail\\":\\"This does not look like a valid email address. Please, provide a valid one.\\",\\"twnmp-signup-error-invalidpassword\\":\\"The password is invalid.\\",\\"twnmp-signup-error-invalidusername\\":\\"This username is not available. Please, pick a different one.\\",\\"twnmp-signup-error-nonfreeusername\\":\\"This username already exists. Please, pick a different one.\\",\\"twnmp-signup-error-other\\":\\"Signing up failed. The server returned an error: $1.\\",\\"twnmp-signup-error-unknown\\":\\"Account creation failed for an unknown reason.\\"}',mr_timestamp = '20150818142235' WHERE mr_resource = 'ext.translate.mainpage.signup' AND mr_lang = 'ar' AND mr_timestamp = '20150818142130'

This is also slowing down parserTests runs.

Each parsed !! article executes doEditUpdates which, if the namespace is right, calls MessageCache::singleton()->replace( $shortTitle, $msgtext );. replace retains a lock before calling load, which now, because of the addition of } elseif ( $hashVolatile ) { in that commit, calls loadFromDBWithLock. Then we wait until the lock times out.

Change 232536 had a related patch set uploaded (by Arlolra):
T109183: We already have a lock

https://gerrit.wikimedia.org/r/232536

Change 232661 had a related patch set uploaded (by Aaron Schulz):
[WIP] Added $wgDataCenterId/$wgDataCenterRoles

https://gerrit.wikimedia.org/r/232661

Change 232536 had a related patch set uploaded (by Nemo bis):
We already have a lock

https://gerrit.wikimedia.org/r/232536

Change 233063 had a related patch set uploaded (by Aaron Schulz):
Avoid self-deadlocks in MessageCache::replace()

https://gerrit.wikimedia.org/r/233063

Change 233063 merged by jenkins-bot:
Avoid self-deadlocks in MessageCache::replace()

https://gerrit.wikimedia.org/r/233063

Change 232536 abandoned by Arlolra:
We already have a lock

Reason:
Fixed in I305f51e744aac53876e5865f860c282aa2efbd8b

https://gerrit.wikimedia.org/r/232536

Arrbee lowered the priority of this task from Unbreak Now! to Medium.Aug 26 2015, 11:44 AM
Arrbee moved this task from Backlog to In Progress on the LE-CX6-Sprint 3 board.
Arrbee subscribed.
Nikerabbit moved this task from In Progress to Done on the LE-CX6-Sprint 3 board.
Nikerabbit edited a custom field.

Fixed in master as far as I can see.