enwiki master oaiUpdatePage spike
Closed, Resolved (Public)

Assigned To

Authored By

	• Springle
	Apr 21 2015, 2:09 PM

Description

Tonight enwiki master db1052 load spiked with many concurrent oaiUpdatePage statements from jobrunners:

REPLACE /* oaiUpdatePage 127.0.0.1 */ INTO updates (up_page,up_action,up_timestamp,up_sequence) VALUES ('0','modify','20150421134316',NULL)

Notice up_page=0. The timestamp steadily increments. The production enwiki slaves do not show excessive lag, but this is likely due to the master throttling commit speed via the semi-synchronous replication plugin. Other slaves not in the semi-sync group (db1047, dbstore*) are showing replag. The REPLACE statements take up to 10s to commit at times as they fight for locks.
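To illustrate why these statements all contend with one another, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the master. It assumes up_page is the unique key of the updates table (the description does not show the schema), and SQLite's locking is not InnoDB's, so this only shows the data-level effect: every REPLACE with up_page=0 rewrites the same single row.

```python
import sqlite3


def replay_replaces(timestamps, page_id=0):
    """Run one REPLACE per timestamp against the same up_page value.

    Assumption: up_page is the table's unique key, as in this toy schema.
    Returns the rows left in the table afterwards.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE updates ("
        " up_page INTEGER PRIMARY KEY,"
        " up_action TEXT,"
        " up_timestamp TEXT,"
        " up_sequence INTEGER)"
    )
    for ts in timestamps:
        # Same shape as the statement in the description; only the
        # timestamp changes from one call to the next.
        conn.execute(
            "REPLACE INTO updates (up_page, up_action, up_timestamp, up_sequence)"
            " VALUES (?, 'modify', ?, NULL)",
            (page_id, ts),
        )
    conn.commit()
    rows = conn.execute("SELECT up_page, up_timestamp FROM updates").fetchall()
    conn.close()
    return rows
```

Calling replay_replaces with three increasing timestamps leaves a single row for up_page 0 holding the last timestamp. Each statement is still a separate write on the same key, which is consistent with the lock contention and binlog volume described above.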

Other anecdotal info from IRC:

MaxSem said up_page=0 is related to replag.
legoktm said SUL finalization started on enwiki around that time
I didn't notice SUL traffic going slow, but perhaps this was some cumulative thing

https://gerrit.wikimedia.org/r/#/c/205606/

We should find out how not to fill up binlogs with hordes of identical queries.

Details

Subject	Repo	Branch	Lines +/-
Don't try to update up_page=0 if page moves suppressed redirects	mediawiki/extensions/OAI	wmf/1.26wmf2	+4 -1
Don't try to update up_page=0 if page moves suppressed redirects	mediawiki/extensions/OAI	wmf/1.26wmf1	+4 -1
Don't try to update up_page=0 if page moves suppressed redirects	mediawiki/extensions/OAI	master	+4 -1


Related Objects

Mentioned In: rEOAIc3bfc8b150ff: Don't try to update up_page=0 if page moves suppressed redirects
rEOAI8771ae5d37b0: Don't try to update up_page=0 if page moves suppressed redirects
rEOAIcef9ca0d0dcc: Don't try to update up_page=0 if page moves suppressed redirects
rMEXTb457d702a12c: Updated mediawiki/extensions Project: mediawiki/extensions/OAI…

Event Timeline

• Springle created this task. Apr 21 2015, 2:09 PM

• Springle raised the priority of this task from to Needs Triage.

• Springle updated the task description.

• Springle added projects: MediaWiki-extensions-OAI, SUL-Finalization.

• Springle added subscribers: • Springle, aaron, • brooke and 2 others.

Restricted Application added a subscriber: Aklapper. Apr 21 2015, 2:09 PM

Improved debugging information in https://gerrit.wikimedia.org/r/205606

• Springle triaged this task as High priority. Apr 21 2015, 2:11 PM

• Springle set Security to None.

Deployed MaxSem's debugging info patch. The queries are of the "OAIHook::updateMove-to" type, which means they're definitely caused by all the SUL finalization page moves.

They're all setting up_page = 0 because we're suppressing redirects. Does it actually make sense to update a row for that? The data seems to be useless, since no page actually has an id of 0.
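The idea of skipping the update when no real page exists can be sketched as follows. This is an illustration in Python rather than the extension's PHP, it is not the merged patch, and the function and parameter names are hypothetical. It only shows the guard: a page ID of 0 (no redirect was left behind) is never recorded.

```python
def page_ids_to_record(moved_page_id, redirect_page_id):
    """Return the page IDs worth writing to the updates table after a move.

    Hypothetical helper: an ID of 0 means no such page exists (for example,
    the redirect was suppressed), so it is dropped instead of producing a
    REPLACE with up_page=0.
    """
    return [pid for pid in (moved_page_id, redirect_page_id) if pid > 0]


if __name__ == "__main__":
    # Move with a suppressed redirect: only the moved page is recorded.
    print(page_ids_to_record(12345, 0))
```

With a guard of this kind, a burst of redirect-suppressed moves produces no writes for page 0 at all, rather than many writes that overwrite one meaningless row.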

Change 205615 had a related patch set uploaded (by Legoktm):
Don't try to update up_page=0 if page moves suppressed redirects

https://gerrit.wikimedia.org/r/205615

gerritbot added a project: Patch-For-Review. Apr 21 2015, 2:49 PM

Change 205615 merged by jenkins-bot:
Don't try to update up_page=0 if page moves suppressed redirects

https://gerrit.wikimedia.org/r/205615

Legoktm mentioned this in rMEXTb457d702a12c: Updated mediawiki/extensions Project: mediawiki/extensions/OAI…. Apr 21 2015, 4:23 PM

Legoktm mentioned this in rEOAIcef9ca0d0dcc: Don't try to update up_page=0 if page moves suppressed redirects.

Change 205628 had a related patch set uploaded (by Legoktm):
Don't try to update up_page=0 if page moves suppressed redirects

https://gerrit.wikimedia.org/r/205628

Change 205629 had a related patch set uploaded (by Legoktm):
Don't try to update up_page=0 if page moves suppressed redirects

https://gerrit.wikimedia.org/r/205629

Change 205628 merged by jenkins-bot:
Don't try to update up_page=0 if page moves suppressed redirects

https://gerrit.wikimedia.org/r/205628

Legoktm mentioned this in rEOAI8771ae5d37b0: Don't try to update up_page=0 if page moves suppressed redirects. Apr 21 2015, 4:37 PM

Change 205629 merged by jenkins-bot:
Don't try to update up_page=0 if page moves suppressed redirects

https://gerrit.wikimedia.org/r/205629

Legoktm mentioned this in rEOAIc3bfc8b150ff: Don't try to update up_page=0 if page moves suppressed redirects. Apr 21 2015, 4:37 PM

If db1047 is using single-threaded replication, how is there lock contention on the slaves?

Also, semi-sync replication only ensures that the log has been replicated and fsynced on another slave, not that the transaction has actually been applied there. So I'm curious how much that helps with slow queries.

@aaron, Sorry, I was unclear: The REPLACE statements take up to 10s to commit on the master...

Semi-sync delays commit on the master until fsync occurs on a slave. This acts like a throttle on the master's transaction throughput, forcing clients on the master to wait a little longer for every commit. In this case, where many small transactions appeared in surges, at least some of the 10s is due to semi-sync. Without semi-sync, I think more slaves would have behaved like db1047.
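The throttling effect can be put into rough numbers with a deliberately simplified model. The assumptions are mine, not measurements from this incident: each client connection issues its next commit only after the previous one returns, and lock waits are ignored. Under those assumptions, any extra time spent waiting for a slave acknowledgement directly lowers the commit rate the clients can sustain.

```python
import math


def max_commit_rate(connections, local_commit_s, semisync_ack_s):
    """Upper bound on commits per second in a closed-loop client model.

    Each connection is assumed to be busy for (local commit time + time
    waiting for the semi-sync acknowledgement) per transaction. All inputs
    are hypothetical illustration values.
    """
    per_commit_s = local_commit_s + semisync_ack_s
    return connections / per_commit_s


if __name__ == "__main__":
    without_ack = max_commit_rate(50, 0.002, 0.0)
    with_ack = max_commit_rate(50, 0.002, 0.003)
    # The acknowledgement wait lowers the ceiling on master throughput.
    assert with_ack < without_ack
    assert math.isclose(without_ack / with_ack, 2.5)
```

In this toy setup a 3 ms acknowledgement wait on top of a 2 ms local commit cuts the ceiling by a factor of 2.5. Real behaviour also depends on lock contention and on how the plugin is configured, neither of which is modelled here.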

I'm not saying semi-sync helps with slow queries, only that there are interesting side-effects that can shift the visible lag from replication onto every client connection on the master.

https://gerrit.wikimedia.org/r/#/c/205629/ fixed the immediate problem.

Legoktm mentioned that this should be discussed further in case the fix is not the right approach for OAI, but apparently it's being killed off...

Bugreporter moved this task from It's complicated to Done on the SUL-Finalization board. Apr 22 2015, 5:02 AM
