SULF renameuser jobs slow on mediawikiwiki and queries timing out
Closed, ResolvedPublic
Actions

Assigned To

Authored By

	Legoktm
	Apr 18 2015, 8:48 PM

Description

I'm not really sure what's going on, but some renameuser queries are timing out on mediawikiwiki, causing the rename jobs to fail (they typically succeed the next time the job queue retries them). I haven't seen this on any other wiki yet. Overall they're also running slower compared to other wikis.

Examples:

2015-04-18 19:46:58 mw1006 mediawikiwiki: [e55535e1] /rpc/RunJobs.php?wiki=media
wikiwiki&type=LocalRenameUserJob&maxtime=60&maxmem=300M   DBQueryError from line
 1246 of /srv/mediawiki/php-1.26wmf2/includes/db/Database.php: A database error 
has occurred. Did you forget to run maintenance/update.php after upgrading?  See
: https://www.mediawiki.org/wiki/Manual:Upgrading#Run_the_update_script
Query: UPDATE  `filearchive` SET fa_user_text = 'John Morawski~mediawikiwiki' WH
ERE fa_user_text = 'John Morawski' AND fa_user = '12243'
Function: RenameuserSQL::rename
Error: 1205 Lock wait timeout exceeded; try restarting transaction (10.64.16.27)

2015-04-18 19:46:42 mw1015 mediawikiwiki: [32b6040b] /rpc/RunJobs.php?wiki=media
wikiwiki&type=LocalRenameUserJob&maxtime=60&maxmem=300M   DBQueryError from line
 1246 of /srv/mediawiki/php-1.26wmf2/includes/db/Database.php: A database error 
has occurred. Did you forget to run maintenance/update.php after upgrading?  See
: https://www.mediawiki.org/wiki/Manual:Upgrading#Run_the_update_script
Query: UPDATE  `filearchive` SET fa_user_text = 'Jodie~mediawikiwiki' WHERE fa_u
ser_text = 'Jodie' AND fa_user = '14621'
Function: RenameuserSQL::rename
Error: 2013 Lost connection to MySQL server during query (10.64.16.27)

2015-04-18 19:47:41 mw1011 mediawikiwiki: [ae171a63] /rpc/RunJobs.php?wiki=media
wikiwiki&type=LocalRenameUserJob&maxtime=60&maxmem=300M   DBQueryError from line
 1246 of /srv/mediawiki/php-1.26wmf2/includes/db/Database.php: A database error 
has occurred. Did you forget to run maintenance/update.php after upgrading?  See
: https://www.mediawiki.org/wiki/Manual:Upgrading#Run_the_update_script
Query: UPDATE  `logging` SET log_title = 'Johny~mediawikiwiki' WHERE log_type IN ('block','newusers','rights','renameuser','thanks')  AND log_namespace = '2' AND log_title = 'Johny'
Function: RenameuserSQL::rename
Error: 1205 Lock wait timeout exceeded; try restarting transaction (10.64.16.27)

I can probably find more examples if that is helpful.

Details

	Subject	Repo	Branch	Lines +/-
	Use DB_SLAVE in CentralAuthUser::importLocalNames()	mediawiki/extensions/CentralAuth	master	+1 -1

Customize query in gerrit

Related Objects

Mentioned In: rECAU008ec099738f: Use DB_SLAVE in CentralAuthUser::importLocalNames()
rMEXTb4a3a805dfc9: Updated mediawiki/extensions Project: mediawiki/extensions/CentralAuth…

Event Timeline

Legoktm created this task.Apr 18 2015, 8:48 PM

Legoktm raised the priority of this task from to High.

Legoktm updated the task description. (Show Details)

Legoktm added projects: SUL-Finalization, MediaWiki-extensions-CentralAuth.

Legoktm added subscribers: Legoktm, aaron, • Springle and 3 others.

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptApr 18 2015, 8:48 PM

Legoktm added a project: Reading-Infrastructure-Team-Old (Don't use).Apr 18 2015, 8:48 PM

Legoktm set Security to None.

Change 205070 had a related patch set uploaded (by Legoktm):
Use DB_SLAVE in CentralAuthUser::importLocalNames()

https://gerrit.wikimedia.org/r/205070

gerritbot added a project: Patch-For-Review.Apr 19 2015, 1:47 AM

From #wikimedia-operations:

[18:26:14] <springle> legoktm: 22K queries/sec on S3 master atm. Not surprised that you saw some lock wait timeouts earlier :)
[18:26:51] <legoktm> springle: uhoh :/ should I slow down?
[18:27:10] <legoktm> right now I'm on arwiki which is s7
[18:27:47] <springle> these are from your jobs? : SELECT /* CentralAuthUser::importLocalNames
[18:28:46] <legoktm> yes
[18:29:40] <springle> if we can throttle that select load back a bit it would help S3 db1038
[18:31:18] <legoktm> are those queries still showing up on s3? I shouldn't be running anything on that db anymore...
[18:32:51] <legoktm> the centralauth db is on db1033 right? not 1038?
[18:33:02] <springle> they are, yes. spiked for a few hours earlier. paused a while, then spiked again 30m ago
[18:33:07] <springle> correct
[18:34:25] <springle> all from jobrunners
[18:36:27] <legoktm> lowering the limit from 85 concurrent renames to 50
[18:38:02] <legoktm> ok, this code is doing a query on every single wiki to see if the user exists :/
[18:38:17] <springle> S3 master load dropping back
[18:38:24] <springle> happier now
[18:38:58] <springle> probably wouldn't have noticed if that SELECT traffic had used a slave
[18:39:41] <legoktm> so the old rate would have been fine if it were using a slave?
[18:39:58] <springle> yep. probably also only an issue for S3

I've seen these lock waits on prior runs. They are probably for users with no actual deleted edits, so the gap endpoints are some adjacent user rows that do. So if similarly named users are being processed, the jobs for them will serialize with INSERT queries.

Even if they did have rows, if it's not a unique index query it would like the adjacent gaps anyway of course.

Anyway, the real problem is that lots of users may have no deleted edits, making the gaps fairly wide I suspect.

So should https://gerrit.wikimedia.org/r/#/c/205070/ get reviewed/merged?
Or does this task slowly get obsolete, as SUL-Finalization is nearly finished?

The majority of renaming on mw.org is done now, so the impact won't be huge. If it's a useful fix in general, we should merge it, but we can wait for @Legoktm to get back from vacation.

Change 205070 merged by jenkins-bot:
Use DB_SLAVE in CentralAuthUser::importLocalNames()

https://gerrit.wikimedia.org/r/205070

Diffusion mentioned this in rMEXTb4a3a805dfc9: Updated mediawiki/extensions Project: mediawiki/extensions/CentralAuth….Apr 25 2015, 5:06 PM

Legoktm mentioned this in rECAU008ec099738f: Use DB_SLAVE in CentralAuthUser::importLocalNames().Apr 25 2015, 5:09 PM

Legoktm closed this task as Resolved.Apr 27 2015, 6:23 PM

Legoktm claimed this task.

Syncretic moved this task from It's complicated to Done on the SUL-Finalization board.Apr 28 2015, 1:56 PM

Anomie moved this task from Backlog to Done on the Reading-Infrastructure-Team-Old (Don't use) board.Apr 28 2015, 4:24 PM

MarcoAurelio moved this task from Backlog to Done on the MediaWiki-extensions-CentralAuth board.May 5 2015, 5:46 PM