Bad LocalRenameUserJob stuck in jobrunner for vewikimedia
Closed, Resolved, Public

Assigned To

Authored By

	bd808
	Jan 22 2015, 6:18 AM

Description

The following exception is logged in exception.log 900 to 1000 times per second:

2015-01-22 06:14:03 mw1001 vewikimedia: [db53b619] /rpc/RunJobs.php?wiki=vewikimedia&type=LocalRenameUserJob&maxtime=60&maxmem=300M MWException from line 366 of /srv/mediawiki/php-1.25wmf14/includes/jobqueue/JobQueue.php: Unrecognized job type 'LocalRenameUserJob'.
#0 /srv/mediawiki/php-1.25wmf14/includes/jobqueue/JobQueueGroup.php(155): JobQueue->pop()
#1 /srv/mediawiki/php-1.25wmf14/includes/jobqueue/JobRunner.php(112): JobQueueGroup->pop()
#2 /srv/mediawiki/rpc/RunJobs.php(42): JobRunner->run()
#3 {main}
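To put a number on the rate quoted above, a short script can count exception header lines per second. This is a minimal sketch: the line shape (`timestamp host wiki: ...`) is inferred from the single sample in this description, and the function name `exceptions_per_second` is made up for illustration.

```python
import re
from collections import Counter

# Assumed header shape, inferred from the one sample line in this task:
#   "<YYYY-MM-DD HH:MM:SS> <host> <wiki>: [<request id>] <url> MWException ..."
HEADER = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) (?P<wiki>\S+): "
)


def exceptions_per_second(lines, wiki=None):
    """Count exception header lines per timestamp, optionally for one wiki."""
    counts = Counter()
    for line in lines:
        match = HEADER.match(line)
        if match is None:
            continue  # stack frames ("#0 ...") and blank lines are skipped
        if wiki is not None and match.group("wiki") != wiki:
            continue
        counts[match.group("ts")] += 1
    return counts
```

Feeding it an open file handle for exception.log and calling `.most_common(5)` on the result would list the busiest seconds.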

Related Objects

Mentioned In: T171371: Investigate 30x increase in Jobrunner errors
Mentioned Here: T57737: Delete vewikimedia and redirect it to wikimedia.org.ve
T87040: GWT infinite loops with wmf's new hhvm job runners
T87264: Remove references to vewikimedia in centralauth database

Event Timeline

bd808 created this task. Jan 22 2015, 6:18 AM

bd808 raised the priority of this task to Unbreak Now!

bd808 updated the task description.

bd808 added projects: MediaWiki-Core-Team, SUL-Finalization.

bd808 subscribed.

Restricted Application added a subscriber: Aklapper. Jan 22 2015, 6:18 AM

[21:35] < legoktm> so, the CentralAuth db had references to a "vewikimedia", which at one point was connected to SUL. It was recently re-opened as a fishbowl, meaning it's not SUL and CA isn't installed. But a user who the db thought existed on vewikimedia was global renamed, and it queued a job on vewikimedia except the job class doesn't exist there

[21:37] < legoktm> legoktm@terbium:~$ mwscript showJobs.php --wiki=vewikimedia --group <-- outputs nothing

The crazy volume of this exception event may be what is killing logstash as well.

I tried this:

$ mwscript eval.php --wiki=vewikimedia
> print_r( JobQueueGroup::singleton()->get('LocalRenameUserJob')->getSize() );
1
> print_r( JobQueueGroup::singleton()->get('LocalRenameUserJob')->delete() );

> print_r( JobQueueGroup::singleton()->get('LocalRenameUserJob')->getSize() );
1

@Tgr found a way to kill jobs from redis for T87040#984282 (path refers to tin), perhaps the same thing could be used here.

Although getSize now returns 0 for me, the exception log is still getting flooded.

Would just deleting vewikimedia solve this problem? It's currently not in use, and they requested that it be turned into a redirect to their current wiki instead. See T57737 and linked bugs.

In T87360#989960, @Krenair wrote:

@Tgr found a way to kill jobs from redis for T87040#984282 (path refers to tin), perhaps the same thing could be used here.

Trying to purge directly via redis commands as used in T87040#984282:

LPOP vewikimedia:jobqueue:LocalRenameUserJob:l-unclaimed
ZREMRANGEBYRANK vewikimedia:jobqueue:LocalRenameUserJob:z-claimed 0 10
ZREMRANGEBYRANK vewikimedia:jobqueue:LocalRenameUserJob:z-abandoned 0 10
ZREMRANGEBYRANK vewikimedia:jobqueue:LocalRenameUserJob:z-delayed 0 10

$ redis-cli -a $PASSWORD -h rdb1003 < redis-vewikimedia-clear.txt
(nil)
(integer) 0
(integer) 0
(integer) 0

Log continues to flood.
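For reuse with another wiki or job type, the command file above can be generated instead of hand-edited. This is only a sketch: the key pattern and the four suffixes are copied from the commands in this comment, other sub-keys may exist and are not covered, and the helper names `queue_key` and `purge_commands` are hypothetical.

```python
# Suffixes copied from the four keys used in this comment.
LIST_SUFFIXES = ("l-unclaimed",)
ZSET_SUFFIXES = ("z-claimed", "z-abandoned", "z-delayed")


def queue_key(wiki, job_type, suffix):
    """Build a per-wiki queue key in the shape seen above."""
    return f"{wiki}:jobqueue:{job_type}:{suffix}"


def purge_commands(wiki, job_type, max_rank=10):
    """Return the redis commands as text lines, one per key."""
    commands = [f"LPOP {queue_key(wiki, job_type, s)}" for s in LIST_SUFFIXES]
    commands += [
        f"ZREMRANGEBYRANK {queue_key(wiki, job_type, s)} 0 {max_rank}"
        for s in ZSET_SUFFIXES
    ]
    return commands


print("\n".join(purge_commands("vewikimedia", "LocalRenameUserJob")))
```

Note that a plain LPOP removes a single list element and ZREMRANGEBYRANK with `0 10` removes at most 11 members, so these commands only empty a queue that is already nearly empty, as this one was.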

@ori was able to help me fix this yesterday. The redis purges I had done removed the job, but the jobrunner instances were still trying to poll the "LocalRenameUserJob/vewikimedia" job queue. The fix was to remove the queue from the list of ready queues:

redis-cli -h rdb1001.eqiad.wmnet -a $PASSWORD hdel jobqueue:aggregator:h-ready-queues:v2 LocalRenameUserJob/vewikimedia
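Why the purge alone was not enough can be shown with a toy in-memory model. This is not the real jobrunner code: the hash name and field come from the command above, but `queues_to_poll`, the plain-dict store, and the placeholder field value are assumptions made for illustration.

```python
READY_HASH = "jobqueue:aggregator:h-ready-queues:v2"  # name from the command above


def ready_field(job_type, wiki):
    """Field name in the ready-queues hash, in the shape seen above."""
    return f"{job_type}/{wiki}"


def queues_to_poll(hashes):
    # Hypothetical runner step: trusts the ready hash and never checks
    # whether the listed queue still holds any jobs.
    return sorted(hashes.get(READY_HASH, {}))


def hdel(hashes, key, field):
    # Mirrors Redis HDEL for one field: 1 if the field was removed, else 0.
    fields = hashes.get(key, {})
    if field in fields:
        del fields[field]
        return 1
    return 0


field = ready_field("LocalRenameUserJob", "vewikimedia")
# The per-wiki queue is already empty here; only the ready entry remains.
hashes = {READY_HASH: {field: "placeholder"}}

before = queues_to_poll(hashes)   # the stale entry is still polled
removed = hdel(hashes, READY_HASH, field)
after = queues_to_poll(hashes)    # nothing left to poll
```

In this model the queue stays in the poll list until the hash field is deleted, which matches the behavior described in this comment.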

bd808 moved this task from Backlog to Done on the MediaWiki-Core-Team board. Jan 23 2015, 5:22 PM

bd808 moved this task from Done to Archive on the MediaWiki-Core-Team board. Feb 5 2015, 1:57 AM

Trying to make the answers here easier to find the next time I'm looking for them by adding the job queue and runner projects.

Keegan moved this task from It's complicated to Done on the SUL-Finalization board. Feb 18 2015, 7:25 PM

bd808 mentioned this in T171371: Investigate 30x increase in Jobrunner errors. Jul 24 2017, 12:59 AM
