
Beta cluster: LocalGlobalUserPageCacheUpdateJob attempts to run on databases that are not on the beta cluster
Closed, Resolved (Public)

Description

While looking at T100694, I noticed that deployment-videoscaler01 has a /var/log/mediawiki/jobrunner.log filled with errors like:

2015-05-29T13:23:21+0000: Runner loop 0 process in slot 0 gave status '255':
nice -19 php /srv/mediawiki/multiversion/MWScript.php runJobs.php --wiki='nnwiki' --type='LocalGlobalUserPageCacheUpdateJob' --maxtime='60' --memory-limit='300M' --result=json
	/srv/mediawiki/wikiversions-labs.cdb has no version entry for `nnwiki`.

Fatal error: /srv/mediawiki/wikiversions-labs.cdb has no version entry for `nnwiki`.
 in /srv/mediawiki/multiversion/MWMultiVersion.php on line 369

It seems that LocalGlobalUserPageCacheUpdateJob uses all.dblist instead of all-labs.dblist.

Event Timeline

On deployment-jobrunner01.eqiad.wmflabs we have:

2015-05-29T15:51:21+0000: Runner loop 0 process in slot 3 gave status '0':
curl -XPOST -s -a 'http://127.0.0.1:9005/rpc/RunJobs.php?wiki=mswiki&type=LocalGlobalUserPageCacheUpdateJob&maxtime=60&maxmem=300M'
...
>Sorry, we were not able to work out what wiki you were trying to view.
			Please specify a valid Host header.</
...

So some jobs are inserted to be run on databases that do not exist :-/

I tried manually deleting job entries on deployment-redis01.eqiad.wmflabs from the jobqueue:aggregator:h-ready-queues:v2 hash:

$ redis-cli
127.0.0.1:6379> AUTH ....
OK
127.0.0.1:6379> HGETALL jobqueue:aggregator:h-ready-queues:v2
 1) "LocalGlobalUserPageCacheUpdateJob/svwiki"
 2) "1433927930"
 3) "LocalGlobalUserPageCacheUpdateJob/minwiki"
 4) "1433928627"
 5) "LocalGlobalUserPageCacheUpdateJob/nowiki"
 6) "1433928690"
 7) "LocalGlobalUserPageCacheUpdateJob/idwiki"
 8) "1433928749"
 9) "webVideoTranscode/dewiki"
10) "1433928550"
11) "webVideoTranscode/wikidatawiki"
12) "1433928225"
13) "webVideoTranscode/enwiki"
14) "1433928464"
15) "LocalGlobalUserPageCacheUpdateJob/uzwiki"
16) "1433928268"
17) "LocalGlobalUserPageCacheUpdateJob/nnwiki"
18) "1433927765"
19) "webVideoTranscode/commonswiki"
20) "1433928195"
21) "LocalGlobalUserPageCacheUpdateJob/mswiki"
22) "1433928523"
23) "LocalGlobalUserPageCacheUpdateJob/dawiki"
24) "1433928619"
25) "gwtoolsetUploadMetadataJob/commonswiki"
26) "1433928755"

Then:

127.0.0.1:6379> HDEL jobqueue:aggregator:h-ready-queues:v2 LocalGlobalUserPageCacheUpdateJob/nnwiki
(integer) 1
127.0.0.1:6379> HDEL jobqueue:aggregator:h-ready-queues:v2 LocalGlobalUserPageCacheUpdateJob/idwiki
(integer) 1
127.0.0.1:6379> HDEL jobqueue:aggregator:h-ready-queues:v2 LocalGlobalUserPageCacheUpdateJob/dawiki
(integer) 1
127.0.0.1:6379> HDEL jobqueue:aggregator:h-ready-queues:v2 LocalGlobalUserPageCacheUpdateJob/minwiki
(integer) 1
127.0.0.1:6379> HDEL jobqueue:aggregator:h-ready-queues:v2 LocalGlobalUserPageCacheUpdateJob/svwiki
(integer) 1
127.0.0.1:6379> HDEL jobqueue:aggregator:h-ready-queues:v2 LocalGlobalUserPageCacheUpdateJob/uzwiki
(integer) 1
127.0.0.1:6379> HDEL jobqueue:aggregator:h-ready-queues:v2 LocalGlobalUserPageCacheUpdateJob/mswiki
(integer) 1
127.0.0.1:6379> HDEL jobqueue:aggregator:h-ready-queues:v2 LocalGlobalUserPageCacheUpdateJob/nowiki
(integer) 1

That stopped the spam of fatal errors in /var/log/mediawiki/jobrunner.log, but the entries eventually get reinserted :-/
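
The repeated HDEL calls above could be generated rather than typed one by one. Below is a minimal Python sketch, assuming the hash fields always have the form `<jobType>/<wiki>` as in the HGETALL output, and that the set of valid wikis comes from all-labs.dblist. The function name `stale_ready_queue_fields` is hypothetical, the valid-wiki set is only a subset chosen for this sample, and the code only prints commands; it does not talk to Redis.

```python
def stale_ready_queue_fields(fields, valid_wikis):
    """Return the hash fields whose wiki part is not in valid_wikis."""
    stale = []
    for field in fields:
        # Fields look like "LocalGlobalUserPageCacheUpdateJob/nnwiki".
        _job_type, sep, wiki = field.rpartition("/")
        if sep and wiki not in valid_wikis:
            stale.append(field)
    return stale


# Sample field names copied from the HGETALL output above.
sample_fields = [
    "LocalGlobalUserPageCacheUpdateJob/svwiki",
    "LocalGlobalUserPageCacheUpdateJob/nnwiki",
    "webVideoTranscode/dewiki",
    "webVideoTranscode/enwiki",
    "gwtoolsetUploadMetadataJob/commonswiki",
]

# Assumption: a subset of all-labs.dblist, enough to cover the sample.
sample_valid = {"enwiki", "commonswiki", "dewiki", "wikidatawiki"}

for field in stale_ready_queue_fields(sample_fields, sample_valid):
    # Print the command instead of running it, so it can be reviewed first.
    print("HDEL jobqueue:aggregator:h-ready-queues:v2 " + field)
```

As noted above, deleting these fields only helps temporarily, since the entries get reinserted as long as the wikis remain in jobqueue:aggregator:s-wikis:v2.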

The faulty wikis are members of the wiki set jobqueue:aggregator:s-wikis:v2, e.g.:

127.0.0.1:6379> SISMEMBER jobqueue:aggregator:s-wikis:v2 nnwiki
(integer) 1

Running SMEMBERS jobqueue:aggregator:s-wikis:v2 shows a ton of chron-testwiki-##### entries, as well as the wikis that should not be in there. I am not sure how to resync that list though.

I have deleted all entries from jobqueue:aggregator:s-wikis:v2 :/ Only enwiki is left now. commonswiki eventually appeared:

127.0.0.1:6379> SMEMBERS jobqueue:aggregator:s-wikis:v2
1) "commonswiki"
2) "enwiki"

I have added all the wikis from all-labs.dblist except enwiki and commonswiki, which were already there:

127.0.0.1:6379> SADD jobqueue:aggregator:s-wikis:v2 aawiki arwiki cawiki deploymentwiki dewiki ee_prototypewiki en_rtlwiki enwikibooks enwikinews enwikiquote enwikisource enwikiversity enwiktionary eowiki eswiki fawiki hewiki hiwiki jawiki kowiki loginwiki metawiki ruwiki simplewiki sqwiki testwiki ukwiki wikidatawiki zerowiki zhwiki
(integer) 30

There is no spam anymore but I might have broken the job running service entirely :-(((
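
The manual resync above amounts to a set difference between the members of jobqueue:aggregator:s-wikis:v2 and the contents of all-labs.dblist. Below is a minimal Python sketch of that comparison, assuming the dblist is a plain text file with one database name per line (that format is an assumption here). `parse_dblist` and `plan_resync` are hypothetical helper names, the sample data is illustrative, and the code only prints SREM/SADD commands for review rather than connecting to Redis.

```python
def parse_dblist(text):
    """Parse dblist text (assumed: one database name per line) into a set."""
    wikis = set()
    for line in text.splitlines():
        name = line.strip()
        if name:  # skip blank lines
            wikis.add(name)
    return wikis


def plan_resync(current_members, wanted):
    """Return (to_remove, to_add) so that the Redis set matches `wanted`."""
    current = set(current_members)
    to_remove = sorted(current - wanted)
    to_add = sorted(wanted - current)
    return to_remove, to_add


# Illustrative sample: what SMEMBERS might return, and a short dblist.
sample_members = ["enwiki", "commonswiki", "nnwiki"]
sample_dblist = "enwiki\ncommonswiki\ndewiki\n\n"

to_remove, to_add = plan_resync(sample_members, parse_dblist(sample_dblist))

key = "jobqueue:aggregator:s-wikis:v2"
if to_remove:
    print("SREM " + key + " " + " ".join(to_remove))
if to_add:
    print("SADD " + key + " " + " ".join(to_add))
```

Removing only the members that are absent from the dblist avoids emptying the whole set, which is the step that raised the worry about breaking the job running service.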

hashar claimed this task.