
Can not create account at beta cluster: "Unable to connect to redis server deployment-redis01.eqiad.wmflabs."
Closed, Duplicate, Public

Description

Internal error:
[06b26103] /w/index.php?title=Special:UserLogin&action=submitlogin&type=signup&returnto=Main+Page JobQueueConnectionError from line 753 of /srv/mediawiki/php-master/includes/jobqueue/JobQueueRedis.php: Unable to connect to redis server deployment-redis01.eqiad.wmflabs.

Backtrace:

#0 /srv/mediawiki/php-master/includes/jobqueue/JobQueueRedis.php(210): JobQueueRedis->getConnection()
#1 /srv/mediawiki/php-master/includes/jobqueue/JobQueue.php(324): JobQueueRedis->doBatchPush(array, integer)
#2 /srv/mediawiki/php-master/includes/jobqueue/JobQueue.php(296): JobQueue->batchPush(array, integer)
#3 /srv/mediawiki/php-master/includes/jobqueue/JobQueueGroup.php(129): JobQueue->push(array)
#4 /srv/mediawiki/php-master/extensions/CentralAuth/includes/CentralAuthPlugin.php(390): JobQueueGroup->push(array)
#5 /srv/mediawiki/php-master/extensions/CentralAuth/includes/CentralAuthPlugin.php(309): CentralAuthPlugin->autoCreateAccounts(CentralAuthUser)
#6 /srv/mediawiki/php-master/includes/specials/SpecialUserlogin.php(665): CentralAuthPlugin->addUser(User, string, string, string)
#7 /srv/mediawiki/php-master/includes/specials/SpecialUserlogin.php(421): LoginForm->addNewAccountInternal()
#8 /srv/mediawiki/php-master/includes/specials/SpecialUserlogin.php(353): LoginForm->addNewAccount()
#9 /srv/mediawiki/php-master/includes/specialpage/SpecialPage.php(384): LoginForm->execute(NULL)
#10 /srv/mediawiki/php-master/includes/specialpage/SpecialPageFactory.php(564): SpecialPage->run(NULL)
#11 /srv/mediawiki/php-master/includes/MediaWiki.php(280): SpecialPageFactory::executePath(Title, RequestContext)
#12 /srv/mediawiki/php-master/includes/MediaWiki.php(745): MediaWiki->performRequest()
#13 /srv/mediawiki/php-master/includes/MediaWiki.php(517): MediaWiki->main()
#14 /srv/mediawiki/php-master/index.php(43): MediaWiki->run()
#15 /srv/mediawiki/w/index.php(3): include(string)
#16 {main}

Event Timeline

Bugreporter raised the priority of this task to Unbreak Now!.
Bugreporter updated the task description.
Bugreporter subscribed.
Aklapper renamed this task from Can not create account at beta cluster to Can not create account at beta cluster: "Unable to connect to redis server deployment-redis01.eqiad.wmflabs.". Jan 22 2016, 8:12 AM
Aklapper set Security to None.

[Thu Jan 21 19:24:12 2016] init: redis-instance-tcp_6380 main process (2135) terminated with status 1
[Thu Jan 21 19:24:12 2016] init: redis-instance-tcp_6380 main process ended, respawning
[Thu Jan 21 19:24:13 2016] init: redis-instance-tcp_6381 main process (2162) terminated with status 1
[Thu Jan 21 19:24:13 2016] init: redis-instance-tcp_6381 main process ended, respawning
[Thu Jan 21 19:24:14 2016] init: redis-instance-tcp_6379 main process (2174) terminated with status 1
[Thu Jan 21 19:24:14 2016] init: redis-instance-tcp_6379 main process ended, respawning
[Thu Jan 21 19:24:20 2016] init: redis-instance-tcp_6378 main process (2410) terminated with status 1
[Thu Jan 21 19:24:20 2016] init: redis-instance-tcp_6378 main process ended, respawning
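To see at a glance which instances died and with what exit status, the upstart lines above can be summarized mechanically. This is a minimal sketch, assuming the log format shown above; `failed_instances` is a hypothetical helper, not part of any Wikimedia tooling.

```python
import re

# Matches upstart lines such as:
# [Thu Jan 21 19:24:12 2016] init: redis-instance-tcp_6380 main process (2135) terminated with status 1
TERMINATED = re.compile(
    r"init: redis-instance-tcp_(?P<port>\d+) main process "
    r"\((?P<pid>\d+)\) terminated with status (?P<status>\d+)"
)


def failed_instances(log_text: str) -> dict[int, int]:
    """Map each redis port to the last non-zero exit status seen in the log."""
    failures: dict[int, int] = {}
    for line in log_text.splitlines():
        match = TERMINATED.search(line)
        if match and int(match["status"]) != 0:
            failures[int(match["port"])] = int(match["status"])
    return failures
```

Run over the eight lines above, it should report all four ports (6378 to 6381) with status 1, which points at a shared cause such as the common config file rather than one broken instance.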

And it is dead:

root@deployment-redis01:/var/log# /etc/init.d/redis-server status
redis-server is not running
root@deployment-redis01:/var/log# ps -A|grep redis
root@deployment-redis01:/var/log#
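The same liveness check can be made over the network, which is closer to what JobQueueRedis experiences. A minimal sketch, assuming the instance does not require AUTH (with a password set, Redis answers `-NOAUTH ...` and the sketch reports failure); `redis_ping` is a hypothetical helper, and the host and port would be `deployment-redis01.eqiad.wmflabs` and one of the instance ports from this task.

```python
import socket


def redis_ping(host: str, port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if a Redis server at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Redis accepts "inline" commands: a plain line of text ending in CRLF.
            sock.sendall(b"PING\r\n")
            reply = b""
            # Read until the CRLF that terminates a simple-string reply.
            while not reply.endswith(b"\r\n"):
                chunk = sock.recv(64)
                if not chunk:
                    break
                reply += chunk
    except OSError:
        # Covers refused connections, DNS failures and timeouts
        # (socket.timeout is a subclass of OSError).
        return False
    return reply.startswith(b"+PONG")
```

Calling `redis_ping("deployment-redis01.eqiad.wmflabs", 6379)` from a MediaWiki host should return False while the instances are down, matching the "Unable to connect" error.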

From the /var/log/upstart/redis-instance* files, all timestamped Jan 21 19:24:

*** FATAL CONFIG FILE ERROR ***
Reading the configuration file, at line 549
>>> 'latency-monitor-threshold 100'
Bad directive or wrong number of arguments
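Redis aborts on any directive it does not recognize, so the config that puppet renders has to match the installed server version. Latency monitoring, and with it `latency-monitor-threshold`, first appeared in Redis 2.8.13. A minimal sketch of a pre-flight check, assuming a hand-maintained table of minimum versions; `MIN_VERSION` and `unsupported_directives` are hypothetical names, and the table lists only the one directive from this task.

```python
# Hypothetical table: directive -> first Redis version that understands it.
MIN_VERSION = {
    "latency-monitor-threshold": (2, 8, 13),
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.8.4' into (2, 8, 4) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def unsupported_directives(config_text: str, server_version: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that the given Redis version would reject."""
    version = parse_version(server_version)
    problems = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are always accepted
        directive = stripped.split()[0].lower()
        required = MIN_VERSION.get(directive)
        if required is not None and version < required:
            problems.append((lineno, stripped))
    return problems
```

Tuples compare element by element, so `(2, 8, 4) < (2, 8, 13)` holds, whereas a plain string comparison of "2.8.4" and "2.8.13" would get it wrong.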

Well, the puppet patch from Dec 29th introduced the 'latency-monitor-threshold' config (https://gerrit.wikimedia.org/r/#/c/261303/), but that is quite old.

hashar claimed this task.

So maybe the redis-server errors above are unrelated.

The services are running:

# initctl list|grep redis
redis-instance-tcp_6379 start/running
redis-instance-tcp_6378 start/running
redis-instance-tcp_6380 start/running
redis-instance-tcp_6381 start/running
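Checking that every instance is up can be scripted from the `initctl list` output above. A minimal sketch, assuming upstart's usual `job goal/state[, process PID]` line format; `instance_states` and `all_running` are hypothetical helpers.

```python
def instance_states(initctl_output: str, prefix: str = "redis-instance-") -> dict[str, str]:
    """Map each matching upstart job name to its goal/state, e.g. 'start/running'."""
    states = {}
    for line in initctl_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].startswith(prefix):
            # Running jobs are printed as "start/running, process 1234".
            states[parts[0]] = parts[1].rstrip(",")
    return states


def all_running(states: dict[str, str]) -> bool:
    """True only if at least one job was found and every job is start/running."""
    return bool(states) and all(state == "start/running" for state in states.values())
```

Requiring a non-empty result matters here: if no redis job is listed at all, the check should fail rather than pass vacuously.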

They were killed on Jan 21 at 19:24:11 for some unknown reason.

MGChecker subscribed.

If I try to delete something right now (and earlier today too), I get a similar error with the same backtrace; only the first line is different:

[88cf8895] /w/index.php?title=EN-Tests&action=delete JobQueueConnectionError from line 753 of /srv/mediawiki/php-master/includes/jobqueue/JobQueueRedis.php: Unable to connect to redis server deployment-redis01.eqiad.wmflabs.

@MGChecker: please try again; we just restarted Redis in Beta Cluster again (see T124677#1962666).

> Well puppet patch from Dec 29th introduced the 'latency-monitor-threshold' config https://gerrit.wikimedia.org/r/#/c/261303/ ... That is quite old.

That was actually it; see rOPUP1c16154c6ee76433722d0f01462db57727d37b64 (that same commit, but with @mmodell's commentary).

I'm merging this task into the other one where Ori is assigned to fix the puppet config issue.