Paired errors connecting to the job queue redis instances:
Feb 3 07:51:10 mw1226: #012Warning: timed out after 0.2 seconds when connecting to rdb1001.eqiad.wmnet [110]: Connection timed out
Feb 3 07:51:10 mw1226: #012Warning: Failed connecting to redis server at rdb1001.eqiad.wmnet: Connection timed out
These can be found by searching for "redis" on https://logstash.wikimedia.org/
Seems to be load related and/or mostly from job runners. Example:
Unable to connect to redis server rdb1003.eqiad.wmnet:6380.
#0 /srv/mediawiki/php-1.28.0-wmf.18/includes/jobqueue/JobQueueRedis.php(310): JobQueueRedis->getConnection()
#1 /srv/mediawiki/php-1.28.0-wmf.18/includes/jobqueue/JobQueue.php(372): JobQueueRedis->doPop()
#2 /srv/mediawiki/php-1.28.0-wmf.18/includes/jobqueue/JobQueueFederated.php(290): JobQueue->pop()
#3 /srv/mediawiki/php-1.28.0-wmf.18/includes/jobqueue/JobQueue.php(372): JobQueueFederated->doPop()
#4 /srv/mediawiki/php-1.28.0-wmf.18/includes/jobqueue/JobQueueGroup.php(204): JobQueue->pop()
#5 /srv/mediawiki/php-1.28.0-wmf.18/includes/jobqueue/JobRunner.php(160): JobQueueGroup->pop()
#6 /srv/mediawiki/rpc/RunJobs.php(47): JobRunner->run()
#7 {main}
4000 messages over an hour :( Seems the redis servers are saturated.
From dupe task T130078
Logstash shows a large volume of "Unable to connect to redis server" messages on https://logstash.wikimedia.org/app/kibana#/dashboard/Redis
The top offenders:
| server:port | Hits (24 hours) |
|---|---|
| rdb1001.eqiad.wmnet:6380 | 166685 |
| rdb1001.eqiad.wmnet:6379 | 165655 |
| rdb1001.eqiad.wmnet:6381 | 159330 |
| rdb1005.eqiad.wmnet:6380 | 73329 |
| rdb1005.eqiad.wmnet:6381 | 68856 |
| rdb1005.eqiad.wmnet:6379 | 65288 |
| rdb1003.eqiad.wmnet:6379 | 13038 |
| rdb1007.eqiad.wmnet:6379 | 13015 |
| rdb1007.eqiad.wmnet:6380 | 12778 |
| rdb1007.eqiad.wmnet:6381 | 12351 |
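A per-instance tally like the one above can also be rebuilt offline from exported log lines. The following is a minimal sketch, assuming the messages are available as plain text and follow the "Unable to connect to redis server host:port." format quoted in this task; the function name `count_offenders` is hypothetical and not part of MediaWiki or Logstash.

```python
import re
from collections import Counter

# Matches the message format quoted in this task, e.g.
# "Unable to connect to redis server rdb1003.eqiad.wmnet:6380."
# Assumption: every failure message carries an explicit host:port pair.
PATTERN = re.compile(r"Unable to connect to redis server (\S+?):(\d+)")


def count_offenders(lines):
    """Return (server:port, hits) pairs sorted by descending hit count."""
    hits = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            host, port = match.group(1), match.group(2)
            hits[f"{host}:{port}"] += 1
    return hits.most_common()


def as_table(offenders):
    """Render the pairs as a pipe table with a server:port and a hits column."""
    rows = ["| server:port | hits |", "|---|---|"]
    rows.extend(f"| {server} | {count} |" for server, count in offenders)
    return "\n".join(rows)
```

Passing an open log file to `count_offenders` and printing `as_table(...)` of the result gives a table in the same shape as the one above. Lines that do not contain the message are ignored.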
Maybe it is due to the currently abnormal number of refreshLink jobs being processed. Maybe the Nutcracker proxy is not being used, or the redis instances are overloaded or do not allow enough connections.
Also it seems load-related as there is a daily pattern for "unable to connect to redis server"
Debugging
One can run a jobrunner process in foreground mode with debug logs sent to stdout (example spam: P5240):
sudo -H -u www-data /usr/bin/php /srv/deployment/jobrunner/jobrunner/redisJobRunnerService --config-file=/tmp/jobrunner-debug.conf --verbose
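Separately from the jobrunner itself, the raw TCP connect time to a redis instance can be measured from an affected host. This is only a sketch under assumptions: the 0.2 second timeout is taken from the warning quoted at the top of this task, the helper name `probe_connect` is hypothetical, and it tests only the TCP handshake, not redis authentication or command latency.

```python
import socket
import time


def probe_connect(host, port, timeout=0.2):
    """Try one TCP connection; return (succeeded, elapsed_seconds).

    The default timeout mirrors the 0.2 seconds seen in the warnings
    quoted in this task (an assumption about the client setting).
    """
    start = time.monotonic()
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        # Covers timeouts, refused connections and DNS failures.
        ok = False
    return ok, time.monotonic() - start


# Example usage (hosts and ports as listed in the table in this task):
#   for port in (6379, 6380, 6381):
#       print(port, probe_connect("rdb1001.eqiad.wmnet", port))
```

Running the probe repeatedly during the daily peak and again off-peak would show whether connect times approach the timeout only under load.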



