Page MenuHomePhabricator

Redis server used by OCG (deployment-redis01) is DOWN in beta cluster
Closed, DuplicatePublic

Description

After upgrade of OCG to version 64050af0456a43344b32e3e93561a79207565eaf, "download as PDF" says there is an error, and the logs in /var/log/ocg/ocg.log say:

Jan 26 16:59:23 deployment-pdf01 mw-ocg-service: {"name":"mw-ocg-service","hostname":"deployment-pdf01","pid":5256,"level":50,"err":{"message":"Redis connection to deployment-redis01.eqiad.wmflabs:6379 failed - connect ECONNREFUSED","name":"Error","stack":"Error: Redis connection to deployment-redis01.eqiad.wmflabs:6379 failed - connect ECONNREFUSED\n    at RedisClient.on_error (/srv/deployment/ocg/ocg/node_modules/redis/index.js:196:24)\n    at Socket.<anonymous> (/srv/deployment/ocg/ocg/node_modules/redis/index.js:106:14)\n    at Socket.EventEmitter.emit (events.js:95:17)\n    at net.js:441:14\n    at process._tickCallback (node.js:415:13)"},"msg":"Redis connection (now disconnected) had error.","time":"2016-01-26T16:59:23.408Z","v":0}
Jan 26 16:59:23 deployment-pdf01 mw-ocg-service: {"name":"mw-ocg-service","hostname":"deployment-pdf01","pid":5251,"level":50,"err":{"message":"Redis connection to deployment-redis01.eqiad.wmflabs:6379 failed - connect ECONNREFUSED","name":"Error","stack":"Error: Redis connection to deployment-redis01.eqiad.wmflabs:6379 failed - connect ECONNREFUSED\n    at RedisClient.on_error (/srv/deployment/ocg/ocg/node_modules/redis/index.js:196:24)\n    at Socket.<anonymous> (/srv/deployment/ocg/ocg/node_modules/redis/index.js:106:14)\n    at Socket.EventEmitter.emit (events.js:95:17)\n    at net.js:441:14\n    at process._tickCallback (node.js:415:13)"},"msg":"Redis connection (now disconnected) had error.","time":"2016-01-26T16:59:23.529Z","v":0}
Jan 26 16:59:23 deployment-pdf01 mw-ocg-service: {"name":"mw-ocg-service","hostname":"deployment-pdf01","pid":5249,"level":50,"err":{"message":"Redis connection to deployment-redis01.eqiad.wmflabs:6379 failed - connect ECONNREFUSED","name":"Error","stack":"Error: Redis connection to deployment-redis01.eqiad.wmflabs:6379 failed - connect ECONNREFUSED\n    at RedisClient.on_error (/srv/deployment/ocg/ocg/node_modules/redis/index.js:196:24)\n    at Socket.<anonymous> (/srv/deployment/ocg/ocg/node_modules/redis/index.js:106:14)\n    at Socket.EventEmitter.emit (events.js:95:17)\n    at net.js:441:14\n    at process._tickCallback (node.js:415:13)"},"msg":"Redis connection (now disconnected) had error.","time":"2016-01-26T16:59:23.537Z","v":0}
Jan 26 16:59:23 deployment-pdf01 mw-ocg-service: {"name":"mw-ocg-service","hostname":"deployment-pdf01","pid":5252,"level":50,"err":{"message":"Redis connection to deployment-redis01.eqiad.wmflabs:6379 failed - connect ECONNREFUSED","name":"Error","stack":"Error: Redis connection to deployment-redis01.eqiad.wmflabs:6379 failed - connect ECONNREFUSED\n    at RedisClient.on_error (/srv/deployment/ocg/ocg/node_modules/redis/index.js:196:24)\n    at Socket.<anonymous> (/srv/deployment/ocg/ocg/node_modules/redis/index.js:106:14)\n    at Socket.EventEmitter.emit (events.js:95:17)\n    at net.js:441:14\n    at process._tickCallback (node.js:415:13)"},"msg":"Redis connection (now disconnected) had error.","time":"2016-01-26T16:59:23.573Z","v":0}
Jan 26 16:59:31 deployment-pdf01 mw-ocg-service: {"name":"mw-ocg-service","hostname":"deployment-pdf01","pid":5251,"level":30,"channel":"frontend.request","request":{"id":"1453827571014-26719","clientip":"127.0.0.1","version":"1.1","method":"GET","url":"/?command=health"},"msg":"Starting request 1453827571014-26719 for 127.0.0.1, HTTP 1.1 GET /?command=health","time":"2016-01-26T16:59:31.015Z","v":0}
Jan 26 16:59:31 deployment-pdf01 mw-ocg-service: {"name":"mw-ocg-service","hostname":"deployment-pdf01","pid":5251,"level":20,"channel":"frontend.request.details","request":{"id":"1453827571014-26719","params":{"command":"health"}},"msg":"Request details for 1453827571014-26719","time":"2016-01-26T16:59:31.016Z","v":0}
Jan 26 16:59:31 deployment-pdf01 mw-ocg-service: {"name":"mw-ocg-service","hostname":"deployment-pdf01","pid":5251,"level":50,"channel":"frontend.error","err":{"message":"send_command: stream not writeable. enable_offline_queue is false","name":"Error","stack":"Error: send_command: stream not writeable. enable_offline_queue is false\n    at RedisClient.send_command (/srv/deployment/ocg/ocg/node_modules/redis/index.js:816:39)\n    at RedisClient.(anonymous function) (/srv/deployment/ocg/ocg/node_modules/redis/index.js:991:25)\n    at /srv/deployment/ocg/ocg/node_modules/prfun/lib/index.js:587:34\n    at PrFunPromise.Promise (/srv/deployment/ocg/ocg/node_modules/core-js/modules/es6.promise.js:182:7)\n    at new PrFunPromise (/srv/deployment/ocg/ocg/node_modules/prfun/lib/index.js:57:21)\n    at /srv/deployment/ocg/ocg/node_modules/prfun/lib/index.js:577:16\n    at RedisWrapper.llen (/srv/deployment/ocg/ocg/mw-ocg-service/lib/RedisWrapper.js:132:64)\n    at Promise.resolve.then.then.then.responseAry.JobQueue.name (/srv/deployment/ocg/ocg/mw-ocg-service/lib/threads/frontend.js:404:22)\n    at run (/srv/deployment/ocg/ocg/node_modules/core-js/modules/es6.promise.js:89:39)\n    at /srv/deployment/ocg/ocg/node_modules/core-js/modules/es6.promise.js:100:28"},"request":{"id":"1453827571014-26719"},"msg":"Health failed with error: Error: send_command: stream not writeable. enable_offline_queue is false","time":"2016-01-26T16:59:31.021Z","v":0}
Jan 26 16:59:31 deployment-pdf01 mw-ocg-service: {"name":"mw-ocg-service","hostname":"deployment-pdf01","pid":5251,"level":50,"request":{"id":"1453827571014-26719"},"err":{"message":"Status check failed (redis failure?)","name":"FrontendError","stack":"FrontendError: Status check failed (redis failure?)\n    at /srv/deployment/ocg/ocg/mw-ocg-service/lib/threads/frontend.js:438:9\n    at run (/srv/deployment/ocg/ocg/node_modules/core-js/modules/es6.promise.js:89:39)\n    at /srv/deployment/ocg/ocg/node_modules/core-js/modules/es6.promise.js:100:28\n    at process._tickCallback (node.js:415:13)","code":500},"msg":"Status check failed (redis failure?)","time":"2016-01-26T16:59:31.023Z","v":0}

Event Timeline

cscott created this task.Jan 26 2016, 5:00 PM
cscott raised the priority of this task from to High.
cscott updated the task description. (Show Details)
cscott added subscribers: cscott, thcipriani.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJan 26 2016, 5:00 PM

There's a suspicious "stream not writeable" in there which makes me suspicious that this might be related to T124720.

But on the other hand, ECONNREFUSED makes me think that deployment-redis01.eqiad.wmflabs is just down?

The redis host is set by a puppetized configuration file which lives in /etc/ocg/mw-ocg-service.js on the deployment-pdf0x host.

cscott renamed this task from OCG redis server is readonly in beta cluster to OCG redis server is DOWN in beta cluster.Jan 26 2016, 5:11 PM
cscott set Security to None.

from deployment-pdf01:

$ redis-cli -h deployment-redis01.eqiad.wmflabs
Could not connect to Redis at deployment-redis01.eqiad.wmflabs:6379: Connection refused

That seems to be the root cause. I couldn't get console output from https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=consoleoutput&instanceid=fb5507a9-6488-47b8-9737-ed739f8faff5&project=deployment-prep&region=eqiad either.

cscott renamed this task from OCG redis server is DOWN in beta cluster to Redis server used by OCG (deployment-redis01) is DOWN in beta cluster.Jan 26 2016, 5:15 PM
thcipriani closed this task as Resolved.Jan 26 2016, 6:14 PM
thcipriani claimed this task.

Not sure what changed, but it looks like this is now fixed?

/var/log/ocg/ocg.log output:

Jan 26 18:10:26 deployment-pdf01 mw-ocg-service: {"name":"mw-ocg-service","hostname":"deployment-pdf01","pid":5256,"level":20,"msg":"Redis connection now ready","time":"2016-01-26T18
:10:26.863Z","v":0}
Jan 26 18:10:26 deployment-pdf01 mw-ocg-service: {"name":"mw-ocg-service","hostname":"deployment-pdf01","pid":5252,"level":20,"msg":"Redis connection now ready","time":"2016-01-26T18
:10:26.869Z","v":0}
Jan 26 18:10:26 deployment-pdf01 mw-ocg-service: {"name":"mw-ocg-service","hostname":"deployment-pdf01","pid":5251,"level":20,"msg":"Redis connection now ready","time":"2016-01-26T18
:10:26.869Z","v":0}
Jan 26 18:10:26 deployment-pdf01 mw-ocg-service: {"name":"mw-ocg-service","hostname":"deployment-pdf01","pid":5253,"level":20,"msg":"Redis connection now ready","time":"2016-01-26T18
:10:26.903Z","v":0}
Jan 26 18:10:27 deployment-pdf01 mw-ocg-service: {"name":"mw-ocg-service","hostname":"deployment-pdf01","pid":5249,"level":20,"msg":"Redis connection now ready","time":"2016-01-26T18
:10:27.249Z","v":0}
thcipriani@deployment-pdf01:~$ redis-cli -h deployment-redis01.eqiad.wmflabs
deployment-redis01.eqiad.wmflabs:6379>

Please reopen if that's not the case.

Yeah, seems to work now. Thanks!

hashar added a subscriber: hashar.Jan 26 2016, 8:49 PM

deployment-redis01 is running Trusty which ship a version not supporting latency monitoring. Ends up not starting properly :( See T124677