
labcontrol1002 Error: unable to connect to node rabbit@labcontrol1002: nodedown
Closed, Resolved (Public)

Description

I believe this alert is spurious. I suspect this cron job is running on the inactive node, where the rabbit node is down, and triggering the alarm:

# Puppet Name: drain and log rabbit notifications.error queue
35 * * * * /usr/local/sbin/drain_queue notifications.error >> /var/log/rabbitmq/notifications_error.log 2>&1
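One way to keep such a job quiet on a node where RabbitMQ is not running is to guard the command with a health check. The following is a minimal sketch, not the change merged for this task (which restricts the cron to the active control node via Puppet). The function name `run_if_node_up` and the skip message are hypothetical, and it assumes `rabbitmqctl status` exits non-zero when the local node is down.

```shell
#!/bin/sh
# Hypothetical guard: run a command only when a health check succeeds.
# Usage: run_if_node_up "CHECK_CMD" CMD [ARGS...]
run_if_node_up() {
    check="$1"
    shift
    # $check is intentionally unquoted so "rabbitmqctl status" splits into words.
    if $check >/dev/null 2>&1; then
        "$@"
    else
        # Exit quietly with success so the job does not raise an alarm.
        echo "skipped: local rabbit node is not running"
        return 0
    fi
}

# Example invocation (assumed, not executed here):
# run_if_node_up "rabbitmqctl status" /usr/local/sbin/drain_queue notifications.error
```

The drawback of a guard like this is that it also hides a genuinely broken active node, which is one reason to prefer installing the cron only where it is expected to run.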

Event Timeline

chasemp triaged this task as Medium priority. Dec 18 2017, 2:25 PM
chasemp created this task.
chasemp updated the task description.

Change 398900 had a related patch set uploaded (by Rush; owner: cpettet):
[operations/puppet@production] openstack: only run rabbitmq cleanup on active control node

https://gerrit.wikimedia.org/r/398900

Change 398900 merged by Rush:
[operations/puppet@production] openstack: only run rabbitmq cleanup on active control node

https://gerrit.wikimedia.org/r/398900

chasemp closed this task as Resolved. Jan 2 2018, 2:39 PM
Notice: /Stage[main]/Rabbitmq::Cleanup/Cron[drain and log rabbit notifications.error queue]/ensure: removed
Notice: Finished catalog run in 11.22 seconds
root@labcontrol1002