I just spotted this in cloudservices1004 logs:
2019-06-04 12:05:31.315 117528 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on cloudcontrol1004.wikimedia.org:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds. 2019-06-04 12:05:31.316 117528 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on cloudcontrol1003.wikimedia.org:5672 is unreachable: Too many heartbeats missed. Trying again in 1 seconds. 2019-06-04 12:09:53.474 117528 ERROR oslo_messaging.notify.dispatcher [req-6b8b50b2-2a33-4577-9897-e65d5bfd5174 - - - - -] Exception during message handling 2019-06-04 12:09:53.474 117528 ERROR oslo_messaging.notify.dispatcher Traceback (most recent call last): 2019-06-04 12:09:53.474 117528 ERROR oslo_messaging.notify.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_messaging/notify/dispatcher.py", line 86, in _dispatch_and_handle_error 2019-06-04 12:09:53.474 117528 ERROR oslo_messaging.notify.dispatcher return self._dispatch(incoming) 2019-06-04 12:09:53.474 117528 ERROR oslo_messaging.notify.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_messaging/notify/dispatcher.py", line 122, in _dispatch 2019-06-04 12:09:53.474 117528 ERROR oslo_messaging.notify.dispatcher ret = self._exec_callback(callback, filtered_messages) 2019-06-04 12:09:53.474 117528 ERROR oslo_messaging.notify.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_messaging/notify/dispatcher.py", line 169, in _exec_callback 2019-06-04 12:09:53.474 117528 ERROR oslo_messaging.notify.dispatcher messages[0]["metadata"]) 2019-06-04 12:09:53.474 117528 ERROR oslo_messaging.notify.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_messaging/notify/dispatcher.py", line 129, in _exec_callback 2019-06-04 12:09:53.474 117528 ERROR oslo_messaging.notify.dispatcher ret = callback(*args) 2019-06-04 12:09:53.474 117528 ERROR oslo_messaging.notify.dispatcher File "/usr/lib/python2.7/dist-packages/designate/sink/service.py", line 116, in info 2019-06-04 12:09:53.474 117528 ERROR oslo_messaging.notify.dispatcher handler.process_notification(context, event_type, payload) 2019-06-04 12:09:53.474 117528 ERROR oslo_messaging.notify.dispatcher File "/usr/lib/python2.7/dist-packages/wmf_sink/wmfsink.py", line 83, in process_notification 2019-06-04 12:09:53.474 117528 ERROR oslo_messaging.notify.dispatcher resource_type='instance') 2019-06-04 12:09:53.474 117528 ERROR oslo_messaging.notify.dispatcher File "/usr/lib/python2.7/dist-packages/wmf_sink/base.py", line 110, in _delete 2019-06-04 12:09:53.474 117528 ERROR oslo_messaging.notify.dispatcher ip = records[0].data 2019-06-04 12:09:53.474 117528 ERROR oslo_messaging.notify.dispatcher File "/usr/lib/python2.7/dist-packages/designate/objects/base.py", line 570, in __getitem__ 2019-06-04 12:09:53.474 117528 ERROR oslo_messaging.notify.dispatcher return self.objects[index] 2019-06-04 12:09:53.474 117528 ERROR oslo_messaging.notify.dispatcher IndexError: list index out of range 2019-06-04 12:09:53.474 117528 ERROR oslo_messaging.notify.dispatcher 2019-06-04 12:09:53.477 117528 ERROR oslo_messaging.notify.dispatcher [req-6b8b50b2-2a33-4577-9897-e65d5bfd5174 - - - - -] Fail to ack/requeue message 2019-06-04 12:09:53.477 117528 ERROR oslo_messaging.notify.dispatcher Traceback (most recent call last): 2019-06-04 12:09:53.477 117528 ERROR oslo_messaging.notify.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_messaging/notify/dispatcher.py", line 75, in _post_dispatch 2019-06-04 12:09:53.477 117528 ERROR oslo_messaging.notify.dispatcher m.acknowledge() 2019-06-04 12:09:53.477 117528 ERROR oslo_messaging.notify.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 121, in acknowledge 2019-06-04 12:09:53.477 117528 ERROR oslo_messaging.notify.dispatcher self.message.acknowledge() 2019-06-04 12:09:53.477 117528 ERROR oslo_messaging.notify.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py", line 239, in acknowledge 2019-06-04 12:09:53.477 117528 ERROR oslo_messaging.notify.dispatcher self._raw_message.ack() 2019-06-04 12:09:53.477 117528 ERROR oslo_messaging.notify.dispatcher File "/usr/lib/python2.7/dist-packages/kombu/message.py", line 88, in ack 2019-06-04 12:09:53.477 117528 ERROR oslo_messaging.notify.dispatcher self.channel.basic_ack(self.delivery_tag) 2019-06-04 12:09:53.477 117528 ERROR oslo_messaging.notify.dispatcher File "/usr/lib/python2.7/dist-packages/amqp/channel.py", line 1581, in basic_ack 2019-06-04 12:09:53.477 117528 ERROR oslo_messaging.notify.dispatcher self._send_method((60, 80), args) 2019-06-04 12:09:53.477 117528 ERROR oslo_messaging.notify.dispatcher File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 50, in _send_method 2019-06-04 12:09:53.477 117528 ERROR oslo_messaging.notify.dispatcher raise RecoverableConnectionError('connection already closed') 2019-06-04 12:09:53.477 117528 ERROR oslo_messaging.notify.dispatcher RecoverableConnectionError: connection already closed 2019-06-04 12:09:53.477 117528 ERROR oslo_messaging.notify.dispatcher
The error is repeated many times. We apparently have the correct firewall rules in cloudcontrol1003/1004:
aborrero@cloudcontrol1003:~ $ sudo ip6tables-save -c | grep 2620:0:861:1:208:80:154:11 [0:0] -A INPUT -s 2620:0:861:1:208:80:154:11/128 -p tcp -m tcp --dport 35357 -j ACCEPT [0:0] -A INPUT -s 2620:0:861:1:208:80:154:11/128 -p tcp -m tcp --dport 35357 -j ACCEPT [3:240] -A INPUT -s 2620:0:861:1:208:80:154:11/128 -p tcp -m tcp --dport 5672 -j ACCEPT aborrero@cloudcontrol1003:~ $ sudo iptables-save -c | grep 208.80.154.11 [0:0] -A INPUT -s 208.80.154.11/32 -p tcp -m tcp --dport 35357 -j ACCEPT [0:0] -A INPUT -s 208.80.154.11/32 -p tcp -m tcp --dport 35357 -j ACCEPT [0:0] -A INPUT -s 208.80.154.11/32 -p tcp -m tcp --dport 5672 -j ACCEPT aborrero@cloudcontrol1004:~ $ sudo iptables-save -c | grep 208.80.154.11 [0:0] -A INPUT -s 208.80.154.11/32 -p tcp -m tcp --dport 35357 -j ACCEPT [0:0] -A INPUT -s 208.80.154.11/32 -p tcp -m tcp --dport 35357 -j ACCEPT [0:0] -A INPUT -s 208.80.154.11/32 -p tcp -m tcp --dport 5672 -j ACCEPT aborrero@cloudcontrol1004:~ $ sudo ip6tables-save -c | grep 2620:0:861:1:208:80:154:11 [0:0] -A INPUT -s 2620:0:861:1:208:80:154:11/128 -p tcp -m tcp --dport 35357 -j ACCEPT [0:0] -A INPUT -s 2620:0:861:1:208:80:154:11/128 -p tcp -m tcp --dport 35357 -j ACCEPT [2:160] -A INPUT -s 2620:0:861:1:208:80:154:11/128 -p tcp -m tcp --dport 5672 -j ACCEPT
So I wonder if we have other filters anywhere in the server.