
cloudservices1003: gateway timeout error
Closed, ResolvedPublic

Description

Something is wrong with cloudservices1003: cloudcontrol1003 intermittently gets a 504 from the Designate API running there.

aborrero@cloudcontrol1003:~$ sudo journalctl -u designate_floating_ip_ptr_records_updater.service
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: 2019-05-27T10:11:03Z root         WARNING : Error response from http://cloudservices1003.wikimedia.org:9001/v2/zones:
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: {"code": 504, "type": "timeout", "request_id": "req-a3b3ad94-443d-43d1-8c9e-fd909dc86e8b"}
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: Traceback (most recent call last):
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: File "/etc/dns-floating-ip-updater.py", line 112, in <module>
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: for zone in dns.zones():
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: File "/usr/lib/python2.7/dist-packages/mwopenstackclients.py", line 246, in zones
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: r = self._get('/v2/zones', params=params)
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: File "/usr/lib/python2.7/dist-packages/mwopenstackclients.py", line 228, in _get
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: return self._req('GET', *args, **kwargs)
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: File "/usr/lib/python2.7/dist-packages/mwopenstackclients.py", line 224, in _req
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: r.raise_for_status()
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: File "/usr/lib/python2.7/dist-packages/requests/models.py", line 893, in raise_for_status
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: raise HTTPError(http_error_msg, response=self)
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: requests.exceptions.HTTPError: 504 Server Error: GATEWAY TIMEOUT for url: http://cloudservices1003.wikimedia.org:9001/v2/zones
May 27 10:11:04 cloudcontrol1003 systemd[1]: designate_floating_ip_ptr_records_updater.service: main process exited, code=exited, status=1/FAILURE
May 27 10:11:04 cloudcontrol1003 systemd[1]: Unit designate_floating_ip_ptr_records_updater.service entered failed state.
May 27 10:20:01 cloudcontrol1003 systemd[1]: Starting Designate Floating IP PTR records updater...
May 27 10:20:01 cloudcontrol1003 systemd[1]: Started Designate Floating IP PTR records updater.
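Note that the updater exits with status 1 on a single 504 and only recovers when systemd restarts it at the next timer tick. A more defensive pattern would be to retry transient gateway timeouts with backoff before giving up. A minimal sketch (hypothetical helper, written in Python 3, not the actual mwopenstackclients code):

```python
import time


def get_with_retries(fetch, retries=3, backoff=2.0):
    """Call fetch(); retry on transient gateway timeouts (HTTP 504).

    fetch is any callable returning a response object with a
    .status_code attribute, e.g. lambda: session.get(url).
    """
    resp = fetch()
    for attempt in range(1, retries):
        if resp.status_code != 504:
            return resp
        time.sleep(backoff * attempt)  # linear backoff between attempts
        resp = fetch()
    return resp  # still failing: let the caller decide what to do
```

In the updater this would wrap the `dns.zones()` call, so a single slow Designate response would not put the whole systemd unit into a failed state.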

Event Timeline

aborrero triaged this task as High priority.May 27 2019, 12:27 PM
aborrero created this task.
aborrero moved this task from Inbox to Important on the cloud-services-team (Kanban) board.

Searching the logs for the concrete REQ, I see this:

root@cloudservices1003:/var/log/designate# grep a3b3ad94-443d-43d1-8c9e-fd909dc86e8b *.log
designate-api.log:2019-05-27 10:11:03.908 9086 ERROR designate.api.middleware [req-a3b3ad94-443d-43d1-8c9e-fd909dc86e8b novaadmin account-creation-assistance - - -] Timed out waiting for a reply to message ID c90ca1f31ac849e292ec1dc4e9e9d5f1
designate-central.log:2019-05-27 10:11:04.125 9241 WARNING oslo_messaging._drivers.amqpdriver [req-a3b3ad94-443d-43d1-8c9e-fd909dc86e8b novaadmin account-creation-assistance - - -] reply_9624fab4757f4617865097c2c6309b17 doesn't exists, drop reply to c90ca1f31ac849e292ec1dc4e9e9d5f1

This seems related to the RabbitMQ queue; either:

  • rabbit is dropping messages
  • something broke with the recent changes to the HA setup
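The "reply_… doesn't exists, drop reply" warning matches the oslo.messaging RPC pattern: the caller creates a per-process reply queue, sends a request, and waits with a timeout; if the reply arrives only after the caller has given up (or the reply queue disappeared after a restart), the server drops it, and the caller surfaces a MessagingTimeout, which the API turns into a 504. A stdlib-only sketch of that race (illustrative only, not oslo code):

```python
import queue
import threading
import time


def rpc_call(request_q, timeout):
    """Send a request carrying a reply queue; give up after `timeout` seconds."""
    reply_q = queue.Queue()  # per-call reply queue (like reply_<uuid> in rabbit)
    request_q.put(reply_q)
    try:
        return reply_q.get(timeout=timeout)
    except queue.Empty:
        # oslo raises MessagingTimeout here; designate-api returns 504
        raise TimeoutError("Timed out waiting for a reply")


def slow_server(request_q, delay):
    """Handle one request, but only after `delay` seconds."""
    reply_q = request_q.get()
    time.sleep(delay)
    # By now the caller may have stopped listening; a real broker would log
    # "reply_... doesn't exist, drop reply" at this point.
    reply_q.put("pong")


if __name__ == "__main__":
    requests = queue.Queue()
    threading.Thread(target=slow_server, args=(requests, 0.2), daemon=True).start()
    try:
        rpc_call(requests, timeout=0.05)  # caller times out before the reply lands
    except TimeoutError as e:
        print(e)
```

This is consistent with the one-second gap in the log above: designate-api times out at 10:11:03, and designate-central drops the late reply at 10:11:04.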

The logs contain plenty of messages like this:

designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher [req-dbe37e3d-de2a-4699-a72d-63cdd9a30ebd - - - - -] Exception during message handling
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher Traceback (most recent call last):
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/notify/dispatcher.py", line 86, in _dispatch_and_handle_error
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     return self._dispatch(incoming)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/notify/dispatcher.py", line 122, in _dispatch
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     ret = self._exec_callback(callback, filtered_messages)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/notify/dispatcher.py", line 169, in _exec_callback
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     messages[0]["metadata"])
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/notify/dispatcher.py", line 129, in _exec_callback
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     ret = callback(*args)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/designate/sink/service.py", line 116, in info
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     handler.process_notification(context, event_type, payload)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/wmf_sink/wmfsink.py", line 83, in process_notification
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     resource_type='instance')
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/wmf_sink/base.py", line 45, in _delete
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     zone = self.get_zone(cfg.CONF[self.name].domain_id)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/designate/notification_handler/base.py", line 65, in get_zone
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     return self.central_api.get_zone(context, zone_id)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/designate/central/rpcapi.py", line 155, in get_zone
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     return self.client.call(context, 'get_zone', zone_id=zone_id)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 413, in call
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     return self.prepare().call(ctxt, method, **kwargs)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     retry=self.retry)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     timeout=timeout, retry=retry)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 470, in send
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     retry=retry)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 459, in _send
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     result = self._waiter.wait(msg_id, timeout)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 342, in wait
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     message = self.waiters.get(msg_id, timeout=timeout)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 244, in get
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     'to message ID %s' % msg_id)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher MessagingTimeout: Timed out waiting for a reply to message ID ec31e080ee304c039351de622e241e1e
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher

I think this is a result of cloudservices1004 being mostly broken right now, due to T224354.

aborrero closed this task as Resolved.Jun 4 2019, 11:28 AM
aborrero claimed this task.

I just checked, and the errors are gone. Closing the task now.