
cloudservices1003: gateway timeout error
Closed, Resolved · Public

Description

There is something weird with cloudservices1003: cloudcontrol1003 gets a 504 from its Designate API (port 9001) from time to time.

aborrero@cloudcontrol1003:~$ sudo journalctl -u designate_floating_ip_ptr_records_updater.service
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: 2019-05-27T10:11:03Z root         WARNING : Error response from http://cloudservices1003.wikimedia.org:9001/v2/zones:
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: {"code": 504, "type": "timeout", "request_id": "req-a3b3ad94-443d-43d1-8c9e-fd909dc86e8b"}
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: Traceback (most recent call last):
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: File "/etc/dns-floating-ip-updater.py", line 112, in <module>
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: for zone in dns.zones():
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: File "/usr/lib/python2.7/dist-packages/mwopenstackclients.py", line 246, in zones
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: r = self._get('/v2/zones', params=params)
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: File "/usr/lib/python2.7/dist-packages/mwopenstackclients.py", line 228, in _get
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: return self._req('GET', *args, **kwargs)
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: File "/usr/lib/python2.7/dist-packages/mwopenstackclients.py", line 224, in _req
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: r.raise_for_status()
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: File "/usr/lib/python2.7/dist-packages/requests/models.py", line 893, in raise_for_status
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: raise HTTPError(http_error_msg, response=self)
May 27 10:11:03 cloudcontrol1003 dns-floating-ip-updater.py[24891]: requests.exceptions.HTTPError: 504 Server Error: GATEWAY TIMEOUT for url: http://cloudservices1003.wikimedia.org:9001/v2/zones
May 27 10:11:04 cloudcontrol1003 systemd[1]: designate_floating_ip_ptr_records_updater.service: main process exited, code=exited, status=1/FAILURE
May 27 10:11:04 cloudcontrol1003 systemd[1]: Unit designate_floating_ip_ptr_records_updater.service entered failed state.
May 27 10:20:01 cloudcontrol1003 systemd[1]: Starting Designate Floating IP PTR records updater...
May 27 10:20:01 cloudcontrol1003 systemd[1]: Started Designate Floating IP PTR records updater.
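
For context, the failing call is just an authenticated GET against the Designate v2 zones endpoint on cloudservices1003:9001. A small probe loop like this (only a sketch; it assumes a valid Keystone token exported as OS_TOKEN) can catch the intermittent 504 independently of the systemd timer:

while true; do
  # Same endpoint the updater hits; log anything that is not a 200.
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "X-Auth-Token: $OS_TOKEN" \
    http://cloudservices1003.wikimedia.org:9001/v2/zones)
  [ "$code" = "200" ] || echo "$(date -u +%FT%TZ) got HTTP $code"
  sleep 30
done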

Event Timeline

aborrero created this task.
aborrero moved this task from Inbox to Soon! on the cloud-services-team (Kanban) board.

Searching the logs for that specific request ID, I see this:

root@cloudservices1003:/var/log/designate# grep a3b3ad94-443d-43d1-8c9e-fd909dc86e8b *.log
designate-api.log:2019-05-27 10:11:03.908 9086 ERROR designate.api.middleware [req-a3b3ad94-443d-43d1-8c9e-fd909dc86e8b novaadmin account-creation-assistance - - -] Timed out waiting for a reply to message ID c90ca1f31ac849e292ec1dc4e9e9d5f1
designate-central.log:2019-05-27 10:11:04.125 9241 WARNING oslo_messaging._drivers.amqpdriver [req-a3b3ad94-443d-43d1-8c9e-fd909dc86e8b novaadmin account-creation-assistance - - -] reply_9624fab4757f4617865097c2c6309b17 doesn't exists, drop reply to c90ca1f31ac849e292ec1dc4e9e9d5f1

It seems this is related to the rabbit queue; either:

  • rabbit is dropping messages, or
  • something is off with the recent changes to the HA setup (see the checks sketched below).
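
The designate-central warning above ("reply_… doesn't exists, drop reply to …") suggests looking at rabbit first: if the oslo.messaging reply queues are missing or have no consumers, or the cluster itself is unhappy after the HA changes, that would explain the dropped replies. A couple of checks (a sketch, assuming rabbitmqctl access on the host running RabbitMQ):

# Overall cluster/HA state
sudo rabbitmqctl cluster_status

# Are the reply queues there, and do they have consumers?
sudo rabbitmqctl list_queues name messages consumers | grep reply_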

The logs contain plenty of messages like this one:

designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher [req-dbe37e3d-de2a-4699-a72d-63cdd9a30ebd - - - - -] Exception during message handling
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher Traceback (most recent call last):
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/notify/dispatcher.py", line 86, in _dispatch_and_handle_error
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     return self._dispatch(incoming)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/notify/dispatcher.py", line 122, in _dispatch
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     ret = self._exec_callback(callback, filtered_messages)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/notify/dispatcher.py", line 169, in _exec_callback
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     messages[0]["metadata"])
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/notify/dispatcher.py", line 129, in _exec_callback
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     ret = callback(*args)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/designate/sink/service.py", line 116, in info
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     handler.process_notification(context, event_type, payload)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/wmf_sink/wmfsink.py", line 83, in process_notification
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     resource_type='instance')
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/wmf_sink/base.py", line 45, in _delete
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     zone = self.get_zone(cfg.CONF[self.name].domain_id)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/designate/notification_handler/base.py", line 65, in get_zone
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     return self.central_api.get_zone(context, zone_id)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/designate/central/rpcapi.py", line 155, in get_zone
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     return self.client.call(context, 'get_zone', zone_id=zone_id)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 413, in call
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     return self.prepare().call(ctxt, method, **kwargs)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     retry=self.retry)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     timeout=timeout, retry=retry)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 470, in send
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     retry=retry)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 459, in _send
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     result = self._waiter.wait(msg_id, timeout)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 342, in wait
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     message = self.waiters.get(msg_id, timeout=timeout)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher   File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 244, in get
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher     'to message ID %s' % msg_id)
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher MessagingTimeout: Timed out waiting for a reply to message ID ec31e080ee304c039351de622e241e1e
designate-sink.log:2019-05-27 12:30:59.768 9218 ERROR oslo_messaging.notify.dispatcher
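
A quick way to quantify how many of these timeouts we are seeing (a sketch; adjust the log path and pattern as needed):

grep -c 'Timed out waiting for a reply' /var/log/designate/designate-sink.log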

I think this is a result of cloudservices1004 being mostly broken right now, due to T224354.

aborrero claimed this task.

I just checked, and the errors are gone. Closing task now.