Page MenuHomePhabricator

novafullstack errors: Unable to SSH
Closed, ResolvedPublic

Description

This is one of some of the current issues with with novafullsatck (see T309929 also)

ERROR

Instances are unable to get accesses through ssh, timing out.

Workaround

Not known

SOME NOTES

The VMs time out when trying to get the metadata for cloud-init:

2022-06-03 23:46:47,649 - util.py[DEBUG]: Resolving URL: http://169.254.169.254 took 0.007 seconds
2022-06-03 23:46:47,650 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/openstack' with {'url': 'http://169.254.169.254/openstack', 'allow_redirects': True, 'method': 'GET', 'timeout': 10.0, 'headers': {'User-Agent': 'Cloud-Init/20.4.1'}} configuration
2022-06-03 23:46:57,663 - url_helper.py[DEBUG]: Calling 'http://169.254.169.254/openstack' failed [10/-1s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Read timed out. (read timeout=10.0)]
2022-06-03 23:46:57,664 - DataSourceOpenStack.py[DEBUG]: Giving up on OpenStack md from ['http://169.254.169.254/openstack'] after 10 seconds
2022-06-03 23:46:57,664 - util.py[WARNING]: No active metadata service found
2022-06-03 23:46:57,665 - util.py[DEBUG]: No active metadata service found
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOpenStack.py", line 145, in _get_data
    results = self._crawl_metadata()
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceOpenStack.py", line 181, in _crawl_metadata
    raise sources.InvalidMetaDataException(
cloudinit.sources.InvalidMetaDataException: No active metadata service found

That prevents the VM from getting a hostname, and in turn prevents puppet from running, failing to setup anything else.

Event Timeline

dcaro added a subscriber: Andrew.
dcaro added a subscriber: rook.

Change 802914 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Openstack nova vendordata: increase metadata timeouts

https://gerrit.wikimedia.org/r/802914

Change 802914 merged by Andrew Bogott:

[operations/puppet@production] Openstack nova vendordata: increase metadata timeouts

https://gerrit.wikimedia.org/r/802914

Change 803269 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Openstack nova vendordata: more fixes to metadata timeouts

https://gerrit.wikimedia.org/r/803269

Change 803269 merged by Andrew Bogott:

[operations/puppet@production] Openstack nova vendordata: more fixes to metadata timeouts

https://gerrit.wikimedia.org/r/803269

When I watch the haproxy logs on e.g. cloudcontrol1003 I see various mysql servers going down and coming back pretty often. That could easily cause pretty much all of the openstack instability we've seen.

At David's suggestion, I've reverted a monitoring patch that might be taxing openstack and the DB:

https://gerrit.wikimedia.org/r/c/operations/puppet/+/802956

Now watching to see if the DB flaps less, or not at all, after that change.

Seems to be flapping less, but still some. Some sample timestamps:

Jun  6 14:16:09
Jun  6 14:21:22
Jun  6 14:24:50
Jun  6 15:15:59
Jun  6 15:25:24
Jun  6 16:24:59
Jun  6 17:24:16
Jun  6 18:26:01
Jun  6 19:25:41
Jun  6 19:57:11
Jun  6 20:09:25
Jun  6 20:16:14
Jun  6 20:20:19
Jun  6 20:24:37

So... hourly, about?

Change 805166 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/puppet@production] openstack,nova-api-metadata: add harakiri timeout

https://gerrit.wikimedia.org/r/805166

Change 805166 merged by David Caro:

[operations/puppet@production] openstack,nova-api-metadata: add harakiri timeout

https://gerrit.wikimedia.org/r/805166

Change 805316 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/cookbooks@wmcs] wmcs: added vm_console runbook

https://gerrit.wikimedia.org/r/805316

dcaro changed the task status from Open to In Progress.Jun 14 2022, 7:53 AM
dcaro claimed this task.
dcaro moved this task from To refine to Doing on the User-dcaro board.

I've been diving in the cloudinit code a bit to see why it can't handle a failed metadata agent. It turns out that the first thing it does is check to see if a given metadata URL is valid (which it does with a short timeout and no retries) -- it uses this to determine if a given metadata service is available.

THEN it tries to actually get the metadata, and this second time it actually uses the retries. But of course in most cases we never get this far because cloud-init has already determined that there is no openstack metadata service available.

So, without rewriting cloud-init there isn't a good way to work around this.

Another angle of pursuit is to improve HAProxy's ability to detect a broken metadata api. I'm toying with the idea of using the /healthcheck endpoint instead of / (https://gerrit.wikimedia.org/r/c/operations/puppet/+/806350) but the /helathcheck endpoint is actually implemented by middleware so it quite possibly knows /less/ about the state of the server than / would know.

We could ask for actual metadata for an actual, specific VM and then check it for validity. To do that we'd need to keep some kind of standard-model VM around at all times so the metadata service knows what we're asking about. I'm not 100% clear on how to tell haproxy to check for specific content in a healthcheck but I expect it's possible somehow.

At this point I think we should wait for one of the api servers to fail and then learn more about how the failed service actually behaves before moving forward on this side.

The neutron metadata agent (which proxies metadata requests from the VMs) has very messy logs. Here's a snippet that corresponds to a failed VM (which was launched at 2:22AM):

2022-06-17 02:23:27.230 2799 ERROR neutron_lib.rpc [-] Timeout in RPC method get_ports. Waiting for 9 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID cd8c6832eed2486e93d393bc041abae2
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent [-] Unexpected error.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID cd8c6832eed2486e93d393bc041abae2
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 433, in get
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent     return self._queues[msg_id].get(block=True, timeout=timeout)
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 322, in get
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent     return waiter.wait()
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 141, in wait
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent     return get_hub().switch()
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 313, in switch
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent     return self.greenlet.switch()
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent _queue.Empty
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent 
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent During handling of the above exception, another exception occurred:
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent 
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 91, in __call__
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent     instance_id, tenant_id = self._get_instance_and_tenant_id(req)
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 211, in _get_instance_and_tenant_id
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent     ports = self._get_ports(
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 175, in _get_ports
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent     networks = self._get_router_networks(router_id,
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/common/cache_utils.py", line 121, in __call__
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent     return self.func(target_self, *args, **kwargs)
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 142, in _get_router_networks
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent     internal_ports = self._get_ports_from_server(router_id=router_id)
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 117, in _get_ports_from_server
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent     return self.plugin_rpc.get_ports(self.context, filters)
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 74, in get_ports
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent     return cctxt.call(context, 'get_ports', filters=filters)
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron_lib/rpc.py", line 180, in call
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent     time.sleep(wait)
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent     self.force_reraise()
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent     raise self.value
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron_lib/rpc.py", line 157, in call
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent     return self._original_context.call(ctxt, method, **kwargs)
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 175, in call
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent     self.transport._send(self.target, msg_ctxt, msg,
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 123, in _send
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent     return self._driver.send(target, ctxt, message,
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 680, in send
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent     return self._send(target, ctxt, message, wait_for_reply, timeout,
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 669, in _send
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent     result = self._waiter.wait(msg_id, timeout,
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in wait
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent     message = self.waiters.get(msg_id, timeout=timeout)
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 435, in get
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent     raise oslo_messaging.MessagingTimeout(
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID cd8c6832eed2486e93d393bc041abae2
2022-06-17 02:23:36.382 2799 ERROR neutron.agent.metadata.agent 
2022-06-17 02:23:36.386 2799 INFO eventlet.wsgi.server [-] Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/eventlet/wsgi.py", line 604, in handle_one_response
    write(b''.join(towrite))
  File "/usr/lib/python3/dist-packages/eventlet/wsgi.py", line 538, in write
    wfile.flush()
  File "/usr/lib/python3.9/socket.py", line 722, in write
    return self._sock.send(b)
  File "/usr/lib/python3/dist-packages/eventlet/greenio/base.py", line 399, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/usr/lib/python3/dist-packages/eventlet/greenio/base.py", line 386, in _send_loop
    return send_method(data, *args)
BrokenPipeError: [Errno 32] Broken pipe

2022-06-17 02:23:36.387 2799 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.42,<local> "GET /latest/meta-data/ HTTP/1.1" status: 500  len: 0 time: 609.1890764
2022-06-17 02:23:43.068 2799 ERROR neutron_lib.rpc [-] Timeout in RPC method get_ports. Waiting for 15 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 2d849e04cc60494eb13411baa40f809c
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent [-] Unexpected error.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 2d849e04cc60494eb13411baa40f809c
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 433, in get
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent     return self._queues[msg_id].get(block=True, timeout=timeout)
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 322, in get
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent     return waiter.wait()
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 141, in wait
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent     return get_hub().switch()
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 313, in switch
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent     return self.greenlet.switch()
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent _queue.Empty
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent 
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent During handling of the above exception, another exception occurred:
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent 
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 91, in __call__
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent     instance_id, tenant_id = self._get_instance_and_tenant_id(req)
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 211, in _get_instance_and_tenant_id
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent     ports = self._get_ports(
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 175, in _get_ports
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent     networks = self._get_router_networks(router_id,
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/common/cache_utils.py", line 121, in __call__
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent     return self.func(target_self, *args, **kwargs)
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 142, in _get_router_networks
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent     internal_ports = self._get_ports_from_server(router_id=router_id)
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 117, in _get_ports_from_server
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent     return self.plugin_rpc.get_ports(self.context, filters)
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 74, in get_ports
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent     return cctxt.call(context, 'get_ports', filters=filters)
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron_lib/rpc.py", line 180, in call
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent     time.sleep(wait)
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent     self.force_reraise()
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent     raise self.value
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron_lib/rpc.py", line 157, in call
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent     return self._original_context.call(ctxt, method, **kwargs)
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 175, in call
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent     self.transport._send(self.target, msg_ctxt, msg,
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 123, in _send
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent     return self._driver.send(target, ctxt, message,
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 680, in send
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent     return self._send(target, ctxt, message, wait_for_reply, timeout,
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 669, in _send
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent     result = self._waiter.wait(msg_id, timeout,
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in wait
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent     message = self.waiters.get(msg_id, timeout=timeout)
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 435, in get
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent     raise oslo_messaging.MessagingTimeout(
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 2d849e04cc60494eb13411baa40f809c
2022-06-17 02:23:58.102 2799 ERROR neutron.agent.metadata.agent 
2022-06-17 02:23:58.106 2799 INFO eventlet.wsgi.server [-] Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/eventlet/wsgi.py", line 604, in handle_one_response
    write(b''.join(towrite))
  File "/usr/lib/python3/dist-packages/eventlet/wsgi.py", line 538, in write
    wfile.flush()
  File "/usr/lib/python3.9/socket.py", line 722, in write
    return self._sock.send(b)
  File "/usr/lib/python3/dist-packages/eventlet/greenio/base.py", line 399, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/usr/lib/python3/dist-packages/eventlet/greenio/base.py", line 386, in _send_loop
    return send_method(data, *args)
BrokenPipeError: [Errno 32] Broken pipe

2022-06-17 02:23:58.107 2799 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.42,<local> "GET /latest/meta-data/local-ipv4 HTTP/1.1" status: 500  len: 0 time: 615.0512755
2022-06-17 02:24:44.770 2799 ERROR neutron_lib.rpc [-] Timeout in RPC method get_ports. Waiting for 12 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 0bf89a4037f84810a990eb8cc660adb4
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent [-] Unexpected error.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 0bf89a4037f84810a990eb8cc660adb4
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 433, in get
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent     return self._queues[msg_id].get(block=True, timeout=timeout)
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 322, in get
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent     return waiter.wait()
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 141, in wait
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent     return get_hub().switch()
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 313, in switch
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent     return self.greenlet.switch()
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent _queue.Empty
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent 
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent During handling of the above exception, another exception occurred:
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent 
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 91, in __call__
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent     instance_id, tenant_id = self._get_instance_and_tenant_id(req)
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 211, in _get_instance_and_tenant_id
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent     ports = self._get_ports(
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 175, in _get_ports
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent     networks = self._get_router_networks(router_id,
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/common/cache_utils.py", line 121, in __call__
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent     return self.func(target_self, *args, **kwargs)
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 142, in _get_router_networks
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent     internal_ports = self._get_ports_from_server(router_id=router_id)
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 117, in _get_ports_from_server
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent     return self.plugin_rpc.get_ports(self.context, filters)
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 74, in get_ports
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent     return cctxt.call(context, 'get_ports', filters=filters)
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron_lib/rpc.py", line 180, in call
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent     time.sleep(wait)
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent     self.force_reraise()
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent     raise self.value
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron_lib/rpc.py", line 157, in call
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent     return self._original_context.call(ctxt, method, **kwargs)
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 175, in call
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent     self.transport._send(self.target, msg_ctxt, msg,
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 123, in _send
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent     return self._driver.send(target, ctxt, message,
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 680, in send
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent     return self._send(target, ctxt, message, wait_for_reply, timeout,
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 669, in _send
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent     result = self._waiter.wait(msg_id, timeout,
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in wait
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent     message = self.waiters.get(msg_id, timeout=timeout)
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 435, in get
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent     raise oslo_messaging.MessagingTimeout(
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 0bf89a4037f84810a990eb8cc660adb4
2022-06-17 02:24:56.942 2799 ERROR neutron.agent.metadata.agent 
2022-06-17 02:24:56.948 2799 INFO eventlet.wsgi.server [-] Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/eventlet/wsgi.py", line 604, in handle_one_response
    write(b''.join(towrite))
  File "/usr/lib/python3/dist-packages/eventlet/wsgi.py", line 538, in write
    wfile.flush()
  File "/usr/lib/python3.9/socket.py", line 722, in write
    return self._sock.send(b)
  File "/usr/lib/python3/dist-packages/eventlet/greenio/base.py", line 399, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/usr/lib/python3/dist-packages/eventlet/greenio/base.py", line 386, in _send_loop
    return send_method(data, *args)
BrokenPipeError: [Errno 32] Broken pipe

2022-06-17 02:24:56.949 2799 INFO eventlet.wsgi.server [-] ::ffff:172.16.2.18,<local> "PUT /latest/api/token HTTP/1.1" status: 500  len: 0 time: 612.2294033

Here's what it looks like when a VM start up goes better (it's still really bad):

2022-06-17 03:50:07.970 2804 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /openstack HTTP/1.1" status: 200  len: 242 time: 0.5727077
2022-06-17 03:50:08.523 2808 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /openstack HTTP/1.1" status: 200  len: 242 time: 0.5450575
2022-06-17 03:50:09.069 2808 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /openstack/2018-08-27/meta_data.json HTTP/1.1" status: 200  len: 1064 time: 0.5394874
2022-06-17 03:50:09.307 2808 INFO neutron.agent.metadata.agent [-] The instance: 936336a5-3deb-40bb-9b47-4b84b03fdf24 is not present anymore, skipping cache...
2022-06-17 03:50:09.843 2808 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /openstack/2018-08-27/user_data HTTP/1.1" status: 404  len: 297 time: 0.7662125
2022-06-17 03:50:10.425 2808 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /openstack/2018-08-27/vendor_data.json HTTP/1.1" status: 200  len: 8135 time: 0.5751894
2022-06-17 03:50:10.610 2808 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /openstack/2018-08-27/network_data.json HTTP/1.1" status: 200  len: 540 time: 0.1783772
2022-06-17 03:50:10.799 2808 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /latest/meta-data/ HTTP/1.1" status: 200  len: 347 time: 0.1823065
2022-06-17 03:50:10.996 2792 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /latest/meta-data/block-device-mapping/ HTTP/1.1" status: 200  len: 143 time: 0.1902785
2022-06-17 03:50:11.195 2792 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /latest/meta-data/block-device-mapping/ami HTTP/1.1" status: 200  len: 138 time: 0.1927006
2022-06-17 03:50:14.816 2808 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /openstack HTTP/1.1" status: 200  len: 242 time: 0.4926336
2022-06-17 03:50:15.406 2808 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /openstack HTTP/1.1" status: 200  len: 242 time: 0.5828173
2022-06-17 03:50:15.982 2808 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /openstack/2018-08-27/meta_data.json HTTP/1.1" status: 200  len: 1064 time: 0.5691247
2022-06-17 03:50:16.193 2808 INFO neutron.agent.metadata.agent [-] The instance: 4fbfeb6a-80d0-4925-987e-f8a3d6094f9c is not present anymore, skipping cache...
2022-06-17 03:50:16.383 2808 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /openstack/2018-08-27/user_data HTTP/1.1" status: 404  len: 297 time: 0.3944149
2022-06-17 03:50:16.457 2810 ERROR neutron_lib.rpc [-] Timeout in RPC method get_ports. Waiting for 17 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 0a9fa95936ab42f695b8187074be1b04
2022-06-17 03:50:16.842 2792 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /openstack/2018-08-27/vendor_data.json HTTP/1.1" status: 200  len: 8135 time: 0.4518802
2022-06-17 03:50:17.332 2792 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /openstack/2018-08-27/network_data.json HTTP/1.1" status: 200  len: 540 time: 0.4824119
2022-06-17 03:50:17.600 2804 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /latest/meta-data/ HTTP/1.1" status: 200  len: 347 time: 0.2513909
2022-06-17 03:50:17.773 2807 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /latest/meta-data/block-device-mapping/ HTTP/1.1" status: 200  len: 143 time: 0.1658044
2022-06-17 03:50:17.979 2807 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /latest/meta-data/block-device-mapping/ami HTTP/1.1" status: 200  len: 138 time: 0.1996436
2022-06-17 03:50:18.188 2807 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /latest/meta-data/block-device-mapping/root HTTP/1.1" status: 200  len: 143 time: 0.2022269
2022-06-17 03:50:18.369 2807 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /latest/meta-data/placement/ HTTP/1.1" status: 200  len: 153 time: 0.1746826
2022-06-17 03:50:18.888 2807 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /latest/meta-data/placement/availability-zone HTTP/1.1" status: 200  len: 139 time: 0.5119731
2022-06-17 03:50:19.115 2807 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /latest/meta-data/ami-id HTTP/1.1" status: 200  len: 148 time: 0.2205865
2022-06-17 03:50:19.353 2801 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /latest/meta-data/ami-launch-index HTTP/1.1" status: 200  len: 136 time: 0.2311926
2022-06-17 03:50:19.832 2801 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /latest/meta-data/ami-manifest-path HTTP/1.1" status: 200  len: 140 time: 0.4693222
2022-06-17 03:50:20.059 2801 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /latest/meta-data/hostname HTTP/1.1" status: 200  len: 171 time: 0.2206593
2022-06-17 03:50:20.281 2801 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /latest/meta-data/instance-action HTTP/1.1" status: 200  len: 139 time: 0.2154434
2022-06-17 03:50:20.476 2801 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /latest/meta-data/instance-id HTTP/1.1" status: 200  len: 146 time: 0.1878033
2022-06-17 03:50:20.707 2792 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /latest/meta-data/instance-type HTTP/1.1" status: 200  len: 157 time: 0.2243478
2022-06-17 03:50:20.919 2792 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /latest/meta-data/local-hostname HTTP/1.1" status: 200  len: 171 time: 0.2051253
2022-06-17 03:50:21.169 2792 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /latest/meta-data/local-ipv4 HTTP/1.1" status: 200  len: 148 time: 0.2438190
2022-06-17 03:50:21.360 2801 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /latest/meta-data/public-hostname HTTP/1.1" status: 200  len: 171 time: 0.1844490
2022-06-17 03:50:21.579 2801 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /latest/meta-data/public-ipv4 HTTP/1.1" status: 200  len: 135 time: 0.2120500
2022-06-17 03:50:21.774 2802 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /latest/meta-data/reservation-id HTTP/1.1" status: 200  len: 146 time: 0.1892288
2022-06-17 03:50:21.949 2802 INFO eventlet.wsgi.server [-] ::ffff:172.16.4.236,<local> "GET /latest/meta-data/security-groups HTTP/1.1" status: 200  len: 142 time: 0.1688485
2022-06-17 03:50:22.638 2802 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /latest/meta-data/block-device-mapping/root HTTP/1.1" status: 200  len: 143 time: 0.4169984
2022-06-17 03:50:22.870 2807 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /latest/meta-data/placement/ HTTP/1.1" status: 200  len: 153 time: 0.2255452
2022-06-17 03:50:23.408 2807 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /latest/meta-data/placement/availability-zone HTTP/1.1" status: 200  len: 139 time: 0.5255063
2022-06-17 03:50:23.880 2807 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /latest/meta-data/ami-id HTTP/1.1" status: 200  len: 148 time: 0.4659812
2022-06-17 03:50:24.319 2807 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /latest/meta-data/ami-launch-index HTTP/1.1" status: 200  len: 136 time: 0.4320285
2022-06-17 03:50:24.514 2807 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /latest/meta-data/ami-manifest-path HTTP/1.1" status: 200  len: 140 time: 0.1883216
2022-06-17 03:50:24.700 2807 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /latest/meta-data/hostname HTTP/1.1" status: 200  len: 171 time: 0.1798787
2022-06-17 03:50:24.878 2808 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /latest/meta-data/instance-action HTTP/1.1" status: 200  len: 139 time: 0.1707218
2022-06-17 03:50:25.068 2808 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /latest/meta-data/instance-id HTTP/1.1" status: 200  len: 146 time: 0.1835394
2022-06-17 03:50:32.996 2799 ERROR neutron_lib.rpc [-] Timeout in RPC method get_ports. Waiting for 27 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 92b8dfa4a9644b72b976ab091341d28e
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent [-] Unexpected error.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 0a9fa95936ab42f695b8187074be1b04
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 433, in get
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent     return self._queues[msg_id].get(block=True, timeout=timeout)
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 322, in get
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent     return waiter.wait()
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 141, in wait
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent     return get_hub().switch()
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 313, in switch
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent     return self.greenlet.switch()
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent _queue.Empty
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent 
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent During handling of the above exception, another exception occurred:
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent 
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 91, in __call__
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent     instance_id, tenant_id = self._get_instance_and_tenant_id(req)
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 211, in _get_instance_and_tenant_id
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent     ports = self._get_ports(
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 175, in _get_ports
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent     networks = self._get_router_networks(router_id,
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/common/cache_utils.py", line 121, in __call__
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent     return self.func(target_self, *args, **kwargs)
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 142, in _get_router_networks
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent     internal_ports = self._get_ports_from_server(router_id=router_id)
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 117, in _get_ports_from_server
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent     return self.plugin_rpc.get_ports(self.context, filters)
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 74, in get_ports
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent     return cctxt.call(context, 'get_ports', filters=filters)
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron_lib/rpc.py", line 180, in call
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent     time.sleep(wait)
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent     self.force_reraise()
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent     raise self.value
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron_lib/rpc.py", line 157, in call
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent     return self._original_context.call(ctxt, method, **kwargs)
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 175, in call
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent     self.transport._send(self.target, msg_ctxt, msg,
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 123, in _send
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent     return self._driver.send(target, ctxt, message,
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 680, in send
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent     return self._send(target, ctxt, message, wait_for_reply, timeout,
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 669, in _send
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent     result = self._waiter.wait(msg_id, timeout,
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in wait
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent     message = self.waiters.get(msg_id, timeout=timeout)
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 435, in get
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent     raise oslo_messaging.MessagingTimeout(
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 0a9fa95936ab42f695b8187074be1b04
2022-06-17 03:50:33.570 2810 ERROR neutron.agent.metadata.agent 
2022-06-17 03:50:33.574 2810 INFO eventlet.wsgi.server [-] Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/eventlet/wsgi.py", line 604, in handle_one_response
    write(b''.join(towrite))
  File "/usr/lib/python3/dist-packages/eventlet/wsgi.py", line 538, in write
    wfile.flush()
  File "/usr/lib/python3.9/socket.py", line 722, in write
    return self._sock.send(b)
  File "/usr/lib/python3/dist-packages/eventlet/greenio/base.py", line 399, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/usr/lib/python3/dist-packages/eventlet/greenio/base.py", line 386, in _send_loop
    return send_method(data, *args)
BrokenPipeError: [Errno 32] Broken pipe

2022-06-17 03:50:33.575 2810 INFO eventlet.wsgi.server [-] ::ffff:172.16.3.230,<local> "GET /openstack HTTP/1.1" status: 500  len: 0 time: 617.1231313
2022-06-17 03:50:36.329 2812 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /latest/meta-data/instance-type HTTP/1.1" status: 200  len: 157 time: 0.2086155
2022-06-17 03:50:36.782 2814 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /latest/meta-data/local-hostname HTTP/1.1" status: 200  len: 171 time: 0.4455528
2022-06-17 03:50:37.324 2801 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /latest/meta-data/local-ipv4 HTTP/1.1" status: 200  len: 147 time: 0.5344925
2022-06-17 03:50:37.508 2811 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /latest/meta-data/public-hostname HTTP/1.1" status: 200  len: 171 time: 0.1780195
2022-06-17 03:50:37.933 2812 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /latest/meta-data/public-ipv4 HTTP/1.1" status: 200  len: 135 time: 0.4171822
2022-06-17 03:50:38.119 2801 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /latest/meta-data/reservation-id HTTP/1.1" status: 200  len: 146 time: 0.1799839
2022-06-17 03:50:38.600 2801 INFO eventlet.wsgi.server [-] ::ffff:172.16.7.40,<local> "GET /latest/meta-data/security-groups HTTP/1.1" status: 200  len: 142 time: 0.4744921
2022-06-17 03:50:44.018 2799 ERROR neutron_lib.rpc [-] Timeout in RPC method get_ports. Waiting for 5 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 83038984fe0b409fbb028d2d3b73730d
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent [-] Unexpected error.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 83038984fe0b409fbb028d2d3b73730d
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 433, in get
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent     return self._queues[msg_id].get(block=True, timeout=timeout)
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 322, in get
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent     return waiter.wait()
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 141, in wait
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent     return get_hub().switch()
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 313, in switch
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent     return self.greenlet.switch()
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent _queue.Empty
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent 
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent During handling of the above exception, another exception occurred:
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent 
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 91, in __call__
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent     instance_id, tenant_id = self._get_instance_and_tenant_id(req)
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 211, in _get_instance_and_tenant_id
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent     ports = self._get_ports(
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 175, in _get_ports
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent     networks = self._get_router_networks(router_id,
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/common/cache_utils.py", line 121, in __call__
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent     return self.func(target_self, *args, **kwargs)
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 142, in _get_router_networks
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent     internal_ports = self._get_ports_from_server(router_id=router_id)
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 117, in _get_ports_from_server
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent     return self.plugin_rpc.get_ports(self.context, filters)
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 74, in get_ports
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent     return cctxt.call(context, 'get_ports', filters=filters)
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron_lib/rpc.py", line 180, in call
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent     time.sleep(wait)
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent     self.force_reraise()
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent     raise self.value
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron_lib/rpc.py", line 157, in call
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent     return self._original_context.call(ctxt, method, **kwargs)
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 175, in call
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent     self.transport._send(self.target, msg_ctxt, msg,
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 123, in _send
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent     return self._driver.send(target, ctxt, message,
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 680, in send
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent     return self._send(target, ctxt, message, wait_for_reply, timeout,
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 669, in _send
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent     result = self._waiter.wait(msg_id, timeout,
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in wait
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent     message = self.waiters.get(msg_id, timeout=timeout)
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 435, in get
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent     raise oslo_messaging.MessagingTimeout(
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 83038984fe0b409fbb028d2d3b73730d
2022-06-17 03:50:49.333 2799 ERROR neutron.agent.metadata.agent 
2022-06-17 03:50:49.337 2799 INFO eventlet.wsgi.server [-] Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/eventlet/wsgi.py", line 604, in handle_one_response
    write(b''.join(towrite))
  File "/usr/lib/python3/dist-packages/eventlet/wsgi.py", line 538, in write
    wfile.flush()
  File "/usr/lib/python3.9/socket.py", line 722, in write
    return self._sock.send(b)
  File "/usr/lib/python3/dist-packages/eventlet/greenio/base.py", line 399, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/usr/lib/python3/dist-packages/eventlet/greenio/base.py", line 386, in _send_loop
    return send_method(data, *args)
BrokenPipeError: [Errno 32] Broken pipe

2022-06-17 03:50:49.337 2799 INFO eventlet.wsgi.server [-] ::ffff:172.16.3.230,<local> "GET /latest/meta-data/hostname HTTP/1.1" status: 500  len: 0 time: 605.3309252
2022-06-17 03:50:55.033 2799 ERROR neutron_lib.rpc [-] Timeout in RPC method get_ports. Waiting for 19 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 75cd37b9486c4fc7b4f6e970b403b4df
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent [-] Unexpected error.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 92b8dfa4a9644b72b976ab091341d28e
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 433, in get
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent     return self._queues[msg_id].get(block=True, timeout=timeout)
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 322, in get
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent     return waiter.wait()
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 141, in wait
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent     return get_hub().switch()
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 313, in switch
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent     return self.greenlet.switch()
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent _queue.Empty
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent 
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent During handling of the above exception, another exception occurred:
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent 
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 91, in __call__
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent     instance_id, tenant_id = self._get_instance_and_tenant_id(req)
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 211, in _get_instance_and_tenant_id
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent     ports = self._get_ports(
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 175, in _get_ports
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent     networks = self._get_router_networks(router_id,
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/common/cache_utils.py", line 121, in __call__
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent     return self.func(target_self, *args, **kwargs)
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 142, in _get_router_networks
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent     internal_ports = self._get_ports_from_server(router_id=router_id)
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 117, in _get_ports_from_server
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent     return self.plugin_rpc.get_ports(self.context, filters)
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 74, in get_ports
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent     return cctxt.call(context, 'get_ports', filters=filters)
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron_lib/rpc.py", line 180, in call
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent     time.sleep(wait)
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent     self.force_reraise()
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent     raise self.value
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron_lib/rpc.py", line 157, in call
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent     return self._original_context.call(ctxt, method, **kwargs)
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 175, in call
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent     self.transport._send(self.target, msg_ctxt, msg,
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 123, in _send
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent     return self._driver.send(target, ctxt, message,
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 680, in send
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent     return self._send(target, ctxt, message, wait_for_reply, timeout,
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 669, in _send
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent     result = self._waiter.wait(msg_id, timeout,
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in wait
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent     message = self.waiters.get(msg_id, timeout=timeout)
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 435, in get
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent     raise oslo_messaging.MessagingTimeout(
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 92b8dfa4a9644b72b976ab091341d28e
2022-06-17 03:50:59.716 2799 ERROR neutron.agent.metadata.agent 
2022-06-17 03:50:59.719 2799 INFO eventlet.wsgi.server [-] Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/eventlet/wsgi.py", line 604, in handle_one_response
    write(b''.join(towrite))
  File "/usr/lib/python3/dist-packages/eventlet/wsgi.py", line 538, in write
    wfile.flush()
  File "/usr/lib/python3.9/socket.py", line 722, in write
    return self._sock.send(b)
  File "/usr/lib/python3/dist-packages/eventlet/greenio/base.py", line 399, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/usr/lib/python3/dist-packages/eventlet/greenio/base.py", line 386, in _send_loop
    return send_method(data, *args)
BrokenPipeError: [Errno 32] Broken pipe

2022-06-17 03:50:59.720 2799 INFO eventlet.wsgi.server [-] ::ffff:172.16.3.230,<local> "GET /latest/meta-data/hostname HTTP/1.1" status: 500  len: 0 time: 626.7302821
2022-06-17 03:51:08.921 2810 ERROR neutron_lib.rpc [-] Timeout in RPC method get_ports. Waiting for 15 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 52bcb87c1dc34c5b91940145dbcc32ec
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent [-] Unexpected error.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 75cd37b9486c4fc7b4f6e970b403b4df
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 433, in get
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent     return self._queues[msg_id].get(block=True, timeout=timeout)
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 322, in get
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent     return waiter.wait()
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 141, in wait
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent     return get_hub().switch()
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 313, in switch
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent     return self.greenlet.switch()
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent _queue.Empty
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent 
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent During handling of the above exception, another exception occurred:
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent 
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 91, in __call__
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent     instance_id, tenant_id = self._get_instance_and_tenant_id(req)
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 211, in _get_instance_and_tenant_id
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent     ports = self._get_ports(
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 175, in _get_ports
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent     networks = self._get_router_networks(router_id,
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/common/cache_utils.py", line 121, in __call__
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent     return self.func(target_self, *args, **kwargs)
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 142, in _get_router_networks
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent     internal_ports = self._get_ports_from_server(router_id=router_id)
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 117, in _get_ports_from_server
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent     return self.plugin_rpc.get_ports(self.context, filters)
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 74, in get_ports
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent     return cctxt.call(context, 'get_ports', filters=filters)
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron_lib/rpc.py", line 180, in call
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent     time.sleep(wait)
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent     self.force_reraise()
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent     raise self.value
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron_lib/rpc.py", line 157, in call
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent     return self._original_context.call(ctxt, method, **kwargs)
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 175, in call
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent     self.transport._send(self.target, msg_ctxt, msg,
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 123, in _send
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent     return self._driver.send(target, ctxt, message,
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 680, in send
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent     return self._send(target, ctxt, message, wait_for_reply, timeout,
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 669, in _send
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent     result = self._waiter.wait(msg_id, timeout,
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in wait
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent     message = self.waiters.get(msg_id, timeout=timeout)
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 435, in get
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent     raise oslo_messaging.MessagingTimeout(
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 75cd37b9486c4fc7b4f6e970b403b4df
2022-06-17 03:51:13.641 2799 ERROR neutron.agent.metadata.agent 
2022-06-17 03:51:13.645 2799 INFO eventlet.wsgi.server [-] Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/eventlet/wsgi.py", line 604, in handle_one_response
    write(b''.join(towrite))
  File "/usr/lib/python3/dist-packages/eventlet/wsgi.py", line 538, in write
    wfile.flush()
  File "/usr/lib/python3.9/socket.py", line 722, in write
    return self._sock.send(b)
  File "/usr/lib/python3/dist-packages/eventlet/greenio/base.py", line 399, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/usr/lib/python3/dist-packages/eventlet/greenio/base.py", line 386, in _send_loop
    return send_method(data, *args)
BrokenPipeError: [Errno 32] Broken pipe

2022-06-17 03:51:13.646 2799 INFO eventlet.wsgi.server [-] ::ffff:172.16.3.230,<local> "GET /latest/meta-data/hostname HTTP/1.1" status: 500  len: 0 time: 618.6233320
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent [-] Unexpected error.: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 52bcb87c1dc34c5b91940145dbcc32ec
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 433, in get
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent     return self._queues[msg_id].get(block=True, timeout=timeout)
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 322, in get
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent     return waiter.wait()
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 141, in wait
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent     return get_hub().switch()
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 313, in switch
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent     return self.greenlet.switch()
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent _queue.Empty
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent 
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent During handling of the above exception, another exception occurred:
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent 
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent Traceback (most recent call last):
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 91, in __call__
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent     instance_id, tenant_id = self._get_instance_and_tenant_id(req)
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 211, in _get_instance_and_tenant_id
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent     ports = self._get_ports(
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 175, in _get_ports
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent     networks = self._get_router_networks(router_id,
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/common/cache_utils.py", line 121, in __call__
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent     return self.func(target_self, *args, **kwargs)
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 142, in _get_router_networks
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent     internal_ports = self._get_ports_from_server(router_id=router_id)
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 117, in _get_ports_from_server
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent     return self.plugin_rpc.get_ports(self.context, filters)
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron/agent/metadata/agent.py", line 74, in get_ports
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent     return cctxt.call(context, 'get_ports', filters=filters)
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron_lib/rpc.py", line 180, in call
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent     time.sleep(wait)
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent     self.force_reraise()
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent     raise self.value
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/neutron_lib/rpc.py", line 157, in call
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent     return self._original_context.call(ctxt, method, **kwargs)
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 175, in call
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent     self.transport._send(self.target, msg_ctxt, msg,
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 123, in _send
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent     return self._driver.send(target, ctxt, message,
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 680, in send
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent     return self._send(target, ctxt, message, wait_for_reply, timeout,
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 669, in _send
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent     result = self._waiter.wait(msg_id, timeout,
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in wait
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent     message = self.waiters.get(msg_id, timeout=timeout)
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 435, in get
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent     raise oslo_messaging.MessagingTimeout(
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 52bcb87c1dc34c5b91940145dbcc32ec
2022-06-17 03:51:24.080 2810 ERROR neutron.agent.metadata.agent 
2022-06-17 03:51:24.084 2810 INFO eventlet.wsgi.server [-] Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/eventlet/wsgi.py", line 604, in handle_one_response
    write(b''.join(towrite))
  File "/usr/lib/python3/dist-packages/eventlet/wsgi.py", line 538, in write
    wfile.flush()
  File "/usr/lib/python3.9/socket.py", line 722, in write
    return self._sock.send(b)
  File "/usr/lib/python3/dist-packages/eventlet/greenio/base.py", line 399, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/usr/lib/python3/dist-packages/eventlet/greenio/base.py", line 386, in _send_loop
    return send_method(data, *args)
BrokenPipeError: [Errno 32] Broken pipe

2022-06-17 03:51:24.085 2810 INFO eventlet.wsgi.server [-] ::ffff:172.16.3.230,<local> "GET /latest/meta-data/public-ipv4 HTTP/1.1" status: 500  len: 0 time: 615.1995487

all of the above was cleared up by restarting the neutron-rpc-server agents. I don't think I've learned anything here beyond 'different services keep getting messed up'

I created a pull request upstream cloud-init to do retries on the metadata service discovery too:
https://github.com/canonical/cloud-init/pull/1529

Though that does not fix the underlying issue, might minimize the amount of errored VMs.

Change 806466 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] neutron: increase rpc_response_timeout

https://gerrit.wikimedia.org/r/806466

Change 806466 merged by Andrew Bogott:

[operations/puppet@production] neutron: increase rpc_response_timeout

https://gerrit.wikimedia.org/r/806466

Change 807174 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/puppet@production] openstack.vendordata: reduce timeout so it retries

https://gerrit.wikimedia.org/r/807174

Change 807221 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/puppet@production] openstack.vendordata: Allow downgrading packages too

https://gerrit.wikimedia.org/r/807221

Change 807174 merged by David Caro:

[operations/puppet@production] openstack.vendordata: reduce timeout so it retries

https://gerrit.wikimedia.org/r/807174

Change 807221 merged by David Caro:

[operations/puppet@production] openstack.vendordata: Allow downgrading packages too

https://gerrit.wikimedia.org/r/807221

This seems to be sorted out, by stabilizing the haproxy flapping to the galera service, and tweaking the timeouts and,
retries, will reopen if it happens again (currently we have other new issues xd).

Change 805316 merged by jenkins-bot:

[operations/cookbooks@wmcs] wmcs: added vm_console runbook

https://gerrit.wikimedia.org/r/805316