
cloudcontrol1003 - Check for VMs leaked by the nova-fullstack test
Closed, Resolved (Public)

Description

From a page and https://alerts.wikimedia.org/?q=team%3Dwmcs:

Check for VMs leaked by the nova-fullstack test
summary: 10 instances in the admin-monitoring project
11 minutes ago
instance: cloudcontrol1003
source: icinga
team: wmcs

Event Timeline

dcaro changed the task status from Open to In Progress. Feb 23 2022, 8:20 AM
dcaro triaged this task as High priority.
dcaro created this task.
dcaro moved this task from To refine to Doing on the User-dcaro board.

From the logs, there seems to be a timeout (the last error):

dcaro@cloudcontrol1003:~$ echo "$(sudo journalctl -n 5000 -u  nova-fullstack.service | grep '"log.level": "ERROR"' | tail -n 1 | sed -e 's/^.*@cee: //')" | jq '.'
{
  "ecs.version": "1.7.0",
  "log.level": "ERROR",
  "log.origin.file.line": 758,
  "log.origin.file.name": "nova-fullstack",
  "log.origin.file.path": "/usr/local/sbin/nova-fullstack",
  "log.origin.function": "main",
  "message": "fullstackd-20220223074731 failed, leaking",
  "process.name": "MainProcess",
  "process.thread.id": 3747163,
  "process.thread.name": "MainThread",
  "timestamp": "2022-02-23T08:02:36.202337",
  "error.stack": "Traceback (most recent call last):\n  File \"/usr/local/sbin/nova-fullstack\", line 672, in main\n    vc, server = verify_create(\n  File \"/usr/local/sbin/nova-fullstack\", line 403, in verify_create\n    raise Exception(\"creation of {} timed out\".format(cserver.id))\nException: creation of a34df078-d521-4dd9-91db-df162e631306 timed out"
}
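The same filter can be sketched in Python for cases where jq is not at hand. This is only a minimal sketch: it assumes every structured journal line carries its JSON payload after the `@cee: ` marker (as the sed expression above implies), and the function name `last_error` is hypothetical, not part of nova-fullstack.

```python
import json

CEE_MARKER = "@cee: "


def last_error(lines):
    """Return the last ERROR-level record found in journal lines, or None."""
    found = None
    for line in lines:
        # Everything after the marker is expected to be one JSON document.
        _, sep, payload = line.partition(CEE_MARKER)
        if not sep:
            continue  # not a structured line
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            continue  # truncated or malformed payload, skip it
        if record.get("log.level") == "ERROR":
            found = record
    return found
```

Feeding it the output of `journalctl -u nova-fullstack.service` line by line should give the same record as the pipeline above, as long as the assumption about the marker holds.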

It seems it has been leaving VMs in ERROR state for the last few hours (judging by the timestamps in the VM names):

dcaro@cloudcontrol1003:~$ sudo wmcs-openstack --os-project-id admin-monitoring server list
+--------------------------------------+---------------------------+--------+----------+--------------------+-----------------------+
| ID                                   | Name                      | Status | Networks | Image              | Flavor                |
+--------------------------------------+---------------------------+--------+----------+--------------------+-----------------------+
| 8371c657-50ac-491d-a2d8-b61fd0fe7cf2 | fullstackd-20220223080736 | ERROR  |          | debian-10.0-buster | g3.cores1.ram2.disk20 |
| a34df078-d521-4dd9-91db-df162e631306 | fullstackd-20220223074731 | ERROR  |          | debian-10.0-buster | g3.cores1.ram2.disk20 |
| df6f327c-c297-46e1-b068-2c834678dd58 | fullstackd-20220223072724 | ERROR  |          | debian-10.0-buster | g3.cores1.ram2.disk20 |
| cb67f212-6495-4b18-b63c-cfddd407e2da | fullstackd-20220223070718 | ERROR  |          | debian-10.0-buster | g3.cores1.ram2.disk20 |
| 0704e8f0-1cca-45e0-a542-43ca344f4a1c | fullstackd-20220223064713 | ERROR  |          | debian-10.0-buster | g3.cores1.ram2.disk20 |
| e6039018-b926-4f2b-bf54-709c24459d8c | fullstackd-20220223062709 | ERROR  |          | debian-10.0-buster | g3.cores1.ram2.disk20 |
| e54fbaa7-4426-49f9-82bf-ffe3381bfd5d | fullstackd-20220223060703 | ERROR  |          | debian-10.0-buster | g3.cores1.ram2.disk20 |
| b4a50f10-80bc-49a3-b05d-058c466f6dd3 | fullstackd-20220223023524 | ERROR  |          | debian-10.0-buster | g3.cores1.ram2.disk20 |
| 16509188-b334-44d8-8ae3-1cf0ec2d99d9 | fullstackd-20220223021517 | ERROR  |          | debian-10.0-buster | g3.cores1.ram2.disk20 |
| bb6ab4d0-c9fe-403a-ae90-d7984f3d2c81 | fullstackd-20220223015512 | ERROR  |          | debian-10.0-buster | g3.cores1.ram2.disk20 |
+--------------------------------------+---------------------------+--------+----------+--------------------+-----------------------+
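To put a number on "the last few hours", the timestamps can be read back from the VM names. A minimal sketch, assuming the suffix after `fullstackd-` is the creation time in `YYYYMMDDHHMMSS` form (which the names suggest but the task does not state); `creation_time` is a hypothetical helper and only the oldest and newest names from the listing are used.

```python
from datetime import datetime

PREFIX = "fullstackd-"


def creation_time(name):
    """Parse the timestamp suffix of a nova-fullstack VM name."""
    # Assumption: the suffix is YYYYMMDDHHMMSS.
    return datetime.strptime(name[len(PREFIX):], "%Y%m%d%H%M%S")


# Oldest and newest leaked VMs from the listing above.
names = ["fullstackd-20220223080736", "fullstackd-20220223015512"]
times = sorted(creation_time(n) for n in names)
print("first leak:", times[0])
print("last leak: ", times[-1])
print("span:      ", times[-1] - times[0])
```

Under that assumption the leaks span a bit over six hours, starting around 01:55.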

According to the logstash logs for neutron (https://logstash.wikimedia.org/goto/505e9f6359c9b006cd0e9e1c100b8d0a), neutron refuses to bind the port because it considers the agent dead:

Refusing to bind port 1bd4dd1b-a538-41ac-aa9b-bbabb1d0d72b to dead agent:
{'admin_state_up': True,
 'agent_type': 'Linux bridge agent',
 'alive': False,
 'availability_zone': None,
 'binary': 'neutron-linuxbridge-agent',
 'configurations': {'bridge_mappings': {},
                    'devices': 29,
                    'extensions': [],
                    'interface_mappings': {'cloudinstances2b': 'eno2np1.1105'},
                    'l2_population': False,
                    'tunnel_types': ['vxlan'],
                    'tunneling_ip': '10.64.20.51'},
 'created_at': datetime.datetime(2019, 1, 3, 20, 14, 6),
 'description': None,
 'heartbeat_timestamp': datetime.datetime(2022, 2, 23, 0, 36, 18),
 'host': 'cloudvirt1027',
 'id': '3388792d-560d-4bfe-9054-addf1c239f4a',
 'resource_versions': {'ConntrackHelper': '1.0',
                       'Log': '1.0',
                       'Network': '1.1',
                       'Port': '1.5',
                       'PortForwarding': '1.2',
                       'QosPolicy': '1.8',
                       'SecurityGroup': '1.2',
                       'SecurityGroupRule': '1.0',
                       'SubPort': '1.0',
                       'Subnet': '1.1',
                       'Trunk': '1.1'},
 'resources_synced': None,
 'started_at': datetime.datetime(2021, 10, 24, 0, 57, 3),
 'topic': 'N/A'}
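The `'alive': False` above follows from the stale `heartbeat_timestamp`. A rough sketch of that check, assuming neutron's default `agent_down_time` of 75 seconds (the value configured on this deployment is not shown in this task) and using the 08:07:42 timestamp of a neutron-api warning from this task as an illustrative "now"; `is_alive` is a hypothetical helper, not neutron's own function.

```python
from datetime import datetime, timedelta

# Assumption: neutron's default agent_down_time; the deployed value may differ.
AGENT_DOWN_TIME = timedelta(seconds=75)


def is_alive(heartbeat, now, down_time=AGENT_DOWN_TIME):
    """An agent counts as alive if its last heartbeat is recent enough."""
    return now - heartbeat <= down_time


heartbeat = datetime(2022, 2, 23, 0, 36, 18)  # heartbeat_timestamp from the log
now = datetime(2022, 2, 23, 8, 7, 42)         # illustrative "now"
print("heartbeat age:", now - heartbeat)
print("alive:", is_alive(heartbeat, now))
```

With a heartbeat more than seven hours old, the agent is far past any plausible threshold, so the refusal to bind is expected behaviour rather than the root cause.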

From the journal logs on cloudcontrol1003, grepping for the request-id turns up this interesting neutron-api warning:

root@cloudcontrol1003:~# journalctl -n 500000 | grep req-d2aa1f0b-ecf9-49f2-84d9-99f6c3f14dca
Feb 23 08:07:42 cloudcontrol1003 neutron-api[2485019]: 2022-02-23 08:07:42.417 2485019 WARNING neutron.api.rpc.agentnotifiers.dhcp_rpc_agent_api [req-d2aa1f0b-ecf9-49f2-84d9-99f6c3f14dca novaadmin admin - default default] Only 0 of 2 DHCP agents associated with network '7425e328-560c-4f00-8e99-706f3fb90bb4' are marked as active, so notifications may be sent to inactive agents.

Found T205524, which seems related, and by the looks of it all agents are down (the alive column shows xxx where it should show :-)):

root@cloudcontrol1003:~# neutron agent-list
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+--------------------+--------------------+-------------------+-------+----------------+---------------------------+
| id | agent_type | host | availability_zone | alive | admin_state_up | binary |
+--------------------------------------+--------------------+--------------------+-------------------+-------+----------------+---------------------------+
| 0b2f519f-a5ab-4188-82bf-01431810d55a | DHCP agent | cloudnet1003 | nova | xxx | True | neutron-dhcp-agent |
| 1071c198-ed57-4b5a-9439-30e66a31aa69 | Linux bridge agent | cloudvirtan1005 | | xxx | True | neutron-linuxbridge-agent |
| 1c394148-02d0-4a06-892c-f5cec29ef0b0 | Linux bridge agent | cloudvirt1037 | | xxx | True | neutron-linuxbridge-agent |
| 20854226-8d18-4d1f-8da3-a68a8fd4dc9f | Linux bridge agent | cloudvirt1033 | | xxx | True | neutron-linuxbridge-agent |
| 28ac0947-f263-4655-98fe-f868325678ae | Linux bridge agent | cloudvirt1015 | | xxx | True | neutron-linuxbridge-agent |
| 2eeef198-8af7-4e5d-bd73-e14a2a8d2404 | Linux bridge agent | cloudvirtan1004 | | xxx | True | neutron-linuxbridge-agent |
| 3388792d-560d-4bfe-9054-addf1c239f4a | Linux bridge agent | cloudvirt1027 | | xxx | True | neutron-linuxbridge-agent |
| 345b47ea-8d4a-4cb0-9619-8a024440c4cc | Linux bridge agent | cloudvirt1046 | | xxx | True | neutron-linuxbridge-agent |
| 350cff0f-37db-442d-83fd-8e2ae95c22be | Linux bridge agent | cloudvirt1031 | | xxx | True | neutron-linuxbridge-agent |
| 3c4eb1fa-d039-46dc-8d72-0540e3043d47 | Linux bridge agent | cloudvirt1041 | | xxx | True | neutron-linuxbridge-agent |
| 3dd3644d-9a47-4ecd-b4c7-9364c75ac105 | Linux bridge agent | cloudvirt1004 | | xxx | True | neutron-linuxbridge-agent |
| 3fbff0a4-bc7b-450c-9dae-fda214dee816 | Linux bridge agent | cloudvirt1045 | | xxx | True | neutron-linuxbridge-agent |
| 3fec4d0a-b65f-4930-9350-e21f473a02bf | Linux bridge agent | cloudvirt1006 | | xxx | True | neutron-linuxbridge-agent |
| 408f25f1-e09a-4e61-b8fb-8b0b8bdf16e2 | Linux bridge agent | cloudvirt1034 | | xxx | True | neutron-linuxbridge-agent |
| 468aef2a-8eb6-4382-abba-bc284efd9fa5 | DHCP agent | cloudnet1004 | nova | xxx | True | neutron-dhcp-agent |
| 49a53961-ee67-458c-b7fd-c1e9d37e23c1 | Linux bridge agent | cloudvirt1044 | | xxx | True | neutron-linuxbridge-agent |
| 49b85656-d67b-44c3-ac71-e8c75b849783 | Linux bridge agent | cloudvirt1029 | | xxx | True | neutron-linuxbridge-agent |
| 4be214c8-76ef-40f8-9d5d-4c344d213311 | L3 agent | cloudnet1003 | nova | xxx | True | neutron-l3-agent |
| 4ec0d5cb-e419-427b-9232-45e49ad3f416 | Linux bridge agent | cloudvirt1005 | | xxx | True | neutron-linuxbridge-agent |
| 5b2a8c8b-3b13-4607-b0bd-460d507f5de1 | Linux bridge agent | cloudvirt1024 | | xxx | True | neutron-linuxbridge-agent |
| 5bf8b31c-d752-4986-820e-c161d5fa70a9 | Linux bridge agent | cloudvirt1032 | | xxx | True | neutron-linuxbridge-agent |
| 65f9d324-5126-4336-8f52-001cd0c9fdd1 | Linux bridge agent | cloudvirt1016 | | xxx | True | neutron-linuxbridge-agent |
| 682e90e1-6457-4c24-b6f0-4f573d58710b | Linux bridge agent | cloudvirt1042 | | xxx | True | neutron-linuxbridge-agent |
| 6d228267-bfe3-448b-9dc9-9705e5ccde56 | Linux bridge agent | cloudvirt1002 | | xxx | True | neutron-linuxbridge-agent |
| 6dafa3f3-9aeb-47b6-9535-e0932abe4435 | Linux bridge agent | cloudvirt1014 | | xxx | True | neutron-linuxbridge-agent |
| 70a51cd5-7d76-40d9-b4f1-6dd556122391 | Linux bridge agent | cloudvirt1013 | | xxx | True | neutron-linuxbridge-agent |
| 70c898a1-f881-4474-bbbb-10fb8d060d1e | Linux bridge agent | cloudvirt-wdqs1002 | | xxx | True | neutron-linuxbridge-agent |
| 778b2757-088c-4742-8d76-22c3e6d9a306 | Linux bridge agent | cloudvirt1017 | | xxx | True | neutron-linuxbridge-agent |
| 7bbc9374-e1b0-4c80-ae69-62e6e90af281 | Linux bridge agent | cloudvirt1035 | | xxx | True | neutron-linuxbridge-agent |
| 813c9efb-b2af-4063-a8b3-8e8f1976977c | Linux bridge agent | cloudvirt1025 | | xxx | True | neutron-linuxbridge-agent |
| 818331b3-930b-4e89-a149-e5b91e145121 | Linux bridge agent | cloudvirt1008 | | xxx | True | neutron-linuxbridge-agent |
| 88573317-545f-43af-9b5a-f731d43846fd | Linux bridge agent | cloudvirt1009 | | xxx | True | neutron-linuxbridge-agent |
| 8d0e50cb-ab8d-4923-87f8-b3c93f347be6 | Linux bridge agent | cloudvirt1012 | | xxx | True | neutron-linuxbridge-agent |
| 8e8cb1fd-62b6-475f-ae2f-2b85355fd3e3 | Linux bridge agent | cloudvirt1030 | | xxx | True | neutron-linuxbridge-agent |
| 8fb2e553-03d7-4d3a-a01d-d690a61255ce | Linux bridge agent | cloudvirt-wdqs1001 | | xxx | True | neutron-linuxbridge-agent |
| 9238d8cd-02cb-4d1f-a629-3b3d8cc9c1bf | Linux bridge agent | cloudnet1003 | | xxx | True | neutron-linuxbridge-agent |
| 94feb5fd-38c5-4283-9a02-b3ef48f104be | Linux bridge agent | cloudvirt1001 | | xxx | True | neutron-linuxbridge-agent |
| 970df1d1-505d-47a4-8d35-1b13c0dfe098 | L3 agent | cloudnet1004 | nova | xxx | True | neutron-l3-agent |
| a3d2685e-f826-4666-993c-0ad304718d41 | Linux bridge agent | cloudvirt1026 | | xxx | True | neutron-linuxbridge-agent |
| aa714a6a-b44f-49b0-8a93-c16dc4626252 | Linux bridge agent | cloudvirt1043 | | xxx | True | neutron-linuxbridge-agent |
| ad3461d7-b79e-4279-921d-5a476e296767 | Linux bridge agent | cloudnet1004 | | xxx | True | neutron-linuxbridge-agent |
| af08efd1-9936-41dd-b04b-3b6d6039f9e5 | Linux bridge agent | cloudvirt1040 | | xxx | True | neutron-linuxbridge-agent |
| afcb9b7f-c1a6-4ff4-9b10-92bfbe8d1a56 | Linux bridge agent | cloudvirtan1002 | | xxx | True | neutron-linuxbridge-agent |
| afe173eb-35ba-444a-9960-899629786d2f | Linux bridge agent | cloudvirtan1003 | | xxx | True | neutron-linuxbridge-agent |
| b0f1cdf2-8d03-4f7b-978c-201ecea69b84 | Linux bridge agent | cloudvirt1020 | | xxx | True | neutron-linuxbridge-agent |
| b296d716-d72f-479e-9977-5e06637ca89d | Linux bridge agent | cloudvirt1028 | | xxx | True | neutron-linuxbridge-agent |
| b2f9da63-2f16-4aa5-9400-ae708a733f91 | Linux bridge agent | cloudvirt1021 | | xxx | True | neutron-linuxbridge-agent |
| b77956b4-c836-4644-a490-1a16b6856323 | Linux bridge agent | cloudvirt-wdqs1003 | | xxx | True | neutron-linuxbridge-agent |
| bad663b3-fd25-4393-a546-4b1b4bdec4db | Linux bridge agent | cloudvirtan1001 | | xxx | True | neutron-linuxbridge-agent |
| c2403ab2-512a-4663-8f4c-3851940d3847 | Linux bridge agent | cloudvirt1018 | | xxx | True | neutron-linuxbridge-agent |
| cb07848f-d4a0-4d64-81c9-03359cd46ee7 | Linux bridge agent | cloudvirt1039 | | xxx | True | neutron-linuxbridge-agent |
| ce4f0afc-d0c0-411b-9faa-9e4f83c746b0 | Linux bridge agent | cloudvirt1036 | | xxx | True | neutron-linuxbridge-agent |
| d475e07d-52b3-476e-9a4f-e63b21e1075e | Metadata agent | cloudnet1004 | | xxx | True | neutron-metadata-agent |
| dfe704de-9bd4-4e8c-aa2f-ce483c001be3 | Linux bridge agent | cloudvirt1003 | | xxx | True | neutron-linuxbridge-agent |
| e349b624-d593-43cf-ac9a-59f473cfa5c6 | Metadata agent | cloudnet1003 | | xxx | True | neutron-metadata-agent |
| e382a233-e6a0-422e-9d2e-5651082783fc | Linux bridge agent | cloudvirt1022 | | xxx | True | neutron-linuxbridge-agent |
| e8e60646-0d34-4d3b-8d91-8881dd967382 | Linux bridge agent | cloudvirt1038 | | xxx | True | neutron-linuxbridge-agent |
| e9f219db-efdd-4157-9045-48316c61de5e | Linux bridge agent | cloudvirt1023 | | xxx | True | neutron-linuxbridge-agent |
| edc264e5-6e97-49aa-981a-95f81d81a3ab | Linux bridge agent | cloudvirt1007 | | xxx | True | neutron-linuxbridge-agent |
| fc45a34d-d8a4-45fe-982d-5b4b7a8fcde1 | Linux bridge agent | cloudvirt1019 | | xxx | True | neutron-linuxbridge-agent |
+--------------------------------------+--------------------+--------------------+-------------------+-------+----------------+---------------------------+
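With this many rows it is easy to miss a single live agent, so a quick tally by agent type helps. The following is a rough sketch that parses the pasted table text; it assumes the seven-column layout shown above and that `xxx` in the alive column marks a dead agent. `dead_agents_by_type` is a hypothetical helper, not part of any OpenStack tool.

```python
from collections import Counter


def dead_agents_by_type(table_text):
    """Count agents whose 'alive' column shows 'xxx', grouped by agent_type."""
    counts = Counter()
    for line in table_text.splitlines():
        if "|" not in line:
            continue  # border lines and CLI notices
        # Drop whatever precedes the first '|' and follows the last one.
        cells = [c.strip() for c in line.split("|")[1:-1]]
        if len(cells) != 7 or cells[0] == "id":
            continue  # header row or unexpected layout
        agent_type, alive = cells[1], cells[4]
        if alive == "xxx":
            counts[agent_type] += 1
    return counts
```

Passing the full output of the listing to this function and comparing the totals with the number of rows shows at a glance whether any agent is still reporting in.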

From the neutron-dhcp-agent logs on cloudnet1003:

Feb 23 09:02:05 cloudnet1003 neutron-dhcp-agent[1520]: 2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent [req-da72f8fa-cb7c-4949-af59-ce93867b0a8a - - - - -] Failed reporting state!: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 78766a058bf149
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 433, in get
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent     return self._queues[msg_id].get(block=True, timeout=timeout)
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 322, in get
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent     return waiter.wait()
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 141, in wait
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent     return get_hub().switch()
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 298, in switch
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent     return self.greenlet.switch()
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent _queue.Empty
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent During handling of the above exception, another exception occurred:
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3/dist-packages/neutron/agent/dhcp/agent.py", line 1090, in _report_state
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent     ctx, self.agent_state, True)
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3/dist-packages/neutron/agent/rpc.py", line 103, in report_state
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent     return method(context, 'report_state', **kwargs)
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 179, in call
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent     transport_options=self.transport_options)
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 128, in _send
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent     transport_options=transport_options)
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 682, in send
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent     transport_options=transport_options)
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 670, in _send
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent     call_monitor_timeout)
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in wait
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent     message = self.waiters.get(msg_id, timeout=timeout)
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 437, in get
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent     'to message ID %s' % msg_id)
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 78766a058bf1496aa752b81eeb6329d9
                                                       2022-02-23 09:02:05.297 1520 ERROR neutron.agent.dhcp.agent
Feb 23 09:02:05 cloudnet1003 neutron-dhcp-agent[1520]: 2022-02-23 09:02:05.299 1520 WARNING oslo.service.loopingcall [req-da72f8fa-cb7c-4949-af59-ce93867b0a8a - - - - -] Function 'neutron.agent.dhcp.agent.DhcpAgentWithStateReport._report_state' run outlasted interval by 25.01 sec

It seems that RabbitMQ is having issues too, since the agent times out waiting for an RPC reply.

The RabbitMQ cluster itself seems healthy, though:

root@cloudcontrol1003:~# rabbitmqctl cluster_status
Cluster status of node rabbit@cloudcontrol1003 ...
Basics

Cluster name: rabbit@cloudcontrol1005.wikimedia.org

Disk Nodes

rabbit@cloudcontrol1003
rabbit@cloudcontrol1004
rabbit@cloudcontrol1005

Running Nodes

rabbit@cloudcontrol1003
rabbit@cloudcontrol1004
rabbit@cloudcontrol1005

Versions

rabbit@cloudcontrol1003: RabbitMQ 3.8.9 on Erlang 23.2.6
rabbit@cloudcontrol1004: RabbitMQ 3.8.9 on Erlang 23.2.6
rabbit@cloudcontrol1005: RabbitMQ 3.8.9 on Erlang 23.2.6

Maintenance status

Node: rabbit@cloudcontrol1003, status: not under maintenance
Node: rabbit@cloudcontrol1004, status: not under maintenance
Node: rabbit@cloudcontrol1005, status: not under maintenance

Alarms

(none)

Network Partitions

(none)

Listeners

Node: rabbit@cloudcontrol1003, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@cloudcontrol1003, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Node: rabbit@cloudcontrol1003, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@cloudcontrol1003, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@cloudcontrol1004, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@cloudcontrol1004, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@cloudcontrol1004, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Node: rabbit@cloudcontrol1004, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@cloudcontrol1005, interface: [::], port: 15672, protocol: http, purpose: HTTP API
Node: rabbit@cloudcontrol1005, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Node: rabbit@cloudcontrol1005, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@cloudcontrol1005, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0

Feature flags

Flag: drop_unroutable_metric, state: enabled
Flag: empty_basic_get_metric, state: enabled
Flag: implicit_default_bindings, state: enabled
Flag: maintenance_mode_status, state: enabled
Flag: quorum_queue, state: enabled
Flag: virtual_host_metadata, state: enabled

Mentioned in SAL (#wikimedia-cloud) [2022-02-23T09:38:00Z] <dcaro> restarting neutron-dhcp-agent on cloudnet1003 (T302369)

Mentioned in SAL (#wikimedia-cloud) [2022-02-23T09:39:19Z] <dcaro> restarting neutron-api cloudcontrol1003 to see if the agent status update starts working (T302369)

I found this in the neutron-api logs:

Feb 23 06:12:38 cloudcontrol1003 neutron-api[2485020]: 2022-02-23 06:12:38.479 2485020 WARNING oslo_db.sqlalchemy.exc_filters [req-da9368f0-3050-4a1e-9500-3cbd7d23f2b7 novaadmin admin - default default] DBAPIError exception wrapped.: pymysql.err.InternalError: (1047, 'WSREP has not yet prepared node for application use')
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters Traceback (most recent call last):
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters     self.dialect.do_execute(
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 609, in do_execute
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters     cursor.execute(statement, parameters)
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python3/dist-packages/pymysql/cursors.py", line 170, in execute
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters     result = self._query(query)
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python3/dist-packages/pymysql/cursors.py", line 328, in _query
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters     conn.query(q)
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 517, in query
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters     self._affected_rows = self._read_query_result(unbuffered=unbuffered)
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 732, in _read_query_result
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters     result.read()
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 1075, in read
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters     first_packet = self.connection._read_packet()
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 684, in _read_packet
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters     packet.check_error()
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python3/dist-packages/pymysql/protocol.py", line 220, in check_error
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters     err.raise_mysql_exception(self._data)
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters   File "/usr/lib/python3/dist-packages/pymysql/err.py", line 109, in raise_mysql_exception
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters     raise errorclass(errno, errval)
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters pymysql.err.InternalError: (1047, 'WSREP has not yet prepared node for application use')
                                                       2022-02-23 06:12:38.479 2485020 ERROR oslo_db.sqlalchemy.exc_filters
Feb 23 06:12:38 cloudcontrol1003 neutron-api[2485020]: 2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation [req-da9368f0-3050-4a1e-9500-3cbd7d23f2b7 novaadmin admin - default default] GET failed.: oslo_db.exception.DBError: (pymysql.err.InternalError) (1047, 'WSREP has not yet prepared node for application use')
                                                       [SQL: SELECT networksegments.id AS networksegments_id, networksegments.network_id AS networksegments_network_id, networksegments.network_type AS networksegments_network_type, networksegments.physical_network AS networksegments_physical_network, networksegments.segmentation_id AS networksegments_segmentation_id, networksegments.is_dynamic AS networksegments_is_dynamic, networksegments.segment_index AS networksegments_segment_index, networksegments.name AS networksegments_name, networksegments.standard_attr_id AS networksegments_standard_attr_id, standardattributes_1.id AS standardattributes_1_id, standardattributes_1.resource_type AS standardattributes_1_resource_type, standardattributes_1.description AS standardattributes_1_description, standardattributes_1.revision_number AS standardattributes_1_revision_number, standardattributes_1.created_at AS standardattributes_1_created_at, sta>
                                                       FROM networksegments LEFT OUTER JOIN standardattributes AS standardattributes_1 ON standardattributes_1.id = networksegments.standard_attr_id 
                                                       WHERE networksegments.network_id IN (%(network_id_1)s) AND networksegments.is_dynamic IN (%(is_dynamic_1)s) ORDER BY networksegments.network_id ASC, networksegments.segment_index ASC, networksegments.standard_attr_id ASC]
                                                       [parameters: {'network_id_1': '7425e328-560c-4f00-8e99-706f3fb90bb4', 'is_dynamic_1': 0}]
                                                       (Background on this error at: http://sqlalche.me/e/13/2j85)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation Traceback (most recent call last):
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     self.dialect.do_execute(
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 609, in do_execute
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     cursor.execute(statement, parameters)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/pymysql/cursors.py", line 170, in execute
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     result = self._query(query)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/pymysql/cursors.py", line 328, in _query
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     conn.query(q)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 517, in query
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     self._affected_rows = self._read_query_result(unbuffered=unbuffered)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 732, in _read_query_result
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     result.read()
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 1075, in read
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     first_packet = self.connection._read_packet()
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 684, in _read_packet
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     packet.check_error()
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/pymysql/protocol.py", line 220, in check_error
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     err.raise_mysql_exception(self._data)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/pymysql/err.py", line 109, in raise_mysql_exception
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     raise errorclass(errno, errval)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation pymysql.err.InternalError: (1047, 'WSREP has not yet prepared node for application use')
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation 
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation The above exception was the direct cause of the following exception:
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation 
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation Traceback (most recent call last):
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/pecan/core.py", line 683, in __call__
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     self.invoke_controller(controller, args, kwargs, state)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/pecan/core.py", line 574, in invoke_controller
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     result = controller(*args, **kwargs)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 139, in wrapped
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     setattr(e, '_RETRY_EXCEEDED', True)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     self.force_reraise()
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     six.reraise(self.type_, self.value, self.tb)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/six.py", line 719, in reraise
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     raise value
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 135, in wrapped
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     return f(*args, **kwargs)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/oslo_db/api.py", line 154, in wrapper
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     ectxt.value = e.inner_exc
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     self.force_reraise()
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     six.reraise(self.type_, self.value, self.tb)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/six.py", line 719, in reraise
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     raise value
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/oslo_db/api.py", line 142, in wrapper
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     return f(*args, **kwargs)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 183, in wrapped
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     LOG.debug("Retry wrapper got retriable exception: %s", e)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     self.force_reraise()
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     six.reraise(self.type_, self.value, self.tb)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/six.py", line 719, in reraise
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     raise value
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 179, in wrapped
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     return f(*dup_args, **dup_kwargs)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/neutron/pecan_wsgi/controllers/utils.py", line 76, in wrapped
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     return f(*args, **kwargs)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/neutron/pecan_wsgi/controllers/resource.py", line 40, in index
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     return self.get(*args, **kwargs)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/neutron/pecan_wsgi/controllers/resource.py", line 51, in get
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     return {self.resource: self.plugin_shower(*getter_args, fields=fields)}
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 233, in wrapped
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     return method(*args, **kwargs)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 139, in wrapped
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     setattr(e, '_RETRY_EXCEEDED', True)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     self.force_reraise()
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     six.reraise(self.type_, self.value, self.tb)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/six.py", line 719, in reraise
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     raise value
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 135, in wrapped
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     return f(*args, **kwargs)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/oslo_db/api.py", line 154, in wrapper
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     ectxt.value = e.inner_exc
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     self.force_reraise()
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     six.reraise(self.type_, self.value, self.tb)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/six.py", line 719, in reraise
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     raise value
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/oslo_db/api.py", line 142, in wrapper
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     return f(*args, **kwargs)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 183, in wrapped
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     LOG.debug("Retry wrapper got retriable exception: %s", e)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     self.force_reraise()
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     six.reraise(self.type_, self.value, self.tb)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/six.py", line 719, in reraise
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     raise value
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 179, in wrapped
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     return f(*dup_args, **dup_kwargs)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/plugin.py", line 1155, in get_network
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     self.type_manager.extend_network_dict_provider(context, net_data)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/managers.py", line 158, in extend_network_dict_provider
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     return self.extend_networks_dict_provider(context, [network])
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/managers.py", line 162, in extend_networks_dict_provider
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     net_segments = segments_db.get_networks_segments(context, ids)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/neutron/db/segments_db.py", line 94, in get_networks_segments
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     objs = network_obj.NetworkSegment.get_objects(context, **filters)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/neutron/objects/network.py", line 168, in get_objects
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     return super(NetworkSegment, cls).get_objects(context, _pager,
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/neutron/objects/base.py", line 637, in get_objects
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     db_objs = obj_db_api.get_objects(
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/neutron/objects/db/api.py", line 48, in get_objects
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     return model_query.get_collection(
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/neutron_lib/db/model_query.py", line 314, in get_collection
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     items = [
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 3535, in __iter__
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     return self._execute_and_instances(context)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 3560, in _execute_and_instances
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     result = conn.execute(querycontext.statement, self._params)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     return meth(self, multiparams, params)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     return connection._execute_clauseelement(self, multiparams, params)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     ret = self._execute_context(
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     self._handle_dbapi_exception(
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1508, in _handle_dbapi_exception
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     util.raise_(newraise, with_traceback=exc_info[2], from_=e)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     raise exception
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     self.dialect.do_execute(
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/sqlalchemy/engine/default.py", line 609, in do_execute
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     cursor.execute(statement, parameters)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/pymysql/cursors.py", line 170, in execute
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     result = self._query(query)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/pymysql/cursors.py", line 328, in _query
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     conn.query(q)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 517, in query
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     self._affected_rows = self._read_query_result(unbuffered=unbuffered)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 732, in _read_query_result
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     result.read()
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 1075, in read
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     first_packet = self.connection._read_packet()
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 684, in _read_packet
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     packet.check_error()
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/pymysql/protocol.py", line 220, in check_error
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     err.raise_mysql_exception(self._data)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python3/dist-packages/pymysql/err.py", line 109, in raise_mysql_exception
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation     raise errorclass(errno, errval)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation oslo_db.exception.DBError: (pymysql.err.InternalError) (1047, 'WSREP has not yet prepared node for application use')
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation [SQL: SELECT networksegments.id AS networksegments_id, networksegments.network_id AS networksegments_network_id, networksegments.network_type AS networksegments_network_type, networksegments.physical_network AS networksegments_physical_network, networksegments.segmentation_id AS networksegments_segmentation_id, networksegments.is_dynamic AS networksegments_is_dynamic, networksegments.segment_index AS networksegments_segment_index, networksegments.name AS networksegments_name, networksegments.standard_attr_id AS networksegments_standard_attr_id, standardattributes_1.id AS standardattributes_1_id, standardattributes_1.resource_type AS standardattributes_1_resource_type, standardattributes_1.description AS standardattributes_1_description, standardattributes_1.revision_number AS standardattributes_1_revision_numb>
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation FROM networksegments LEFT OUTER JOIN standardattributes AS standardattributes_1 ON standardattributes_1.id = networksegments.standard_attr_id 
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation WHERE networksegments.network_id IN (%(network_id_1)s) AND networksegments.is_dynamic IN (%(is_dynamic_1)s) ORDER BY networksegments.network_id ASC, networksegments.segment_index ASC, networksegments.standard_attr_id ASC]
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation [parameters: {'network_id_1': '7425e328-560c-4f00-8e99-706f3fb90bb4', 'is_dynamic_1': 0}]
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation (Background on this error at: http://sqlalche.me/e/13/2j85)
                                                       2022-02-23 06:12:38.486 2485020 ERROR neutron.pecan_wsgi.hooks.translation

My theory is that a DB error made neutron-api crash, which in turn made the neutron agents fail.

We just restarted the neutron-api service, which should be the fix here.

The DB error is probably T302146: Galera on cloudcontrol1004 going out of sync.
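To get a feel for how often the Galera error shows up, something like the following could be run over the neutron log lines. This is only a rough sketch: the function name `count_mysql_errors` and the regex are my own, and it assumes the errors keep the `pymysql.err.<Class> ... (errno, 'message')` shape seen in the traceback above.

```python
import re
from collections import Counter

# Matches both shapes seen in the traceback:
#   pymysql.err.InternalError: (1047, '...')
#   (pymysql.err.InternalError) (1047, '...')
MYSQL_ERROR_RE = re.compile(r"pymysql\.err\.(\w+)\)?:?\s*\((\d+), '([^']*)'\)")


def count_mysql_errors(lines):
    """Count (errno, message) pairs of pymysql errors found in log lines."""
    counts = Counter()
    for line in lines:
        match = MYSQL_ERROR_RE.search(line)
        if match:
            _errorclass, errno, message = match.groups()
            counts[(int(errno), message)] += 1
    return counts


# Two lines copied from the log above, plus one unrelated traceback line.
sample = [
    "ERROR neutron.pecan_wsgi.hooks.translation pymysql.err.InternalError: "
    "(1047, 'WSREP has not yet prepared node for application use')",
    "ERROR neutron.pecan_wsgi.hooks.translation oslo_db.exception.DBError: "
    "(pymysql.err.InternalError) "
    "(1047, 'WSREP has not yet prepared node for application use')",
    "ERROR neutron.pecan_wsgi.hooks.translation     raise value",
]

for (errno, message), seen in count_mysql_errors(sample).most_common():
    print(errno, message, seen)
```

Note that a single failed request logs the error twice (once as the pymysql exception and once wrapped by oslo_db), so the count is an upper bound on failed requests, not an exact number.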

The neutron-rpc-server service was stopped on cloudvirt1003; starting it made all the agents start reporting their status correctly.
It seems that this component is the one that receives the agent state reports and marks the agents up or down.
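Since a stopped neutron-rpc-server went unnoticed here, a check along these lines might be a starting point for alerting on it. This is a sketch only: the helper name `is_unit_active` is made up, and it assumes the systemd unit is called `neutron-rpc-server.service`. It relies on `systemctl is-active --quiet` exiting with 0 only when the unit is active.

```python
import subprocess


def is_unit_active(unit, run=subprocess.run):
    """Return True if systemd reports the given unit as active.

    `run` is injectable so the check can be exercised without systemd.
    """
    # `systemctl is-active --quiet <unit>` exits 0 only for an active unit.
    result = run(["systemctl", "is-active", "--quiet", unit], check=False)
    return result.returncode == 0
```

An alert wrapper would call `is_unit_active("neutron-rpc-server.service")` and report critical when it returns False.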

This will need some followup tasks:

  • Add alerting for this service not being up
  • Add alerting for the agents being down (if possible; currently this is only visible from neutron agent-list, but there is probably an API for it).
  • Add documentation on what to do when neutron says all agents are down or refuses to bind a port.
  • Link that from the fullstack alert documentation and the new alert runbook documentation.
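For the agent alerting item, until a proper API-based check exists, the table printed by neutron agent-list could be parsed directly. The sketch below is an assumption-heavy illustration: the function names are my own, the last sample row is made up (host `cloudvirt9999`, with `xxx` in the alive column), and any value other than `:-)` is treated as "not alive".

```python
def parse_agent_table(text):
    """Parse the ASCII table from `neutron agent-list` into a list of dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip borders ("+---+"), the deprecation notice and blank lines.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first "|" line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def agents_not_alive(rows):
    """Return the rows whose alive column is anything other than ':-)'."""
    return [row for row in rows if row.get("alive") != ":-)"]


# First two data rows are taken from the output in this task;
# the last one is a hypothetical dead agent added for illustration.
SAMPLE = """
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+--------------------+---------------+-------------------+-------+----------------+---------------------------+
| id                                   | agent_type         | host          | availability_zone | alive | admin_state_up | binary                    |
+--------------------------------------+--------------------+---------------+-------------------+-------+----------------+---------------------------+
| 0b2f519f-a5ab-4188-82bf-01431810d55a | DHCP agent         | cloudnet1003  | nova              | :-)   | True           | neutron-dhcp-agent        |
| 1c394148-02d0-4a06-892c-f5cec29ef0b0 | Linux bridge agent | cloudvirt1037 |                   | :-)   | True           | neutron-linuxbridge-agent |
| 00000000-0000-0000-0000-000000000000 | Linux bridge agent | cloudvirt9999 |                   | xxx   | True           | neutron-linuxbridge-agent |
+--------------------------------------+--------------------+---------------+-------------------+-------+----------------+---------------------------+
"""

for agent in agents_not_alive(parse_agent_table(SAMPLE)):
    print(agent["host"], agent["binary"])
```

Parsing CLI tables is brittle, so this is only meant as a stopgap; a check against the neutron API would be the better long-term option.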

Will create the tasks in a bit (got a meeting).

Mentioned in SAL (#wikimedia-cloud) [2022-02-23T10:05:22Z] <dcaro> Deleting stuck novafullstack servers, to let the service create new ones (T302369)

aborrero subscribed.

For the record, after all the operations this is the state of the neutron agents:

root@cloudcontrol1003:~# neutron agent-list
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+--------------------+--------------------+-------------------+-------+----------------+---------------------------+
| id                                   | agent_type         | host               | availability_zone | alive | admin_state_up | binary                    |
+--------------------------------------+--------------------+--------------------+-------------------+-------+----------------+---------------------------+
| 0b2f519f-a5ab-4188-82bf-01431810d55a | DHCP agent         | cloudnet1003       | nova              | :-)   | True           | neutron-dhcp-agent        |
| 1c394148-02d0-4a06-892c-f5cec29ef0b0 | Linux bridge agent | cloudvirt1037      |                   | :-)   | True           | neutron-linuxbridge-agent |
| 20854226-8d18-4d1f-8da3-a68a8fd4dc9f | Linux bridge agent | cloudvirt1033      |                   | :-)   | True           | neutron-linuxbridge-agent |
| 3388792d-560d-4bfe-9054-addf1c239f4a | Linux bridge agent | cloudvirt1027      |                   | :-)   | True           | neutron-linuxbridge-agent |
| 345b47ea-8d4a-4cb0-9619-8a024440c4cc | Linux bridge agent | cloudvirt1046      |                   | :-)   | True           | neutron-linuxbridge-agent |
| 350cff0f-37db-442d-83fd-8e2ae95c22be | Linux bridge agent | cloudvirt1031      |                   | :-)   | True           | neutron-linuxbridge-agent |
| 3c4eb1fa-d039-46dc-8d72-0540e3043d47 | Linux bridge agent | cloudvirt1041      |                   | :-)   | True           | neutron-linuxbridge-agent |
| 3fbff0a4-bc7b-450c-9dae-fda214dee816 | Linux bridge agent | cloudvirt1045      |                   | :-)   | True           | neutron-linuxbridge-agent |
| 408f25f1-e09a-4e61-b8fb-8b0b8bdf16e2 | Linux bridge agent | cloudvirt1034      |                   | :-)   | True           | neutron-linuxbridge-agent |
| 468aef2a-8eb6-4382-abba-bc284efd9fa5 | DHCP agent         | cloudnet1004       | nova              | :-)   | True           | neutron-dhcp-agent        |
| 49a53961-ee67-458c-b7fd-c1e9d37e23c1 | Linux bridge agent | cloudvirt1044      |                   | :-)   | True           | neutron-linuxbridge-agent |
| 49b85656-d67b-44c3-ac71-e8c75b849783 | Linux bridge agent | cloudvirt1029      |                   | :-)   | True           | neutron-linuxbridge-agent |
| 4be214c8-76ef-40f8-9d5d-4c344d213311 | L3 agent           | cloudnet1003       | nova              | :-)   | True           | neutron-l3-agent          |
| 5b2a8c8b-3b13-4607-b0bd-460d507f5de1 | Linux bridge agent | cloudvirt1024      |                   | :-)   | True           | neutron-linuxbridge-agent |
| 5bf8b31c-d752-4986-820e-c161d5fa70a9 | Linux bridge agent | cloudvirt1032      |                   | :-)   | True           | neutron-linuxbridge-agent |
| 65f9d324-5126-4336-8f52-001cd0c9fdd1 | Linux bridge agent | cloudvirt1016      |                   | :-)   | True           | neutron-linuxbridge-agent |
| 682e90e1-6457-4c24-b6f0-4f573d58710b | Linux bridge agent | cloudvirt1042      |                   | :-)   | True           | neutron-linuxbridge-agent |
| 70c898a1-f881-4474-bbbb-10fb8d060d1e | Linux bridge agent | cloudvirt-wdqs1002 |                   | :-)   | True           | neutron-linuxbridge-agent |
| 778b2757-088c-4742-8d76-22c3e6d9a306 | Linux bridge agent | cloudvirt1017      |                   | :-)   | True           | neutron-linuxbridge-agent |
| 7bbc9374-e1b0-4c80-ae69-62e6e90af281 | Linux bridge agent | cloudvirt1035      |                   | :-)   | True           | neutron-linuxbridge-agent |
| 813c9efb-b2af-4063-a8b3-8e8f1976977c | Linux bridge agent | cloudvirt1025      |                   | :-)   | True           | neutron-linuxbridge-agent |
| 8e8cb1fd-62b6-475f-ae2f-2b85355fd3e3 | Linux bridge agent | cloudvirt1030      |                   | :-)   | True           | neutron-linuxbridge-agent |
| 8fb2e553-03d7-4d3a-a01d-d690a61255ce | Linux bridge agent | cloudvirt-wdqs1001 |                   | :-)   | True           | neutron-linuxbridge-agent |
| 9238d8cd-02cb-4d1f-a629-3b3d8cc9c1bf | Linux bridge agent | cloudnet1003       |                   | :-)   | True           | neutron-linuxbridge-agent |
| 970df1d1-505d-47a4-8d35-1b13c0dfe098 | L3 agent           | cloudnet1004       | nova              | :-)   | True           | neutron-l3-agent          |
| a3d2685e-f826-4666-993c-0ad304718d41 | Linux bridge agent | cloudvirt1026      |                   | :-)   | True           | neutron-linuxbridge-agent |
| aa714a6a-b44f-49b0-8a93-c16dc4626252 | Linux bridge agent | cloudvirt1043      |                   | :-)   | True           | neutron-linuxbridge-agent |
| ad3461d7-b79e-4279-921d-5a476e296767 | Linux bridge agent | cloudnet1004       |                   | :-)   | True           | neutron-linuxbridge-agent |
| af08efd1-9936-41dd-b04b-3b6d6039f9e5 | Linux bridge agent | cloudvirt1040      |                   | :-)   | True           | neutron-linuxbridge-agent |
| b0f1cdf2-8d03-4f7b-978c-201ecea69b84 | Linux bridge agent | cloudvirt1020      |                   | :-)   | True           | neutron-linuxbridge-agent |
| b296d716-d72f-479e-9977-5e06637ca89d | Linux bridge agent | cloudvirt1028      |                   | :-)   | True           | neutron-linuxbridge-agent |
| b2f9da63-2f16-4aa5-9400-ae708a733f91 | Linux bridge agent | cloudvirt1021      |                   | :-)   | True           | neutron-linuxbridge-agent |
| b77956b4-c836-4644-a490-1a16b6856323 | Linux bridge agent | cloudvirt-wdqs1003 |                   | :-)   | True           | neutron-linuxbridge-agent |
| c2403ab2-512a-4663-8f4c-3851940d3847 | Linux bridge agent | cloudvirt1018      |                   | :-)   | True           | neutron-linuxbridge-agent |
| cb07848f-d4a0-4d64-81c9-03359cd46ee7 | Linux bridge agent | cloudvirt1039      |                   | :-)   | True           | neutron-linuxbridge-agent |
| ce4f0afc-d0c0-411b-9faa-9e4f83c746b0 | Linux bridge agent | cloudvirt1036      |                   | :-)   | True           | neutron-linuxbridge-agent |
| d475e07d-52b3-476e-9a4f-e63b21e1075e | Metadata agent     | cloudnet1004       |                   | :-)   | True           | neutron-metadata-agent    |
| e349b624-d593-43cf-ac9a-59f473cfa5c6 | Metadata agent     | cloudnet1003       |                   | :-)   | True           | neutron-metadata-agent    |
| e382a233-e6a0-422e-9d2e-5651082783fc | Linux bridge agent | cloudvirt1022      |                   | :-)   | True           | neutron-linuxbridge-agent |
| e8e60646-0d34-4d3b-8d91-8881dd967382 | Linux bridge agent | cloudvirt1038      |                   | :-)   | True           | neutron-linuxbridge-agent |
| e9f219db-efdd-4157-9045-48316c61de5e | Linux bridge agent | cloudvirt1023      |                   | :-)   | True           | neutron-linuxbridge-agent |
| fc45a34d-d8a4-45fe-982d-5b4b7a8fcde1 | Linux bridge agent | cloudvirt1019      |                   | :-)   | True           | neutron-linuxbridge-agent |
+--------------------------------------+--------------------+--------------------+-------------------+-------+----------------+---------------------------+
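With this many agents, a dead one is easy to miss when reading the table by eye. The following is a minimal sketch, not part of the original investigation, that parses a table in the format shown above and lists every agent whose `alive` column is not `:-)`. The function names and the second sample row (an all-zero ID, host `example-host`, marked `xxx`) are made up for illustration, and treating any value other than `:-)` as "not alive" is an assumption.

```python
def parse_agent_table(text):
    """Parse an ASCII agent table into a list of dicts keyed by header name."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +----+ border lines and anything else
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first |-delimited line holds the column names
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def not_alive(rows):
    """Return the rows whose 'alive' column is anything other than ':-)'."""
    return [row for row in rows if row.get("alive") != ":-)"]


# The first data row is copied from the table above; the second is hypothetical.
SAMPLE = """
+--------------------------------------+--------------------+--------------+-------------------+-------+----------------+---------------------------+
| id                                   | agent_type         | host         | availability_zone | alive | admin_state_up | binary                    |
+--------------------------------------+--------------------+--------------+-------------------+-------+----------------+---------------------------+
| 0b2f519f-a5ab-4188-82bf-01431810d55a | DHCP agent         | cloudnet1003 | nova              | :-)   | True           | neutron-dhcp-agent        |
| 00000000-0000-0000-0000-000000000000 | Linux bridge agent | example-host |                   | xxx   | True           | neutron-linuxbridge-agent |
+--------------------------------------+--------------------+--------------+-------------------+-------+----------------+---------------------------+
"""

if __name__ == "__main__":
    for agent in not_alive(parse_agent_table(SAMPLE)):
        print(agent["host"], agent["binary"])
```

Run against the full table above, `not_alive` would return an empty list, since every agent reports `:-)`.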
aborrero renamed this task from cloudcontrol1003 - Check for VMs leaked by the nova-fullstack test to openstack: galera DB problem caused problems in the neutron API, causing dead neutron agents that led to VMs leaked by the nova-fullstack test. Feb 23 2022, 10:25 AM
aborrero renamed this task from openstack: galera DB problem caused problems in the neutron API, causing dead neutron agents that led to VMs leaked by the nova-fullstack test to cloudcontrol1003 - Check for VMs leaked by the nova-fullstack test. Feb 23 2022, 10:43 AM

This seems to be running and working right now, so I will close this task.

Change 771569 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/puppet@production] openstack: add neutron-rpc-server and ensure up

https://gerrit.wikimedia.org/r/771569

Change 771569 merged by David Caro:

[operations/puppet@production] openstack: add neutron-rpc-server and ensure up

https://gerrit.wikimedia.org/r/771569