Page MenuHomePhabricator

Investigate nova metadata issues that appeared during the Rocky upgrade
Closed, ResolvedPublic

Description

After upgrading cloudcontrols and cloudnets to Rocky in eqiad1, the nova metadata service stopped working. This was narrowed down to the neutron metadata agent failing to create an IP tables rule that directed metadata requests.

We created those rules by hand, and things are working now. It's possible that creation of those ports was blocked by a rabbitmq issue that @jeh has already fixed; nevertheless we need to test more to ensure that that rule is being properly maintained by neutron.

Event Timeline

Change 589423 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Neutron.conf: change rpc_timeout to 30

https://gerrit.wikimedia.org/r/589423

Change 589423 merged by Andrew Bogott:
[operations/puppet@production] Neutron.conf: change rpc_timeout to 30

https://gerrit.wikimedia.org/r/589423

JHedden added a subscriber: JHedden.

In the past the agents were going offline due to missed rabbitMQ heartbeat messages. Consider creating a prometheus exporter to monitor the OpenStack nova and neutron agents to watch for up/down state.