Description
We experienced an outage when Puppet restarted neutron-openvswitch-agent across the fleet.
Details
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | aborrero | T380882 openstack network problems (November 2024) |
| Resolved | | aborrero | T380972 openstack: prevent puppet from restarting neutron-openvswitch-agent |
Event Timeline
Change #1098498 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):
[operations/puppet@production] openstack: neutron-openvswitch-agent: prevent puppet from restarting the service
Change #1098498 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] openstack: neutron-openvswitch-agent: prevent puppet from restarting the service
Is there any theory about why restarting openvswitch-agent is more delicate than restarting the old linuxbridge agent?
I'm in favor of avoiding outages, but because the agent runs in many places (cloudvirts), decoupling it from puppet can leave the agent's running state out of sync with its config, which also seems bad.
My current theory is that the linuxbridge agent was stateless, whereas the openvswitch agent is stateful.