Try deploying OVS in codfw1dev in parallel to the current setup to see if a migration without a full Openstack redeployment is even possible.
Description
Details
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Andrew | T323086 upgrade cloud-vps openstack to Openstack version 'Zed'
Open | | taavi | T326373 Migrate Cloud VPS to Neutron Open vSwitch agent
Open | | taavi | T358761 Deploy OVS test setup in codfw1dev
Open | | None | T358868 Use BGP to announce VM ranges from cloudnet to cloudgw
Event Timeline
Change 1007900 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] Add new role for OVS cloudnet
Change 1007901 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] Add some new networks for WMCS OVS testing
Cookbook cookbooks.sre.hosts.reimage was started by taavi@cumin1002 for host cloudnet2007-dev.codfw.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage was started by taavi@cumin1002 for host cloudnet2008-dev.codfw.wmnet with OS bookworm
Change 1007900 merged by Majavah:
[operations/puppet@production] Add new role for OVS cloudnet
Cookbook cookbooks.sre.hosts.reimage started by taavi@cumin1002 for host cloudnet2008-dev.codfw.wmnet with OS bookworm completed:
- cloudnet2008-dev (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run failed and logged in /var/log/spicerack/sre/hosts/reimage/202403041117_taavi_146478_cloudnet2008-dev.out, asking the operator what to do
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202403041126_taavi_146478_cloudnet2008-dev.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Cookbook cookbooks.sre.hosts.reimage started by taavi@cumin1002 for host cloudnet2007-dev.codfw.wmnet with OS bookworm completed:
- cloudnet2007-dev (WARN)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run failed and logged in /var/log/spicerack/sre/hosts/reimage/202403041114_taavi_146402_cloudnet2007-dev.out, asking the operator what to do
- First Puppet run failed and logged in /var/log/spicerack/sre/hosts/reimage/202403041125_taavi_146402_cloudnet2007-dev.out, asking the operator what to do
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202403041125_taavi_146402_cloudnet2007-dev.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
Change 1008422 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] O:wmcs: codfw1dev: net_ovs: add base neutron config
Change 1008422 merged by Majavah:
[operations/puppet@production] O:wmcs: codfw1dev: net_ovs: add base neutron config
Change 1008462 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] openstack: neutron: add API support for OVS
Change 1008463 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] openstack: neutron: first attempt of installing ovs-agent
Change 1008462 merged by Majavah:
[operations/puppet@production] openstack: neutron: add API support for OVS
Change 1008463 merged by Majavah:
[operations/puppet@production] openstack: neutron: first attempt of installing ovs-agent
Change 1009496 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] P:openstack: neutron: fix VLAN names on OVS test hosts
Change 1007901 merged by Majavah:
[operations/puppet@production] Add some new networks for WMCS OVS testing
Change 1009496 merged by Majavah:
[operations/puppet@production] P:openstack: neutron: fix VLAN names on OVS test hosts
Change 1009511 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] P:opesntack: nova: convert cloudvirt2001-dev to OVS agent
Change 1009511 merged by Majavah:
[operations/puppet@production] P:openstack: nova: convert cloudvirt2001-dev to OVS agent
Cookbook cookbooks.sre.hosts.reimage was started by taavi@cumin1002 for host cloudvirt2001-dev.codfw.wmnet with OS bookworm
Cookbook cookbooks.sre.hosts.reimage started by taavi@cumin1002 for host cloudvirt2001-dev.codfw.wmnet with OS bookworm completed:
- cloudvirt2001-dev (PASS)
- Downtimed on Icinga/Alertmanager
- Disabled Puppet
- Removed from Puppet and PuppetDB if present and deleted any certificates
- Removed from Debmonitor if present
- Forced PXE for next reboot
- Host rebooted via IPMI
- Host up (Debian installer)
- Add puppet_version metadata to Debian installer
- Checked BIOS boot parameters are back to normal
- Host up (new fresh bookworm OS)
- Generated Puppet certificate
- Signed new Puppet certificate
- Run Puppet in NOOP mode to populate exported resources in PuppetDB
- Found Nagios_host resource for this host in PuppetDB
- Downtimed the new host on Icinga/Alertmanager
- Removed previous downtime on Alertmanager (old OS)
- First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202403080940_taavi_945517_cloudvirt2001-dev.out
- configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
- Rebooted
- Automatic Puppet run was successful
- Forced a re-check of all Icinga services for the host
- Icinga status is optimal
- Icinga downtime removed
- Updated Netbox data from PuppetDB
$ sudo wmcs-openstack network create --project admin --share --provider-network-type vxlan lan-flat-cloudinstances3
$ sudo wmcs-openstack subnet create --network lan-flat-cloudinstances3 --subnet-range 172.16.129.0/24 --gateway 172.16.129.1 --dns-nameserver 172.20.254.1 cloud-instances-flat3-codfw-v4
# unset maintenance
$ sudo wmcs-openstack server create --os-compute-api-version 2.74 --os-project-id taavitestproject --flavor g3.cores1.ram2.disk20 --image debian-12.0-bookworm --security-group 4c29a64f-b883-4622-893c-eb3fd78b0b7f --nic net-id=e40a1c9f-cc09-4751-a6b8-0469a52318b7 --host cloudvirt2001-dev taavi-ovs-test
# set maintenance
Now the instance is failing to create with:
2024-03-08 10:38:24.213 1331 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent ; Stdout: ; Stderr: iptables-restore v1.8.9 (nf_tables): interface name `105c0477-6f00-4b3d-8749-795a34c5f9c4' must be shorter than IFNAMSIZ (15)
Note: The UUID in the iptables error is the Neutron port UUID. So presumably that's not being mapped to the actual interface name somewhere in the Neutron code.
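To illustrate the length constraint behind that error: Linux interface names are limited to 15 characters, while a Neutron port UUID is 36 characters long. The Python sketch below checks that limit and shows one way a short device name could be derived from a port UUID. The helper names, the `tap` prefix and the 14-character truncation are assumptions made for illustration; this is not the actual Neutron code path.

```python
# Maximum interface name length, taken from the iptables-restore error above.
MAX_IFNAME_LEN = 15


def fits_ifname_limit(name: str) -> bool:
    """Return True if the name is non-empty and within the interface name limit."""
    return 0 < len(name) <= MAX_IFNAME_LEN


def short_device_name(port_id: str, prefix: str = "tap", max_len: int = 14) -> str:
    """Hypothetical helper: build a device name from a prefix and a truncated port UUID.

    The prefix and max_len defaults are assumptions for this sketch.
    """
    return (prefix + port_id)[:max_len]


# Port UUID copied from the error message above.
port_id = "105c0477-6f00-4b3d-8749-795a34c5f9c4"

# The raw UUID is far too long to be used as an interface name.
uuid_fits = fits_ifname_limit(port_id)

# A prefixed, truncated name stays within the limit.
device_name = short_device_name(port_id)
device_fits = fits_ifname_limit(device_name)

print(uuid_fits, device_name, device_fits)
```

If the firewall code is handed the raw port UUID instead of a shortened device name, any iptables rule that references the interface will be rejected in exactly this way.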
The issue above still persists on Bobcat. Here's a log of an instance creation where that happened: P60929
Change #1021484 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] openstack: neutron: Fix firewall driver with openvswitch
Change #1021484 merged by Majavah:
[operations/puppet@production] openstack: neutron: Fix firewall driver with openvswitch
The firewall issue was fixed by the above patch, which sets the firewall driver to the same value in all config files. I can now create an instance on cloudvirt2001-dev, using the command at P60933, that has an interface on an OVS-provided network.
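A quick way to catch this kind of mismatch is to compare the `firewall_driver` option of the `[securitygroup]` section across the Neutron config files on a host. The Python sketch below does that on in-memory config text. The file names and driver values in the sample data are illustrative assumptions, not a record of what the patch changed.

```python
import configparser
from typing import Dict, Optional


def firewall_drivers(configs: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each config name to its [securitygroup] firewall_driver value, or None if unset."""
    result: Dict[str, Optional[str]] = {}
    for name, text in configs.items():
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(text)
        result[name] = parser.get("securitygroup", "firewall_driver", fallback=None)
    return result


def is_consistent(drivers: Dict[str, Optional[str]]) -> bool:
    """Return True if every config file reports the same firewall driver."""
    return len(set(drivers.values())) == 1


# Illustrative sample data (assumed file names and values).
mismatched = {
    "neutron.conf": "[securitygroup]\nfirewall_driver = iptables_hybrid\n",
    "openvswitch_agent.ini": "[securitygroup]\nfirewall_driver = openvswitch\n",
}
fixed = {
    "neutron.conf": "[securitygroup]\nfirewall_driver = openvswitch\n",
    "openvswitch_agent.ini": "[securitygroup]\nfirewall_driver = openvswitch\n",
}

print(is_consistent(firewall_drivers(mismatched)))
print(is_consistent(firewall_drivers(fixed)))
```

On a real host the same functions could be fed the contents of the files read from disk, rather than the inline strings used here.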
Next up:
- Add a DHCP agent to the OVS provider network
- Move a second cloudvirt (2002-dev, most likely) to the new setup
- Set up a second VM on that, and see if they can talk to each other
After that, start looking at outbound connectivity from an OVS-backed network, and also check whether the OVS agent can talk to the current VLAN-backed network or whether each cloudvirt will strictly have to use one or the other.
Change #1021867 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] O:wmcs: codfw1dev: net_ovs: install dhcp and metadata agents
Change #1021867 merged by Majavah:
[operations/puppet@production] O:wmcs: codfw1dev: net_ovs: install dhcp and metadata agents
Change #1021894 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] openstack: neutron: set dhcp interface driver correctly
Change #1021894 merged by Majavah:
[operations/puppet@production] openstack: neutron: set dhcp interface driver correctly
Change #1021968 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] openstack: neutron: Connect OVS agents to provider networks
Change #1021968 merged by Majavah:
[operations/puppet@production] openstack: neutron: Connect OVS agents to provider networks
Change #1023384 had a related patch set uploaded (by Majavah; author: Majavah):
[operations/puppet@production] hieradata: move cloudvirt2002-dev to OVS agent
I've been looking at this error recently:
Apr 25 12:51:08 cloudvirt2001-dev nova-compute[2572868]: 2024-04-25 12:51:08.870 2572868 ERROR ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error: {"details":"cannot delete QoS row 4f1bdbb2-2063-4789-9603-b53982670743 because of 1 remaining reference(s)","error":"referential integrity violation"}
Change #1023384 merged by Majavah:
[operations/puppet@production] hieradata: move cloudvirt2002-dev to OVS agent
To move an instance from linuxbridge to OVS, the following UPDATE needs to be manually run on the database:
mysql:root@localhost [neutron]> UPDATE ml2_port_bindings SET vif_type = 'ovs' WHERE port_id = '<port id>';
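If this step ever needs to be scripted, the port ID should be passed as a bound parameter rather than pasted into the SQL string. The Python sketch below shows the same UPDATE against an in-memory SQLite database that stands in for the real Neutron database. The table is reduced to two columns, and the sample port IDs and the `bridge` starting value are assumptions made for the demonstration.

```python
import sqlite3


def set_vif_type_ovs(conn: sqlite3.Connection, port_id: str) -> int:
    """Set vif_type to 'ovs' for one port binding and return the number of rows changed."""
    cur = conn.execute(
        "UPDATE ml2_port_bindings SET vif_type = 'ovs' WHERE port_id = ?",
        (port_id,),
    )
    conn.commit()
    return cur.rowcount


# Simplified stand-in for the real table (the real one has more columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ml2_port_bindings (port_id TEXT PRIMARY KEY, vif_type TEXT)")
conn.executemany(
    "INSERT INTO ml2_port_bindings (port_id, vif_type) VALUES (?, ?)",
    [("port-a", "bridge"), ("port-b", "bridge")],  # hypothetical sample rows
)
conn.commit()

changed = set_vif_type_ovs(conn, "port-a")
rows = dict(conn.execute("SELECT port_id, vif_type FROM ml2_port_bindings"))
print(changed, rows)
```

Checking the returned row count is a cheap safeguard: anything other than 1 means the port ID did not match exactly one binding.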