While working on T404584: [tools,nfs,infra] Address tools NFS getting stuck with processes in D state, we have run into an issue where (some?) cinder NFS volumes end up in status 'reserved' after being detached by the wmcs.nfs.migrate_service cookbook (or by the equivalent wmcs-openstack server remove volume <server id> <volume id>).
Further testing and investigation is needed at this point, first of all to better understand the impact.
We took a closer look yesterday; for example, this is what nova has to say about the operation:
2025-10-07 13:39:58.308 3211222 WARNING nova.virt.libvirt.driver [None req-0e2bde0e-9de3-4e99-a763-06dc81d1b637 novaadmin admin - - default default] Failed to detach device sdb from instance 19c9ecd1-6fb2-4a2d-954a-c1dc6c956034 from the persistent domain config. Libvirt did not report any error but the device is still in the config.
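The nova warning above says libvirt left the device in the persistent domain config. One way to confirm that is to check whether the target device still appears in the domain XML. A minimal sketch, using a sample XML fragment in place of real virsh dumpxml --inactive <domain> output (the XML here is illustrative, not taken from the affected host):

```shell
# Sample of what a stale disk entry in the persistent domain config looks like;
# in production this XML would come from: virsh dumpxml --inactive <domain>
xml='<disk type="network" device="disk">
  <target dev="sdb" bus="scsi"/>
</disk>'
# If grep finds the target dev, libvirt still considers sdb attached.
if printf '%s\n' "$xml" | grep -q 'dev="sdb"'; then
  echo "device sdb still present in persistent domain config"
fi
```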
Current reproducer in toolsbeta:
Stop puppet and nfs-server on the current NFS server.
NB: THIS CAUSES A TOOLSBETA NFS OUTAGE.
root@toolsbeta-nfs-4:~# disable-puppet T406688
root@toolsbeta-nfs-4:~# systemctl stop nfs-server
root@toolsbeta-nfs-4:~# umount /srv/toolsbeta
Detach the volume from the host and observe it entering state 'reserved':
root@cloudcontrol1006:~# wmcs-openstack server remove volume $(wmcs-server-id toolsbeta-nfs-4.toolsbeta.eqiad1.wikimedia.cloud ) 648504db-18c2-4cee-b731-567dcb4dadf6
root@cloudcontrol1006:~# wmcs-openstack volume show 648504db-18c2-4cee-b731-567dcb4dadf6
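One way to watch the detach settle (or get stuck) is to poll the volume status in a loop. A sketch with the API call stubbed out so it runs standalone; the status sequence below is illustrative, and a real loop would call something like wmcs-openstack volume show -f value -c status <volume id> (standard openstack CLI output flags, assumed to pass through the wrapper):

```shell
VOLUME=648504db-18c2-4cee-b731-567dcb4dadf6
# Stubbed status sequence standing in for repeated cinder API calls;
# on a healthy detach the final state would be 'available', not 'reserved'.
for status in detaching detaching reserved; do
  echo "volume $VOLUME status: $status"
  if [ "$status" = "reserved" ]; then
    echo "detach left the volume stuck in 'reserved'"
    break
  fi
done
```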
To put things back, first set the volume to 'available' and then reattach it:
root@cloudcontrol1006:~# wmcs-openstack volume set --state available 648504db-18c2-4cee-b731-567dcb4dadf6
root@cloudcontrol1006:~# wmcs-openstack server add volume $(wmcs-server-id toolsbeta-nfs-4.toolsbeta.eqiad1.wikimedia.cloud ) 648504db-18c2-4cee-b731-567dcb4dadf6
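The recovery steps above can be wrapped in a guard so they only fire when the volume really is stuck. A sketch with the status lookup stubbed out (the real call would be along the lines of wmcs-openstack volume show -f value -c status <volume id>, an assumption about the wrapper); it only prints the commands it would run, and the server id is a placeholder:

```shell
VOLUME=648504db-18c2-4cee-b731-567dcb4dadf6
SERVER='<server id>'              # placeholder; real value comes from wmcs-server-id
get_status() { echo reserved; }   # stub standing in for the cinder status lookup
if [ "$(get_status "$VOLUME")" = "reserved" ]; then
  echo "volume $VOLUME is stuck in 'reserved'; recovery commands:"
  echo "  wmcs-openstack volume set --state available $VOLUME"
  echo "  wmcs-openstack server add volume $SERVER $VOLUME"
fi
```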
Then get puppet going again:
root@toolsbeta-nfs-4:~# run-puppet-agent --force