
Cloud VPS: NFS servers: the current setup requires a Puppet run after a reboot to get the IP address right
Closed, Resolved, Public

Description

We were bitten today by T347665 ("Multiple CloudVPS instances lost their ips (unreachable)") and discovered that the current Cloud VPS NFS setup in Puppet requires a Puppet run after a reboot to configure the IP addresses correctly.
If Puppet is not run manually, it can take up to 30 minutes for the server to run Puppet on its own and fix itself. This is unfortunate because it prolongs any outage.

In an ideal setup, a reboot would get the servers up and running and ready to serve clients.

Event Timeline

I can confirm this is still the case: Profile::Wmcs::Nfs::Standalone adds the address with a manual Exec[ip addr add], which does not persist across reboots.

My understanding is that the Puppet code expects ifupdown (with /etc/network/interfaces), but the Debian cloud images switched to netplan/systemd-networkd some releases ago.

OK, if we have systemd-networkd everywhere, then we should install/remove drop-in files for networkd to pick up instead.
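As a rough illustration of the drop-in approach, here is a minimal Python sketch that installs or removes a systemd-networkd drop-in adding extra addresses to an interface. It assumes netplan renders the unit as `10-netplan-<iface>.network` (netplan regenerates those files at boot, so extending them with a drop-in is safer than editing them). The directory layout, the `additional-ip` drop-in name and the `ensure_dropin` helper are all hypothetical; this is not the code from change 1191326.

```python
import ipaddress
import tempfile
from pathlib import Path

# Assumption: netplan renders /run/systemd/network/10-netplan-<iface>.network at
# boot, and systemd-networkd also reads <unit>.network.d/*.conf drop-ins from /etc.
NETWORKD_DIR = Path("/etc/systemd/network")


def dropin_path(iface: str, name: str = "additional-ip", base: Path = NETWORKD_DIR) -> Path:
    """Hypothetical drop-in file extending the netplan-generated unit for iface."""
    return base / f"10-netplan-{iface}.network.d" / f"{name}.conf"


def render_dropin(addresses: list[str]) -> str:
    """Emit one Address= line per extra address. Address= may be repeated, and
    drop-in values are added to the unit, so netplan's primary address stays."""
    lines = ["[Network]"]
    for addr in addresses:
        # ip_interface() validates the input and keeps the prefix length.
        lines.append(f"Address={ipaddress.ip_interface(addr)}")
    return "\n".join(lines) + "\n"


def ensure_dropin(iface: str, addresses: list[str], ensure: str = "present",
                  base: Path = NETWORKD_DIR) -> bool:
    """Install or remove the drop-in; return True if anything on disk changed."""
    if ensure not in ("present", "absent"):
        raise ValueError(f"ensure must be 'present' or 'absent', got {ensure!r}")
    path = dropin_path(iface, base=base)
    if ensure == "absent":
        if path.exists():
            path.unlink()
            return True
        return False
    content = render_dropin(addresses)
    if path.exists() and path.read_text() == content:
        return False  # already up to date, nothing to reload
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)
    return True


if __name__ == "__main__":
    # Demo against a scratch directory instead of the real /etc/systemd/network.
    with tempfile.TemporaryDirectory() as tmp:
        changed = ensure_dropin("ens3", ["172.16.0.10/24"], base=Path(tmp))
        print(changed, dropin_path("ens3", base=Path(tmp)).read_text())
```

Because the file lives on disk, networkd applies the address on every boot with no Puppet run involved. When the function reports a change on a running host, something like `networkctl reload` would still be needed to apply it before the next reboot.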

Change #1191326 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] interface: new define for additional IPs

https://gerrit.wikimedia.org/r/1191326

Change #1191327 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] wmcs: have additional IPs survive reboots

https://gerrit.wikimedia.org/r/1191327

Change #1191326 merged by Filippo Giunchedi:

[operations/puppet@production] interface: new define for additional IPs

https://gerrit.wikimedia.org/r/1191326

Change #1191327 merged by Filippo Giunchedi:

[operations/puppet@production] wmcs: have additional IPs survive reboots

https://gerrit.wikimedia.org/r/1191327

Change #1194613 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] interface: ensure as string not bare word

https://gerrit.wikimedia.org/r/1194613

Change #1194613 merged by Filippo Giunchedi:

[operations/puppet@production] interface: ensure as string not bare word

https://gerrit.wikimedia.org/r/1194613

Mentioned in SAL (#wikimedia-cloud) [2025-10-08T12:27:02Z] <godog> very brief nfs interruption to wrap up T347681

Mentioned in SAL (#wikimedia-cloud) [2025-10-08T12:32:00Z] <godog> very brief nfs interruption to wrap up T347681

Mentioned in SAL (#wikimedia-cloud) [2025-10-08T12:35:21Z] <godog> very brief nfs interruption to wrap up T347681

Mentioned in SAL (#wikimedia-cloud) [2025-10-08T12:40:13Z] <godog> very brief nfs interruption to wrap up T347681

Mentioned in SAL (#wikimedia-cloud) [2025-10-08T12:43:39Z] <godog> very brief nfs interruption to wrap up T347681

Mentioned in SAL (#wikimedia-cloud) [2025-10-08T12:47:41Z] <godog> very brief nfs interruption to wrap up T347681

Mentioned in SAL (#wikimedia-cloud) [2025-10-08T12:51:00Z] <godog> very brief nfs interruption to wrap up T347681

Mentioned in SAL (#wikimedia-cloud) [2025-10-08T12:52:06Z] <godog> very brief nfs interruption to wrap up T347681

Mentioned in SAL (#wikimedia-cloud) [2025-10-08T12:53:21Z] <godog> very brief nfs interruption to wrap up T347681

Mentioned in SAL (#wikimedia-cloud) [2025-10-08T12:54:16Z] <godog> very brief nfs interruption to wrap up T347681

This is done: NFS servers in Cloud VPS now set their additional IP at boot without needing an extra Puppet run.
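To confirm after a reboot that the additional IP came up on its own, a small Python sketch like the one below could compare the live addresses on an interface against the expected one. It assumes iproute2 with JSON output (`ip -j addr show dev <iface>`). The helper names, interface name and address are hypothetical examples, not part of the actual change.

```python
import ipaddress
import json
import subprocess


def configured_addresses(ip_json: str) -> set[str]:
    """Normalised "address/prefixlen" strings from `ip -j addr show` output."""
    found = set()
    for link in json.loads(ip_json):
        for info in link.get("addr_info", []):
            if "local" in info and "prefixlen" in info:  # skip incomplete entries
                found.add(str(ipaddress.ip_interface(f"{info['local']}/{info['prefixlen']}")))
    return found


def has_address(iface: str, wanted: str) -> bool:
    """True if `wanted` (for example "172.16.0.10/24") is configured on iface."""
    out = subprocess.run(
        ["ip", "-j", "addr", "show", "dev", iface],
        capture_output=True, text=True, check=True,
    ).stdout
    return str(ipaddress.ip_interface(wanted)) in configured_addresses(out)
```

Running `has_address("ens3", "172.16.0.10/24")` right after boot, before the first Puppet run, should return True if the address is now persisted by networkd.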