Steps to replicate the issue (include links if applicable):
- Depool one of the two clouddumpsXXXX hosts as described in Portal:Data_Services/Admin/Dumps#Cloud_VPS_NFS
- Shut down the depooled host
What happens?:
- Puppet starts failing across Cloud VPS with the error below
- Toolforge starts misbehaving (see graphs below)
What should have happened instead?:
Cloud VPS and Toolforge should continue to work fine, ignoring the inactive clouddumps host.
Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):
Other information (Puppet agent log from tools-k8s-worker-nfs-75):
2025-04-08T14:51:12.007185+00:00 tools-k8s-worker-nfs-75 puppet-agent[3770568]: '/usr/bin/timeout -k 5s 20s /bin/mkdir -p /mnt/nfs/dumps-clouddumps1001.wikimedia.org' returned 124 instead of one of [0]
2025-04-08T14:51:12.013842+00:00 tools-k8s-worker-nfs-75 puppet-agent[3770568]: (/Stage[main]/Profile::Wmcs::Nfsclient/Labstore::Nfs_mount[clouddumps1001.wikimedia.org]/Exec[create-/mnt/nfs/dumps-clouddumps1001.wikimedia.org]/returns) change from 'notrun' to ['0'] failed: '/usr/bin/timeout -k 5s 20s /bin/mkdir -p /mnt/nfs/dumps-clouddumps1001.wikimedia.org' returned 124 instead of one of [0] (corrective)
2025-04-08T14:51:12.016681+00:00 tools-k8s-worker-nfs-75 puppet-agent[3770568]: (/Stage[main]/Profile::Wmcs::Nfsclient/Labstore::Nfs_mount[clouddumps1001.wikimedia.org]/Exec[ensure-nfs-clouddumps1001.wikimedia.org]) Dependency Exec[create-/mnt/nfs/dumps-clouddumps1001.wikimedia.org] has failures: true
2025-04-08T14:51:12.016889+00:00 tools-k8s-worker-nfs-75 puppet-agent[3770568]: (/Stage[main]/Profile::Wmcs::Nfsclient/Labstore::Nfs_mount[clouddumps1001.wikimedia.org]/Exec[ensure-nfs-clouddumps1001.wikimedia.org]) Skipping because of failed dependencies
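For anyone reading the log: return code 124 is what GNU `timeout` reports when the wrapped command runs past its time limit, so the `mkdir -p` most likely never finished (presumably it blocked on the unreachable NFS host) rather than failing with an error of its own. A minimal sketch of that behaviour, using `sleep` as a stand-in for the hung `mkdir` and shorter durations than the Puppet exec; both are assumptions for illustration only:

```shell
# Stand-in for the hung mkdir: sleep runs longer than the time limit.
# "-k 1s 2s" mirrors the "-k 5s 20s" shape from the log: send TERM
# after 2s, then KILL 1s later if the command is still running.
status=0
timeout -k 1s 2s sleep 10 || status=$?
echo "exit status: $status"
```

If the command only dies after the follow-up KILL signal, `timeout` reports 137 instead, so the 124 in the log suggests the `mkdir` did exit on the initial TERM.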
Toolforge graphs (clouddumps1001 was down approximately from 14:10 UTC to 15:10 UTC):





