
Add cluster-awareness to nfs-exportd
Closed, Resolved, Public

Description

Since we abandoned bind mounts on the NFS clusters (per https://gerrit.wikimedia.org/r/c/operations/puppet/+/571821), nfs-exportd exits with a non-zero status on DRBD secondary nodes running Stretch and later, which makes perfect sense. Why it didn't do so on Jessie is the real mystery.
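The failure on secondaries is expected because, in a single-primary DRBD setup, the replicated volume is normally mounted only on the primary, so there is nothing valid to export on the secondary. Below is a minimal sketch of how a script could check the local node's role before doing any export work. It is not the actual nfs-exportd code. It assumes that `drbdadm role <resource>` prints either `Primary`/`Secondary` (DRBD 9) or `local/peer` such as `Primary/Secondary` (DRBD 8.4). The resource name and the `run_if_active` helper are hypothetical. The `runner` parameter lets the check be exercised without a real DRBD installation.

```python
import subprocess


def drbd_role(resource, runner=subprocess.check_output):
    """Return the local DRBD role for a resource, e.g. "Primary" or "Secondary".

    Assumption: `drbdadm role` prints "Primary" (DRBD 9) or "Primary/Secondary"
    (DRBD 8.4, local/peer); only the local part before "/" is kept.
    """
    out = runner(["drbdadm", "role", resource], text=True)
    return out.strip().split("/")[0]


def is_active_node(resource, runner=subprocess.check_output):
    """True only on the node that currently holds the DRBD primary role."""
    return drbd_role(resource, runner) == "Primary"


def run_if_active(resource, action, runner=subprocess.check_output):
    """Hypothetical gate: run `action` only on the primary, skip quietly otherwise."""
    if not is_active_node(resource, runner):
        # Secondary: the replicated volume is not mounted here, so there is
        # nothing to export and no reason to report a failure.
        return False
    action()
    return True
```

With a gate like this, a strict `subprocess.check_call()` can stay in place: secondaries never attempt the export, so any non-zero status that does surface on the primary points at a real problem.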

As a band-aid fix, we switched from subprocess.check_call() to subprocess.call(), but that seems like a good way to mask serious errors in the future. Adding cluster awareness like that used by maintain-dbusers should fix things more sensibly. While we are at it, it wouldn't hurt to make it run exportfs only when the exports actually change. Re-running exportfs every 5 minutes is unlikely to be good for the system's performance, even though we've done it for years.
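The "only run exportfs when changes are made" idea can be sketched by fingerprinting the rendered exports and reloading only when the fingerprint differs from the last successful run. This is a minimal illustration, not the nfs-exportd implementation. The state file, the exports path, and the use of `exportfs -ra` as the reload command are assumptions passed in as parameters. It keeps `subprocess.check_call()`, which raises `CalledProcessError` on a non-zero exit, unlike `subprocess.call()`, which just returns the status and so hides failures.

```python
import hashlib
import subprocess
from pathlib import Path


def exports_digest(exports_text: str) -> str:
    """Stable fingerprint of the rendered exports content."""
    return hashlib.sha256(exports_text.encode("utf-8")).hexdigest()


def apply_if_changed(exports_text: str, exports_path: Path, state_path: Path,
                     reload_cmd=("exportfs", "-ra")) -> bool:
    """Write the exports and reload only when the content changed.

    Assumptions: `state_path` is a hypothetical file holding the digest of the
    last successfully applied exports; `reload_cmd` defaults to `exportfs -ra`.
    Returns True if a reload ran, False if nothing changed.
    """
    digest = exports_digest(exports_text)
    if state_path.exists() and state_path.read_text() == digest:
        return False  # unchanged since the last successful reload
    exports_path.write_text(exports_text)
    # check_call raises CalledProcessError on a non-zero exit status,
    # so a genuine failure is surfaced instead of being masked.
    subprocess.check_call(list(reload_cmd))
    # Record the digest only after the reload succeeded, so a failed
    # reload is retried on the next timer run.
    state_path.write_text(digest)
    return True
```

Recording the digest after the reload, rather than before, is the design choice that keeps a failed `exportfs` from being silently skipped on the next 5-minute run.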

Event Timeline

Bstorm triaged this task as Medium priority. May 21 2020, 10:08 PM
Bstorm created this task.
Bstorm moved this task from Backlog to Shared Storage on the Data-Services board.
Bstorm moved this task from Inbox to Soon! on the cloud-services-team (Kanban) board.
Bstorm claimed this task. Tue, Jun 16, 5:23 PM
Bstorm moved this task from Soon! to Doing on the cloud-services-team (Kanban) board.

Change 606543 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] cloud nfs: only run nfs-exportd on the current active node

https://gerrit.wikimedia.org/r/606543

Mentioned in SAL (#wikimedia-operations) [2020-06-22T22:39:38Z] <bstorm_> downtimed labstore1005 to prevent an alert during puppet merge T253353

Change 606543 merged by Bstorm:
[operations/puppet@production] cloud nfs: only run nfs-exportd on the current active node

https://gerrit.wikimedia.org/r/606543

Bstorm closed this task as Resolved. Mon, Jun 22, 10:55 PM