[NFS] Reduce or eliminate bare-metal NFS servers
Open, High, Public

Description

T290602 has inspired some frantic conversation about the future of our NFS servers. The current plan is:

Decisions taken:

  • we will use regular NFS VMs, one per share
  • all in the cloudinfra-nfs VPS project (to be created)
  • volume backups will happen using cinder-backup service, on cloudbackup2001 (codfw datacenter)
  • will automate the provisioning of the NFS VMs using cookbooks
  • will do a first run of the migration process and iterate on that

Done:

  • Tested manually creating NFS VMs backed by cinder volumes, with puppet config, and tested mounting them on toolsbeta

Doing:

  • Set up the cinder-backup service on cloudbackup2001 and link it to the eqiad cluster
  • Automate with cookbooks the creation of the NFS VMs and volumes
  • Do a test run of the migration procedure with one of the less busy shares (scratch/misc)

To define:

  • How/what to monitor/alert on for this system
  • Iterate on the migration procedure on how to migrate the rest of the shares
  • Add a script to trigger the volume backups on clouddb on a weekly basis
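
As a rough illustration of the weekly backup trigger listed above, a minimal Python sketch could build one `openstack volume backup create` invocation per volume. The volume names and the backup naming scheme are hypothetical, and `--force` is only assumed to be needed because the volumes stay attached to their NFS VMs. The script just prints the commands; running them and scheduling them weekly are left out.

```python
import datetime
import shlex

# Hypothetical volume names, for illustration only.
VOLUMES = ["scratch", "misc"]


def backup_command(volume, day):
    """Build the argv for backing up one cinder volume."""
    name = f"{volume}-{day.isoformat()}"  # hypothetical naming scheme
    return [
        "openstack", "volume", "backup", "create",
        "--force",  # assumption: the volume is in use by its NFS VM
        "--name", name,
        volume,
    ]


def weekly_commands(volumes, day):
    """Return one backup command per volume for the given day."""
    return [backup_command(volume, day) for volume in volumes]


if __name__ == "__main__":
    for argv in weekly_commands(VOLUMES, datetime.date.today()):
        print(shlex.join(argv))
```

Keeping command construction separate from execution makes it easy to review what would run before wiring it into a cookbook or timer.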

Notes:
CephFS use is not in our immediate plans because that opens complicated networking/DC questions that we're not ready to think about

Related Objects

Status      Assigned
Open        None
Open        Andrew
Resolved    aborrero
Resolved    Andrew
Duplicate   None
Open        Andrew
Resolved    aborrero
Duplicate   None
Resolved    aborrero
Resolved    aborrero
Resolved    aborrero
Resolved    Andrew
Open        None
Open        None
Open        None
Resolved    Andrew
Open        Andrew
Resolved    nskaggs
Open        Andrew
Open        Andrew
Resolved    Andrew
Resolved    Andrew
Invalid     Andrew
Resolved    Andrew
Declined    Andrew
Resolved    Andrew
Declined    Andrew
Resolved    Andrew
Declined    Andrew
Resolved    Andrew
Resolved    jsn.sherman
Resolved    Andrew
Open        Andrew
Open        Andrew

Event Timeline

Notes from our just-completed meeting (for future reference):

Our NFS server pools are:

  • Tools: using 6 of 8 TB
  • Maps: using 5 of 8 TB
  • Other projects (not tools or maps): using 2 of 5 TB
  • Scratch: using 2 of 4 TB

Total: using 15 of 25 TB allocated
(Ceph currently has ~35 available TB)
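
The totals above can be double-checked with a short Python sketch. The per-pool figures are the ones from these notes; whether the ~35 TB Ceph figure is raw or usable capacity is not stated, so the headroom numbers are only indicative.

```python
# Usage and allocation per pool in TB, taken from the notes above.
POOLS = {
    "tools": (6, 8),
    "maps": (5, 8),
    "other projects": (2, 5),
    "scratch": (2, 4),
}
CEPH_AVAILABLE_TB = 35  # approximate figure from the notes

used = sum(u for u, _ in POOLS.values())
allocated = sum(a for _, a in POOLS.values())

print(f"used {used} TB of {allocated} TB allocated")
print(f"Ceph headroom vs used: {CEPH_AVAILABLE_TB - used} TB")
print(f"Ceph headroom vs allocated: {CEPH_AVAILABLE_TB - allocated} TB")
```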

  • Dumps (read-only, and SO BIG that we aren't talking about it today; dumps is worked on by Ariel and some analytics/Data Engineering folks)

Options considered:

  • Status Quo
    • Pros: It's the status quo, using DRBD
    • Cons: clunky, requires domain-specific knowledge, violates network separation rules
  • Status Quo but with rsync backup instead of DRBD
    • Pros: not needing to understand DRBD
    • Cons: potential data loss between backups, violates network separation rules, clunky
  • Existing server model but on VMs (no OpenStack Manila)
    • Pros: fewer kinds of hardware, fewer kinds of networks, we could start doing this today!
    • Cons: possible network congestion, heavy Ceph usage, possibly difficult migration
    • The Backup Plan *****
  • Some different server model on VMs (e.g. more servers but no automatic provisioning)
    • Pros: roughly the same as above but possibly with better load/risk distribution, we could start doing this today!
    • Cons: roughly the same as above
    • The WINNER ******
  • Proper OpenStack-native share management via Manila
    • Pros: builds VMs with nova, cinder volumes, etc. More or less automates the VM model? Also supports quotas. Could easily flip to CephFS later
    • Cons: Tools would still be its own project (WMCS would have to manage it); less flexibility to configure NFS, since Manila will want us to treat it as a black box
  • CephFS
    • Pros: quotas?, supported by Manila
    • Cons: new/unknown, requires a network proxy. How can you authenticate? (Ceph is in the production realm)

Open questions:

  • Do we want to put NFS data into Ceph?
    • Ceph is the only scalable performant solution.
    • What about backups? Could use backy2, reusing existing backup servers and jobs. Not everything can/will be 100% backed up.
  • What about HA?
    • DRBD'd NFS servers are in the same rack because of the direct cable between them. This limits the physical setup
    • They don't fail over automatically as-is
  • What about network traffic?
    • Think carefully about network setup and flows between racks
    • One reason NFS in VMs won't work is bandwidth constraints/concerns, at least as we build VMs now
    • I.e., create a dedicated cloud-virt to host NFS VMs, etc.
  • Which of those scenarios requires us to re-learn all of the performance throttling that we've learned with our existing setup?
  • DON'T DO NFS soft-mounts. Once they time out, they won't recover and you need to reboot the VM.
  • How can we further separate the tools share? Making separate shares for quota/performance reasons
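
To make the soft-mount warning above checkable, here is a minimal Python sketch that scans text in the `/proc/mounts` format and reports NFS mounts using the `soft` option. The function name, server name, and mount points are hypothetical; on a real client the input would come from reading `/proc/mounts`.

```python
def soft_nfs_mounts(mounts_text):
    """Return mount points of NFS mounts that use the 'soft' option.

    Expects text in the /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        if fstype in ("nfs", "nfs4") and "soft" in options.split(","):
            found.append(mountpoint)
    return found


# Hypothetical sample input, for illustration only.
SAMPLE = """\
nfs.example:/scratch /mnt/scratch nfs4 rw,noatime,vers=4.2,hard,proto=tcp 0 0
nfs.example:/misc /mnt/misc nfs4 rw,vers=4.2,soft,timeo=300 0 0
/dev/sda1 / ext4 rw,relatime 0 0
"""

if __name__ == "__main__":
    print(soft_nfs_mounts(SAMPLE))
```

A check like this could feed the monitoring question raised in the task description, though what to alert on is still undecided there.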

Mentioned in SAL (#wikimedia-cloud) [2021-09-20T21:57:03Z] <andrewbogott> moving cloudvirt1043 into the 'nfs' aggregate for T291405

dcaro renamed this task from Reduce or eliminate bare-metal NFS servers to [NFS] Reduce or eliminate bare-metal NFS servers. Oct 19 2021, 3:43 PM
dcaro triaged this task as High priority. Oct 19 2021, 3:50 PM
dcaro updated the task description.
dcaro added subscribers: dcaro, aborrero.

Change 753100 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] profile::wmcs::nfs::standalone: bind service IP to VM

https://gerrit.wikimedia.org/r/753100

Change 753100 merged by Andrew Bogott:

[operations/puppet@production] profile::wmcs::nfs::standalone: bind service IP to VM

https://gerrit.wikimedia.org/r/753100

Change 754043 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cloud-vps nfsclient: switch to using the VM-hosted scratch NFS server

https://gerrit.wikimedia.org/r/754043

Change 754043 merged by Andrew Bogott:

[operations/puppet@production] cloud-vps nfsclient: switch to using the VM-hosted scratch NFS server

https://gerrit.wikimedia.org/r/754043

Change 758998 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] nfs-mounts.yaml.erb: temporarily mount 'maps' in cloudinfra-nfs

https://gerrit.wikimedia.org/r/758998

Change 758998 merged by Andrew Bogott:

[operations/puppet@production] nfs-mounts.yaml.erb: temporarily mount 'maps' in cloudinfra-nfs

https://gerrit.wikimedia.org/r/758998

Change 761438 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cloudnfs: Add a hiera key to switch scratch hosting on or off

https://gerrit.wikimedia.org/r/761438

Change 761438 merged by Andrew Bogott:

[operations/puppet@production] cloudnfs: Add a hiera key to switch scratch hosting on or off

https://gerrit.wikimedia.org/r/761438

Change 761981 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] profile::wmcs::nfs::standalone: keep the nfs service running

https://gerrit.wikimedia.org/r/761981

Change 761981 merged by Andrew Bogott:

[operations/puppet@production] profile::wmcs::nfs::standalone: keep the nfs service running

https://gerrit.wikimedia.org/r/761981

Change 773819 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Move cloudstore1008/1009 to role::spare

https://gerrit.wikimedia.org/r/773819

Change 773819 merged by Andrew Bogott:

[operations/puppet@production] Move cloudstore1008/1009 to role::spare

https://gerrit.wikimedia.org/r/773819

Change 779446 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/puppet@production] wmcs: Remove unused role wmcs::nfs::secondary

https://gerrit.wikimedia.org/r/779446

Change 779446 merged by David Caro:

[operations/puppet@production] wmcs: Remove unused role wmcs::nfs::secondary

https://gerrit.wikimedia.org/r/779446