
Virtualize NFS servers used exclusively by Cloud VPS tenants
Closed, Declined · Public

Description

In the interest of preserving history, I'm resurrecting this task and proposing at least part of the design.

NFS services would be provided by VMs on Ceph using an attached volume (cinder). The VMs would implement quotas per project and would likely replicate the current server layout of labstore1004/5. They would need to operate on quiet or dedicated-ish hardware and would merit some stress testing before migration of data and mounts. If cinder isn't used to provide Kubernetes volumes directly, such VMs could also provide that via NFS.
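
For illustration only, the per-project quota piece could look something like the sketch below, assuming the cinder volume is formatted with XFS and mounted with project quotas enabled; the mount point, project name, and limit are hypothetical, not the actual labstore tooling:

```python
# Hypothetical sketch: enforce a per-project block quota on an XFS-formatted,
# cinder-backed volume mounted at MOUNT_POINT with the 'prjquota' option.
# Assumes /etc/projects and /etc/projid already map each Cloud VPS project
# directory to an XFS project ID.
import subprocess

MOUNT_POINT = "/srv/projects"  # assumed mount point of the cinder volume


def set_project_quota(project: str, hard_limit: str) -> None:
    """Initialise the XFS project and apply a hard block limit (e.g. '200g')."""
    subprocess.run(
        ["xfs_quota", "-x", "-c", f"project -s {project}", MOUNT_POINT],
        check=True,
    )
    subprocess.run(
        ["xfs_quota", "-x", "-c", f"limit -p bhard={hard_limit} {project}", MOUNT_POINT],
        check=True,
    )


if __name__ == "__main__":
    set_project_quota("toolsbeta", "200g")  # example values only
```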

Original task description:

Consider virtualizing NFS servers by converting labstore servers into cloudvirt servers with a single giant VM instance running on them. This will bring the NFS servers themselves into the internal 172.16.x.x address space and increase isolation from Wikimedia production networks and servers.

  • labstore1004 & labstore1005
  • cloudstore1008 & cloudstore1009 (which are the planned replacements for labstore1003)

Event Timeline

bd808 triaged this task as Medium priority. Feb 18 2019, 4:43 PM
bd808 created this task.

We need to be careful with huge QCOW2 files because moving them around will be really painful.

This will not be a problem once we have networked block storage, as the NFS servers would be just acting as app servers. In a way, networked block storage is a blocker for this task.

There is also a question about network throughput with the hypervisor's NIC being used to retrieve data from the distributed storage node and send it out to the NFS client. A dedicated backend network could alleviate these issues.

               A                    B
        +--------------+    +---------------+
        |              |    |               |
+-------v-----+    +---+----v-----+    +----+---------+
|             |    |              |    |              |
|  Ceph Node  |    |  NFS Server  |    |  NFS Client  |
|             |    |              |    |              |
+-------+-----+    +---^----+-----+    +----^---------+
        |              |    |               |
        +--------------+    +---------------+
               C                    D

NFS Server: A, B, C, D all flow through a single 10GbE NIC in this diagram, potentially leaving each flow with about 25% of capacity in each direction. And that's for a single VM; the hypervisor is likely to be running others.
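
To make that back-of-the-envelope point concrete, here is a toy calculation treating the NIC as shared across all four flows, as above (illustrative numbers only):

```python
# Toy model of the contention described above: Ceph traffic (A, C) and NFS
# client traffic (B, D) all cross the NFS server's single 10GbE NIC, so in
# the best case each flow gets roughly a quarter of line rate.
NIC_GBPS = 10.0
FLOWS = [
    "A: Ceph node -> NFS server",
    "B: NFS server -> NFS client",
    "C: NFS server -> Ceph node",
    "D: NFS client -> NFS server",
]

per_flow_gbps = NIC_GBPS / len(FLOWS)
for flow in FLOWS:
    print(f"{flow}: ~{per_flow_gbps:.1f} Gbps ({per_flow_gbps / NIC_GBPS:.0%} of the NIC)")
print("...and that is before the hypervisor's other VMs take their share.")
```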

It seems this ticket should be closed in light of the Ceph goal, right?

If listed as blocked, it probably would want to be totally rewritten. Probably closed lol.

My thought process when writing this was not about using virtualization to turn our pets into cattle. It was about the network isolation benefits of putting the NFS servers into the 172.16 network. I agree that evacuating enormous QCOW2 files to another cloudvirt would be functionally impossible.

Converting the base hardware to a cloudvirt and using virtualization for network isolation is the same thing that we are in the process of doing right now for the ToolsDB, OpenStreetMaps, and WikiLabels databases. Maybe we should get more experience with those instances, however, before we rush into repeating the pattern for other services.

Opening up for discussion and consideration from that viewpoint.

I would like to give a reminder that we don't need to convert the hardware to a 'cloudvirt' server to have it available in the openstack instance network.
We could just hook an additional NIC into the 172.16 subnet/VLAN and then reserve that address in neutron for that specific NIC. This was already mentioned somewhere by some of you; I'm just refreshing the idea here.

Quick and dirty diagram:

image.png (518×1 px, 69 KB)
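
For what it's worth, reserving the address in neutron for that NIC could look roughly like this openstacksdk sketch; the cloud, network, subnet, port, and address values are placeholders, not the real WMCS ones:

```python
# Rough sketch (assumed names throughout): create a neutron port with a fixed
# 172.16.x.y address so IPAM never hands that address to an instance, leaving
# it free for the storage host's extra NIC.
import openstack

conn = openstack.connect(cloud="cloudvps")  # assumed clouds.yaml entry

network = conn.network.find_network("cloud-instances-net")   # placeholder name
subnet = conn.network.find_subnet("cloud-instances-subnet")  # placeholder name

port = conn.network.create_port(
    network_id=network.id,
    name="labstore-extra-nic",  # placeholder port name
    fixed_ips=[{"subnet_id": subnet.id, "ip_address": "172.16.0.50"}],
)
print(f"Reserved {port.fixed_ips[0]['ip_address']} on port {port.id}")
```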

Great point @aborrero. One question to consider if we took this approach: would having a storage server (NFS or otherwise) attached to both the production 10.x network and the cloud tenant 172.16.x network actually provide any isolation or protection to the prod network from possible attacks originating from the cloud tenant network?

That's an interesting thought. Without digging into the security implications here, this would simply change where the management of the rules lies. I'm not sure it actually would be a gain for isolation. I think it would be a gain for convenience in a way that is perhaps not actually good.

If we consider the worst case, which is an attacker gaining control of the storage server process: in the cloudvirt case there is still a hypervisor layer between the attacker and the production network.
We could do a similar thing in the shared-network case and isolate the storage server daemon/process using namespaces. That namespace layer would then be the only thing between the compromised process and the production network (see the sketch below).

Kind of typical security tradeoff.
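
A minimal sketch of that namespace idea, purely for illustration; the interface, namespace, and address names are assumptions, not the actual labstore configuration:

```python
# Move the cloud-facing NIC into its own network namespace so a process
# confined there never sees the production-network interface at all.
import subprocess

NETNS = "cloudstorage"         # hypothetical namespace name
CLOUD_NIC = "eno2"             # hypothetical second NIC cabled to the 172.16 VLAN
CLOUD_ADDR = "172.16.0.50/21"  # hypothetical address in the instance network


def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)


run("ip", "netns", "add", NETNS)
run("ip", "link", "set", CLOUD_NIC, "netns", NETNS)
run("ip", "-n", NETNS, "addr", "add", CLOUD_ADDR, "dev", CLOUD_NIC)
run("ip", "-n", NETNS, "link", "set", CLOUD_NIC, "up")

# Anything started with `ip netns exec cloudstorage ...` (the storage daemon,
# for example) only sees the cloud-facing NIC; the namespace boundary is what
# stands between a compromised process and the production network.
run("ip", "netns", "exec", NETNS, "ip", "addr")
```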

WMCS meeting result:

  • OpenStack Ironic recommended by Chase instead of trying to isolate a dual-homed host ourselves
  • Continue to serve NFS as we do today, and postpone this indefinitely.
  • NFS in general: Brooke suspects the nfs-exportd interaction with LDAP may briefly cut connections with SGE nodes

Bstorm moved this task from Needs discussion to Inbox on the cloud-services-team (Kanban) board.

I'm reopening this task because, with the advent of ceph and what I now consider a pretty stable and good design of NFS storage, I think a VM-based shared-storage cluster might actually make sense, eventually.

Bstorm changed the task status from Open to Stalled. Aug 6 2020, 5:14 PM
Bstorm updated the task description.
Bstorm removed a subscriber: GTirloni.

This task cannot be worked on much until Ceph build-out is done and likely the implementation of cinder.

Ceph is here! T261132
Cinder will be piloted soon and this should be able to move forward when capacity is available.

We now have cinder. We still need additional space to do this, and we also need to be sure we actually want to do it. There are some complications in it.

It doesn't seem like we want to, because the performance characteristics and network flows would not really be good. This would likely need to be scrapped in favor of doing something like hardware management in OpenStack.