[toolforge k8s] Support Cinder volumes
Open, Medium, Public

Description

Could the Toolforge Kubernetes cluster allow attaching Cinder volumes? Tools would then have a more modern storage option that’s (probably) faster and more reliable than NFS. It would also help take load off Wikimedia’s NFS servers.

Here are some instructions that a Kubernetes admin could try. As an unprivileged Toolforge user, I don’t think I can do this myself.
https://kubernetes.io/blog/2020/02/07/deploying-external-openstack-cloud-provider-with-kubeadm/

When I asked on the wikimedia-cloud IRC channel, @Andrew replied:

I wouldn't expect that to work, although I'm interested in hearing about what you find if you try.

Event Timeline

I don't think we have any historical tickets about this idea, mostly because they were fanciful until we actually had Cinder support in Cloud VPS's OpenStack deployment. I see this as a very likely component of T194332: [Epic] Make Toolforge a proper platform as a service with push-to-deploy and build packs and related Toolforge platform modernization efforts.

Note that Kubernetes can also directly mount volumes from Ceph RBD, so this wouldn’t necessarily have to be done via Cinder. If Kubernetes were mounting Ceph RBD directly, there would be one less layer to maintain. But I don’t know how well this would fit into Wikimedia’s production setup in terms of quota enforcement, key management, monitoring, etc. Here are some pointers, in case you want to explore this. The example setup actually looks quite simple.

Ceph RBD to Kubernetes isn't something we'd want to support in the existing security model. By integrating the OpenStack provider in Kubernetes, we can get Cinder volumes "for free" as an available volume type, so it's not actually terribly hard, except that it requires a lot of changes to k8s. Adding RBD directly would require restructuring the authentication model of Ceph to allow direct connections from Kubernetes (something we currently only allow from OpenStack services, basically). Keeping the Cinder layer on top is therefore potentially convenient, despite the abstraction, and it preserves the authentication model of OpenStack that we use (which is rooted in LDAP, and is therefore compatible with the source of our Kubernetes credentials, which is also, ultimately, LDAP). Ceph is not managed by LDAP.
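
To illustrate what Cinder as an available volume type could look like from a tool's point of view, here is a rough sketch that builds the two Kubernetes manifests involved: a StorageClass and a PersistentVolumeClaim. It assumes the Cinder CSI plugin from the OpenStack cloud provider is deployed, which nothing in this task confirms. The class name `cinder-standard`, the `tool-<name>` namespace pattern, and the `<name>-data` claim name are hypothetical and chosen only for the example.

```python
import json


def cinder_storage_class(name="cinder-standard"):
    """Build a StorageClass manifest backed by the Cinder CSI driver.

    Assumption: the cluster runs the cinder-csi-plugin, whose driver
    name is "cinder.csi.openstack.org". The class name is hypothetical.
    """
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "cinder.csi.openstack.org",
        "reclaimPolicy": "Delete",
        # Delay volume creation until a pod actually needs the claim.
        "volumeBindingMode": "WaitForFirstConsumer",
    }


def volume_claim(tool, size_gib, storage_class="cinder-standard"):
    """Build a PersistentVolumeClaim requesting a Cinder-backed volume.

    Assumption: each tool has a namespace named "tool-<name>"; the
    claim name "<name>-data" is made up for this sketch.
    """
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": f"{tool}-data", "namespace": f"tool-{tool}"},
        "spec": {
            # A Cinder block volume attaches to one node at a time.
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gib}Gi"}},
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, so the output could be
    # piped to "kubectl apply -f -" on a cluster that supports this.
    for manifest in (cinder_storage_class(), volume_claim("mytool", 5)):
        print(json.dumps(manifest, indent=2))
```

In this sketch the claim goes through Kubernetes and the OpenStack provider rather than talking to Ceph, so the LDAP-rooted OpenStack authentication described above would stay in place. Quota handling and who is allowed to create the StorageClass are left open here.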

@Sascha, honestly, the idea might fit better with Wikimedia production's model than with Toolforge's. :)

I should also mention that we had considered the OpenStack provider back in T214513, but at the time we decided not to use it because we lacked Cinder, Barbican and LBaaS. Now that we have Cinder, there may be reason to do some testing.

Phamhi triaged this task as Medium priority. Mar 9 2021, 5:32 PM