
Support cinder or expanded ephemeral disk worker nodes on Toolforge Kubernetes
Closed, Resolved · Public

Description

Kubernetes workers require an LVM volume mounted at /var/lib/docker. We need to add support for Cinder, or for a custom flavor like the one added for grid workers in T272114.
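As a rough illustration only (not the actual manifest), the requirement amounts to formatting the node's extra disk and mounting it at /var/lib/docker with ordinary puppet resources. The device path and resource types below are assumptions; production uses LVM rather than a raw device:

```puppet
# Hypothetical sketch: the real device path depends on the flavor, and the
# filesystem type is assumed. Do not treat this as the production manifest.
filesystem { '/dev/sdb':
  ensure  => present,
  fs_type => 'ext4',
}

mount { '/var/lib/docker':
  ensure  => mounted,
  device  => '/dev/sdb',
  fstype  => 'ext4',
  options => 'defaults',
  require => Filesystem['/dev/sdb'],
}
```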

(this task is not to be confused with T275555, which asks for Cinder as a replacement for NFS as file storage for individual tools)

Related Objects

Event Timeline

Does that need to be cinder, or should it be like the grid nodes with ephemeral storage? I personally think the latter. Otherwise, rebuilding nodes will be a serious pain.

So if we have an appropriate flavor to use (and the grid flavor might work), we can use the same approach as https://gerrit.wikimedia.org/r/c/operations/puppet/+/672456/

Bstorm renamed this task from Support Cinder worker nodes on Toolforge Kubernetes to Support cinder or expanded ephemeral disk worker nodes on Toolforge Kubernetes.May 6 2021, 6:04 PM
Bstorm updated the task description.

It looks like the grid nodes use the flavor g3.cores4.ram8.disk20.swap24.ephem20. K8s nodes actually cannot have swap, so that isn't quite right, for sure.

Got it. This is made for k8s nodes: g3.cores8.ram16.disk20.ephem140. That roughly matches the existing larger nodes. Using those with the cinder class should Just Work.

Ingress and control currently use a smaller flavor (g2.cores2.ram4.disk40), so we need an equivalent flavor for those. Is 140G of ephemeral storage really needed? For example, tools-k8s-worker-68 is currently using 40G in /docker; we could use smaller disks to conserve space.

We could, but there isn't much need to. It doesn't actually consume that space on the ceph cluster. This is all thin-provisioned. The docker images we currently use are extremely large (for docker images). I'd rather keep plenty of room.

Control and ingress don't need anywhere near this much space. The control nodes don't have a separate /var/lib/docker mount, and the ingress nodes arguably don't need it either. They don't need more than 2GB of space for docker, and that is fine on the root disk.

I see the ingress nodes are using the role::wmcs::toolforge::k8s::worker class in puppet. I can factor out the docker volume and set a different ingress controller class for those that uses a non-cinder setup. I don't think ingress and control need the extra disk.

Change 685936 had a related patch set uploaded (by Bstorm; author: Bstorm):

[operations/puppet@production] toolforge kubernetes: change class for the new cinder environment

https://gerrit.wikimedia.org/r/685936

That patch should sort this out, and new nodes should be easy to deploy as needed once it is merged. When we upgrade the ingress nodes, it'd make sense to set profile::wmcs::kubeadm::docker_vol: false on the appropriate prefix. I don't think that will retroactively unmount it on existing ingress nodes, since it is usually hard to make puppet unmount things, right @Andrew?
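For reference, the hiera override discussed above would look something like this in the prefix data. The key itself comes from this thread; the placement (which prefix, which file) is illustrative:

```yaml
# Prefix hiera for the ingress nodes (illustrative placement):
# skip the separate /var/lib/docker volume on these hosts.
profile::wmcs::kubeadm::docker_vol: false
```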

We'll likely try that in toolsbeta first anyway.

Change 685936 merged by Bstorm:

[operations/puppet@production] toolforge kubernetes: change class for the new cinder environment

https://gerrit.wikimedia.org/r/685936

Bstorm claimed this task.

Ok, I updated the docs at https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/Deploying for ingress and standard worker nodes. The puppet patch is deployed. This should be good to go; we just need to use the correct flavors, etc. No actual cinder storage is required.

So tools ingress nodes are failing puppet because their ephemeral disk is only 20G while the patch has min_gb => 40. Is there any reason not to remove that?

The min/max settings are needed for puppet to tell the difference between volumes (their actual /dev paths are indeterminate). That said, you're free to change the min/max numbers to match the disk sizes of the flavor you're using.
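In other words, volume matching is driven by size bounds rather than device names: puppet picks whichever attached disk falls inside the range. A sketch of what such a declaration might look like; the resource name and parameter spellings here are assumptions extrapolated from the min_gb mentioned above, not verified against the actual module:

```puppet
# Assumed resource/API: match any attached volume between the bounds
# (values illustrative) and mount it at /var/lib/docker.
cinderutils::ensure { '/var/lib/docker':
  min_gb => 20,
  max_gb => 160,
}
```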

I forgot to add profile::wmcs::kubeadm::docker_vol: false on ingress nodes. Please do not increase disk for them.

That fixed it, as intended when I submitted the patch. Sorry, I forgot to add the hiera.

Mentioned in SAL (#wikimedia-cloud) [2021-05-11T09:17:52Z] <Majavah> set profile::wmcs::kubeadm::docker_vol: false on ingress nodes T282087