
Support cinder or expanded ephemeral disk worker nodes on Toolforge Kubernetes
Closed, Resolved · Public

Description

Kubernetes workers require an LVM volume mounted at /var/lib/docker. We need to add support for Cinder, or for a custom flavor like the one added for grid workers in T272114.
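As a rough illustration only (not the actual manifest), the requirement amounts to formatting the node's extra disk and mounting it at /var/lib/docker with ordinary puppet resources. The device path and resource types below are assumptions; production uses LVM rather than a raw device:

```puppet
# Hypothetical sketch: the real device path depends on the flavor, and the
# filesystem type is assumed. Do not treat this as the production manifest.
filesystem { '/dev/sdb':
  ensure  => present,
  fs_type => 'ext4',
}

mount { '/var/lib/docker':
  ensure  => mounted,
  device  => '/dev/sdb',
  fstype  => 'ext4',
  options => 'defaults',
  require => Filesystem['/dev/sdb'],
}
```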

(this task is not to be confused with T275555, which asks for Cinder as a replacement for NFS as file storage for individual tools)

Related Objects

Event Timeline

Does that need to be cinder, or should it be like the grid nodes with ephemeral storage? I personally think the latter. Otherwise, rebuilding nodes will be a serious pain.

So if we have an appropriate flavor to use (and the grid flavor might work), we can use the same approach as https://gerrit.wikimedia.org/r/c/operations/puppet/+/672456/

Bstorm renamed this task from Support Cinder worker nodes on Toolforge Kubernetes to Support cinder or expanded ephemeral disk worker nodes on Toolforge Kubernetes.May 6 2021, 6:04 PM
Bstorm updated the task description.

It looks like the grid nodes use the flavor g3.cores4.ram8.disk20.swap24.ephem20. K8s nodes actually cannot have swap, so that isn't quite right, for sure.

Got it. This is made for k8s nodes: g3.cores8.ram16.disk20.ephem140. That roughly matches the existing larger nodes. Using those with the cinder class should Just Work.

Ingress and control currently use a smaller flavor (g2.cores2.ram4.disk40), so we need an equivalent flavor for those. Is 140G of ephemeral storage really needed? For example, tools-k8s-worker-68 is currently using 40G in /docker; we could use smaller disks to conserve space.

We could, but there isn't much need to. It doesn't actually consume that space on the ceph cluster. This is all thin-provisioned. The docker images we currently use are extremely large (for docker images). I'd rather keep plenty of room.

Control and ingress don't need anywhere near this much space. The control nodes don't have a separate /var/lib/docker mount, and the ingress nodes arguably don't need it either. They don't need more than 2GB of space for docker, and that is fine on the root disk.

I see the ingress nodes are using the role::wmcs::toolforge::k8s::worker class in puppet. I can factor out the docker volume and set a different ingress controller class for those that uses a non-cinder setup. I don't think ingress and control need the extra disk.

Change 685936 had a related patch set uploaded (by Bstorm; author: Bstorm):

[operations/puppet@production] toolforge kubernetes: change class for the new cinder environment

https://gerrit.wikimedia.org/r/685936

That patch should sort this out, and new nodes should be easy to deploy as needed once it is merged. When we upgrade the ingress nodes, it'd make sense to set profile::wmcs::kubeadm::docker_vol: false on the appropriate prefix. I don't think that will retroactively unmount it on existing ingress nodes, since it is usually hard to make puppet unmount things, right @Andrew?
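For reference, the hiera override discussed above would look something like this in the prefix data. The key itself comes from this thread; the placement (which prefix, which file) is illustrative:

```yaml
# Prefix hiera for the ingress nodes (illustrative placement):
# skip the separate /var/lib/docker volume on these hosts.
profile::wmcs::kubeadm::docker_vol: false
```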

We'll likely try that in toolsbeta first anyway.

Change 685936 merged by Bstorm:

[operations/puppet@production] toolforge kubernetes: change class for the new cinder environment

https://gerrit.wikimedia.org/r/685936

Bstorm claimed this task.

Ok, I updated the docs at https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/Deploying for ingress and standard worker nodes. The puppet patch is deployed. This should be good to go; we just need to use the correct flavors, etc. No actual cinder storage is required.

So tools ingress nodes are failing puppet because their ephemeral disk is only 20G while the patch has min_gb => 40. Is there any reason not to remove that?

The min/max settings are needed for puppet to tell the difference between volumes (their actual /dev paths are indeterminate). That said, you're free to change the min/max numbers to match the disk sizes of the flavor you're using.
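In other words, volume matching is driven by size bounds rather than device names: puppet picks whichever attached disk falls inside the range. A sketch of what such a declaration might look like; the resource name and parameter spellings here are assumptions extrapolated from the min_gb mentioned above, not verified against the actual module:

```puppet
# Assumed resource/API: match any attached volume between the bounds
# (values illustrative) and mount it at /var/lib/docker.
cinderutils::ensure { '/var/lib/docker':
  min_gb => 20,
  max_gb => 160,
}
```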

I forgot to add profile::wmcs::kubeadm::docker_vol: false on ingress nodes. Please do not increase disk for them.

That fixed it, as intended when I submitted the patch. Sorry, I forgot to add the hiera.

Mentioned in SAL (#wikimedia-cloud) [2021-05-11T09:17:52Z] <Majavah> set profile::wmcs::kubeadm::docker_vol: false on ingress nodes T282087