
Tweak partman recipe for ML k8s workers
Closed, Resolved · Public · 2 Estimated Story Points

Description

As we found a while back (T339231), the kubelet partition (/var/lib/kubelet) is too small by default (30G), and when using LLMs, we can actually run out of disk space there.

Tweak the k8s specific overlay config (modules/install_server/files/autoinstall/partman/custom/kubernetes-node-overlay.cfg) to increase the disk space for that partition so we don't have to manually tweak it after every (re)install.

Since this use case is unique to us, make a copy of the partman overlay mentioned above and wire it into our config (via modules/profile/data/profile/installserver/preseed.yaml).

120G should probably be enough.
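
For illustration only, here is a rough sketch of what a fixed-size 120G stanza for that partition might look like in partman-auto expert_recipe syntax, generated by a small Python helper. The helper name `kubelet_stanza`, the LV name `kubelet`, the ext4 filesystem and the GiB-to-MB conversion are all assumptions; the actual contents of kubernetes-node-overlay.cfg are not shown in this task and may differ.

```python
def kubelet_stanza(size_gib: int,
                   lv_name: str = "kubelet",          # assumed LV name
                   mountpoint: str = "/var/lib/kubelet") -> str:
    """Render one partman-auto recipe stanza for a fixed-size LVM volume.

    Hypothetical helper: partman-auto sizes are given in megabytes, so the
    requested size is converted as size_gib * 1024 (an assumption about
    how "120G" should be interpreted).
    """
    size_mb = size_gib * 1024
    fields = [
        # <minimum size> <priority> <maximum size> <filesystem>;
        # using the same value three times pins the size.
        f"{size_mb} {size_mb} {size_mb} ext4",
        "$lvmok{ }",                       # allow placement on an LVM volume group
        f"lv_name{{ {lv_name} }}",
        "method{ format }",
        "format{ }",
        "use_filesystem{ }",
        "filesystem{ ext4 }",              # assumed filesystem
        f"mountpoint{{ {mountpoint} }}",
    ]
    # Stanzas are continued with backslashes and terminated by a lone ".".
    return " \\\n    ".join(fields) + " \\\n    ."


if __name__ == "__main__":
    print(kubelet_stanza(120))
```

Calling `kubelet_stanza(30)` instead would give the equivalent stanza for the current 30G default, which makes the size change easy to compare.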

Event Timeline

Change #1036195 had a related patch set uploaded (by Klausman; author: Klausman):

[operations/puppet@production] install/partman: Tweak kubelet partition size for ML workers

https://gerrit.wikimedia.org/r/1036195

klausman set the point value for this task to 2.

Change #1036195 merged by Klausman:

[operations/puppet@production] install/partman: Tweak kubelet partition size for ML workers

https://gerrit.wikimedia.org/r/1036195

Change #1037042 had a related patch set uploaded (by Klausman; author: Klausman):

[operations/puppet@production] install/partman: Separate out DSE cluster partman recipe from ML

https://gerrit.wikimedia.org/r/1037042

Change #1037042 merged by Klausman:

[operations/puppet@production] install/partman: Separate out DSE cluster partman recipe from ML

https://gerrit.wikimedia.org/r/1037042