We learned that the models the storage-initializer pulls into /mnt/models consume space on the kubelet's disk partition (via Kubernetes emptyDir volumes). The partition is currently small (~40G), so we should expand it on all nodes, since there is still free space on the LVM physical volume. On each node:
sudo lvextend -L+80g /dev/mapper/vg0-kubelet
sudo resize2fs /dev/mapper/vg0-kubelet
Every node has enough free space for this (ml-serve1001 has already been extended, which is why it shows less free space than the others):
elukey@cumin1001:~$ sudo cumin 'ml-serve[1,2]*' 'pvs'
16 hosts will be targeted:
ml-serve[2001-2008].codfw.wmnet,ml-serve[1001-1008].eqiad.wmnet
OK to proceed on 16 hosts? Enter the number of affected hosts to confirm or "q" to quit: 16
===== NODE GROUP =====
(1) ml-serve1001.eqiad.wmnet
----- OUTPUT of 'pvs' -----
  PV         VG  Fmt  Attr PSize   PFree
  /dev/md0   vg0 lvm2 a--  446.72g  9.35g
===== NODE GROUP =====
(4) ml-serve[1005-1008].eqiad.wmnet
----- OUTPUT of 'pvs' -----
  PV         VG  Fmt  Attr PSize   PFree
  /dev/md0   vg0 lvm2 a--  446.21g 89.25g
===== NODE GROUP =====
(11) ml-serve[2001-2008].codfw.wmnet,ml-serve[1002-1004].eqiad.wmnet
----- OUTPUT of 'pvs' -----
  PV         VG  Fmt  Attr PSize   PFree
  /dev/md0   vg0 lvm2 a--  446.72g 89.35g
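Since ml-serve1001 no longer has 80G free, re-running the extension there would fail. A minimal sketch of a guard that could wrap the commands on each node is shown below. The `has_room` helper is hypothetical (not an existing tool), and it assumes `pvs` reports PFree in gibibytes with a trailing `g`, as in the output above.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if a pvs-style PFree value (e.g. "89.35g")
# is at least the requested number of GiB. Assumes the value is reported in "g".
has_room() {
    local pfree="$1" needed_gib="$2"
    local value="${pfree%[gG]}"   # drop the trailing unit suffix
    value="${value#<}"            # drop the "<" that LVM prints for rounded sizes
    # awk handles the floating-point comparison; exit status 0 means "enough room".
    awk -v have="$value" -v need="$needed_gib" 'BEGIN { exit !(have + 0 >= need + 0) }'
}

# Possible usage on a node (left commented out so that sourcing this file
# has no side effects):
#
#   pfree="$(sudo pvs --noheadings --units g -o pv_free /dev/md0 | tr -d ' ')"
#   if has_room "$pfree" 80; then
#       sudo lvextend -L+80g /dev/mapper/vg0-kubelet
#       sudo resize2fs /dev/mapper/vg0-kubelet
#   else
#       echo "Skipping: only $pfree free on /dev/md0" >&2
#   fi
```

With the values from the output above, `has_room 89.35g 80` succeeds and `has_room 9.35g 80` fails, so the already-extended host would be skipped.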