The tools-worker-1020.tools.eqiad.wmflabs VM is running out of disk space, as reported by the Grafana dashboard https://grafana-labs.wikimedia.org/dashboard/db/tools-basic-alerts?refresh=5m&orgId=1
I logged in via SSH and investigated where the bulk of the data is:
df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             10M     0   10M   0% /dev
tmpfs           1.6G  178M  1.4G  12% /run
/dev/vda3        19G   17G  733M  96% /
[...]
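To narrow down which directories are eating the space on the root filesystem, something like the following could be used (a minimal sketch; the exact set of paths to inspect is an assumption):

# Summarize the largest directories under /var, staying on this filesystem (-x)
sudo du -xsh /var/* 2>/dev/null | sort -rh | head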
The problem seems to be in /var/log/
aborrero@tools-worker-1020:/var/log$ ls -Slah | head
total 14G
-rw-r--r-- 1 root adm  6.4G Jan 10 12:48 syslog
-rw-r----- 1 root adm  6.4G Jan 10 12:48 daemon.log
-rw-r----- 1 root adm  285M Jan 10 12:48 auth.log
-rw-r--r-- 1 root adm   36M Jan 10 12:40 messages
-rw-r----- 1 root adm   29M Jan 10 06:25 kern.log
-rw-r----- 1 root adm   14M Jul  2  2017 daemon.log.1
-rw-r----- 1 root adm  9.7M Jul  2  2017 auth.log.1
-rw-r----- 1 root adm  7.0M Jan 10 12:40 user.log
-rw-rw-r-- 1 root utmp 5.1M Jan 10 12:32 lastlog
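Once the root cause is addressed, the space taken by the oversized syslog and daemon.log could be reclaimed quickly; a sketch, assuming the standard Debian rsyslog + logrotate layout on these workers:

# Force an immediate rotation of the rsyslog-managed log files
sudo logrotate -f /etc/logrotate.d/rsyslog
# Or truncate the files in place (rsyslog appends, so it keeps writing fine afterwards)
sudo truncate -s 0 /var/log/syslog /var/log/daemon.log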
Both syslog and daemon.log are flooded with the same messages:
aborrero@tools-worker-1020:/var/log$ sudo tail syslog
Jan 10 12:49:12 tools-worker-1020 kubelet[30011]: E0110 12:49:12.735990 30011 kubelet_getters.go:249] Could not read directory /var/lib/kubelet/pods/bcb36fe1-7d3d-11e7-9b1a-fa163edef48a/volumes: open /var/lib/kubelet/pods/bcb36fe1-7d3d-11e7-9b1a-fa163edef48a/volumes: no such file or directory
Jan 10 12:49:12 tools-worker-1020 kubelet[30011]: E0110 12:49:12.736075 30011 kubelet_volumes.go:159] Orphaned pod "bcb36fe1-7d3d-11e7-9b1a-fa163edef48a" found, but error open /var/lib/kubelet/pods/bcb36fe1-7d3d-11e7-9b1a-fa163edef48a/volumes: no such file or directory occured during reading volume dir from disk
Jan 10 12:49:14 tools-worker-1020 kubelet[30011]: E0110 12:49:14.730062 30011 kubelet_getters.go:249] Could not read directory /var/lib/kubelet/pods/bcb36fe1-7d3d-11e7-9b1a-fa163edef48a/volumes: open /var/lib/kubelet/pods/bcb36fe1-7d3d-11e7-9b1a-fa163edef48a/volumes: no such file or directory
Jan 10 12:49:14 tools-worker-1020 kubelet[30011]: E0110 12:49:14.730103 30011 kubelet_volumes.go:159] Orphaned pod "bcb36fe1-7d3d-11e7-9b1a-fa163edef48a" found, but error open /var/lib/kubelet/pods/bcb36fe1-7d3d-11e7-9b1a-fa163edef48a/volumes: no such file or directory occured during reading volume dir from disk
Jan 10 12:49:16 tools-worker-1020 kubelet[30011]: E0110 12:49:16.739702 30011 kubelet_getters.go:249] Could not read directory /var/lib/kubelet/pods/bcb36fe1-7d3d-11e7-9b1a-fa163edef48a/volumes: open /var/lib/kubelet/pods/bcb36fe1-7d3d-11e7-9b1a-fa163edef48a/volumes: no such file or directory
Jan 10 12:49:16 tools-worker-1020 kubelet[30011]: E0110 12:49:16.739743 30011 kubelet_volumes.go:159] Orphaned pod "bcb36fe1-7d3d-11e7-9b1a-fa163edef48a" found, but error open /var/lib/kubelet/pods/bcb36fe1-7d3d-11e7-9b1a-fa163edef48a/volumes: no such file or directory occured during reading volume dir from disk
Jan 10 12:49:18 tools-worker-1020 kubelet[30011]: E0110 12:49:18.731541 30011 kubelet_getters.go:249] Could not read directory /var/lib/kubelet/pods/bcb36fe1-7d3d-11e7-9b1a-fa163edef48a/volumes: open /var/lib/kubelet/pods/bcb36fe1-7d3d-11e7-9b1a-fa163edef48a/volumes: no such file or directory
Jan 10 12:49:18 tools-worker-1020 kubelet[30011]: E0110 12:49:18.731573 30011 kubelet_volumes.go:159] Orphaned pod "bcb36fe1-7d3d-11e7-9b1a-fa163edef48a" found, but error open /var/lib/kubelet/pods/bcb36fe1-7d3d-11e7-9b1a-fa163edef48a/volumes: no such file or directory occured during reading volume dir from disk
Jan 10 12:49:20 tools-worker-1020 kubelet[30011]: E0110 12:49:20.742577 30011 kubelet_getters.go:249] Could not read directory /var/lib/kubelet/pods/bcb36fe1-7d3d-11e7-9b1a-fa163edef48a/volumes: open /var/lib/kubelet/pods/bcb36fe1-7d3d-11e7-9b1a-fa163edef48a/volumes: no such file or directory
Jan 10 12:49:20 tools-worker-1020 kubelet[30011]: E0110 12:49:20.742617 30011 kubelet_volumes.go:159] Orphaned pod "bcb36fe1-7d3d-11e7-9b1a-fa163edef48a" found, but error open /var/lib/kubelet/pods/bcb36fe1-7d3d-11e7-9b1a-fa163edef48a/volumes: no such file or directory occured during reading volume dir from disk
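This looks like the known kubelet behaviour where a pod directory under /var/lib/kubelet/pods is left behind without its volumes subdirectory, and the kubelet then logs this pair of errors every couple of seconds forever. A hedged sketch of how it could be confirmed and cleaned up (the cleanup step is an assumption and should only be done after verifying that no live pod uses this UID; the kubelet service name is assumed to be "kubelet"):

# See what is left of the stale pod directory on the worker
sudo ls -la /var/lib/kubelet/pods/bcb36fe1-7d3d-11e7-9b1a-fa163edef48a/

# Confirm no running pod in the cluster still has this UID
kubectl get pods --all-namespaces -o jsonpath='{.items[*].metadata.uid}' | tr ' ' '\n' | grep bcb36fe1-7d3d-11e7-9b1a-fa163edef48a

# If nothing matches, remove the leftover directory and restart the kubelet
sudo rm -rf /var/lib/kubelet/pods/bcb36fe1-7d3d-11e7-9b1a-fa163edef48a
sudo systemctl restart kubelet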
I checked the disk usage trend in Graphite and this does not seem to be a new issue (https://graphite-labs.wikimedia.org/graphlot/?width=586&height=308&_salt=1515589449.215&from=00%3A00_20170801&until=23%3A59_20180110&target=tools.tools-worker-1020.diskspace.root.byte_percentfree)
Other worker hosts don't show this behavior.
This is probably an issue with kubelet on this host.
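To double-check that this is isolated to tools-worker-1020, a quick comparison across a few workers could look like this (the hostnames are illustrative and a plain ssh loop is just one way to do it):

for h in tools-worker-1001 tools-worker-1002 tools-worker-1020; do
  echo "== $h =="
  # Root filesystem usage plus how many "Orphaned pod" messages are in syslog
  ssh "$h" 'df -h / | tail -n1; sudo grep -c "Orphaned pod" /var/log/syslog'
done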