
Disk space on grafana2001 is low
Closed, Resolved, Public

Description

Hi,

There was an alert today:

PROBLEM - Disk space on grafana2001 is CRITICAL: DISK CRITICAL - free space: / 220MiB (1% inode=36%): /tmp 220MiB (1% inode=36%): /var/tmp 220MiB (1% inode=36%): https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/d/000000377/host-overview?var-server=grafana2001&var-datasource=codfw+prometheus/ops

@jcrespo cleaned up the apt cache, which brought it back under the threshold, but it is still 96% full on its 15GB disk.
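After a cleanup like this, you can re-check usage by reading the root filesystem's use percentage directly. The sketch below makes a few assumptions:

- It relies on GNU coreutils `df` for `--output=pcent`.
- The function names are hypothetical.
- The task does not state the real alert threshold, so the caller supplies one.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the Wikimedia monitoring setup.
# Assumes GNU coreutils df (for --output=pcent).

# Print the used percentage (digits only) of the filesystem holding path $1.
used_pct() {
  df --output=pcent -- "${1:-/}" | tail -n 1 | tr -dc '0-9'
}

# Succeed if the filesystem holding $1 is at least $2 percent full.
# $2 is caller-supplied; the real alert threshold is not stated in this task.
over_threshold() {
  local pct
  pct=$(used_pct "$1")
  [ -n "$pct" ] && [ "$pct" -ge "$2" ]
}
```

For example, `over_threshold / 90 && echo "root is at $(used_pct /)%"`. To see roughly how much `apt-get clean` would free, run `du -sh /var/cache/apt/archives`. That directory holds the downloaded package files it removes.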

Since it's a VM and resizing the disk takes several reboots, could you look into growing it when you get a chance?

Thanks

Event Timeline

Checklist:

root@ganeti2032:~# gnt-instance list  | grep grafana
#grafana2001.codfw.wmnet             kvm        debootstrap+default ganeti2030.codfw.wmnet running   4.0G
root@ganeti2032:~# gnt-instance info grafana2001.codfw.wmnet
#  ...
#  Disk template: drbd
#  Disks:
#    - disk/0: drbd, size 16.0G
#      access mode: rw
#  ...
root@cumin1002:~# cookbook sre.hosts.downtime --minutes 60 -t 'T385282' -r 'expand the root partition and fs on grafana2001' grafana2001.codfw.wmnet
root@ganeti2032:~# gnt-instance grow-disk --absolute grafana2001.codfw.wmnet 0 50g #inside a tmux
root@grafana2001:~# poweroff
root@ganeti2032:~# gnt-instance list  | grep grafana
#grafana2001.codfw.wmnet             kvm        debootstrap+default ganeti2030.codfw.wmnet ERROR_down      -
root@ganeti2032:~# gnt-instance startup grafana2001.codfw.wmnet
root@ganeti2032:~# gnt-instance list  | grep grafana
#grafana2001.codfw.wmnet             kvm        debootstrap+default ganeti2030.codfw.wmnet running   4.0G
root@grafana2001:~# swapoff -a
root@grafana2001:~# parted -- /dev/vda print
#Model: Virtio Block Device (virtblk)
#Disk /dev/vda: 53.7GB
#Sector size (logical/physical): 512B/512B
#Partition Table: msdos
#Disk Flags:
#
#Number  Start   End     Size    Type     File system     Flags
# 1      1049kB  16.2GB  16.2GB  primary  ext4            boot
# 2      16.2GB  17.2GB  1023MB  primary  linux-swap(v1)  swap
root@grafana2001:~# parted -- /dev/vda rm 2
root@grafana2001:~# parted -- /dev/vda resizepart 1 -1050M
root@grafana2001:~# parted -- /dev/vda mkpart primary swap -1049M -1 #this form (with part-type "primary") is for msdos partition tables
root@grafana2001:~# parted -- /dev/vda print
#Model: Virtio Block Device (virtblk)
#Disk /dev/vda: 53.7GB
#Sector size (logical/physical): 512B/512B
#Partition Table: msdos
#Disk Flags:
#
#Number  Start   End     Size    Type     File system  Flags
# 1      1049kB  52.6GB  52.6GB  primary  ext4         boot
# 2      52.6GB  53.7GB  1048MB  primary               swap
root@grafana2001:~# #/etc/fstab: comment out the swap entry, then check the UUID for vda1 and update it if needed
root@grafana2001:~# cat /etc/fstab | grep $(blkid /dev/vda1 | sed -nr 's/.*UUID="(.*)" B.*/\1/p') #if a row matches the grep pattern, no update is needed
#UUID=ea3b870e-69bf-45d0-a816-c1b124384d58 /               ext4    errors=remount-ro 0       1
root@grafana2001:~# reboot
root@grafana2001:~# resize2fs /dev/vda1
root@grafana2001:~# mkswap /dev/vda2
root@grafana2001:~# blkid /dev/vda2
root@grafana2001:~# #/etc/fstab: uncomment the swap entry and replace its UUID with the new one blkid reports for vda2
root@grafana2001:~# swapon -a
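The two manual `/etc/fstab` edits in the checklist could also be scripted. Here is a minimal sketch with hypothetical function names:

- `fstab_has_uuid` is a slightly stricter version of the `grep` check above. It ignores commented-out lines, which a plain `grep` for the UUID would also match.
- `fstab_enable_swap` uncomments the swap entry and points it at a new UUID, for example the one printed by `blkid -s UUID -o value /dev/vda2`.

It assumes the swap line was commented out with a leading `#` and has `swap` as its third field. It prints the result rather than editing in place, so the change can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical fstab helpers; try them on a copy of /etc/fstab first.

# Succeed only if an uncommented entry's first field is exactly UUID=$2.
fstab_has_uuid() {
  local fstab=$1 uuid=$2
  [ -n "$uuid" ] || return 2   # refuse an empty UUID outright
  awk -v want="UUID=$uuid" '
    $1 == want { found = 1 }   # "#UUID=..." never equals "UUID=...", so comments are skipped
    END { exit !found }
  ' "$fstab"
}

# Print $1 with the commented-out swap entry re-enabled and set to UUID=$2.
# The mount point, options, dump/pass fields and original spacing are kept.
fstab_enable_swap() {
  local fstab=$1 uuid=$2
  [ -n "$uuid" ] || return 2
  sed -E "s,^#[[:space:]]*UUID=[^[:space:]]+([[:space:]]+[^[:space:]]+[[:space:]]+swap([[:space:]]|$)),UUID=${uuid}\1," "$fstab"
}
```

A possible usage:

```shell
fstab_enable_swap /etc/fstab "$(blkid -s UUID -o value /dev/vda2)" > /tmp/fstab.new
diff /etc/fstab /tmp/fstab.new
```

Install the new file only after reviewing the diff.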

Icinga downtime and Alertmanager silence (ID=f9eaad94-71c6-432b-8029-19a8b7cfc6e3) set by tappof@cumin1002 for 1:00:00 on 1 host(s) and their services with reason: expand the root partition and fs on grafana2001

grafana2001.codfw.wmnet

Change #1120476 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/puppet@production] grafana: failover

https://gerrit.wikimedia.org/r/1120476

Change #1120483 had a related patch set uploaded (by Tiziano Fogli; author: Tiziano Fogli):

[operations/dns@master] grafana: failover

https://gerrit.wikimedia.org/r/1120483

Mentioned in SAL (#wikimedia-operations) [2025-02-18T09:42:50Z] <tappof> performing grafana failover (grafana2001 is becoming the new active host) T385282

Change #1120476 merged by Tiziano Fogli:

[operations/puppet@production] grafana: failover

https://gerrit.wikimedia.org/r/1120476

Change #1120483 merged by Tiziano Fogli:

[operations/dns@master] grafana: failover

https://gerrit.wikimedia.org/r/1120483

Icinga downtime and Alertmanager silence (ID=ba5b9f08-cdf0-4c15-ac9e-da6422c76a56) set by tappof@cumin1002 for 1:30:00 on 1 host(s) and their services with reason: expand the root partition and fs on grafana1002

grafana1002.eqiad.wmnet

Icinga downtime and Alertmanager silence (ID=e6c0383e-ee8d-4382-822e-9d671fddfa9d) set by tappof@cumin1002 for 1:00:00 on 1 host(s) and their services with reason: expand the root partition and fs on grafana1002

grafana1002.eqiad.wmnet
tappof claimed this task.

Both VMs now have a 50GB root filesystem.
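The sizes in this task mix binary and decimal units. This reading assumes Ganeti's `g` suffix in `grow-disk ... 50g` means GiB, which parted's numbers are consistent with. On that basis:

- The 50 GiB disk appears in parted as 53.7GB.
- The old 16 GiB disk ended at 17.2GB.
- The new 52.6GB root partition is about 49 GiB.
- The old 16.2GB root partition was about 15.1 GiB, which matches the "15GB" in the description.

Here is a small arithmetic sketch of those conversions (function names are hypothetical):

```shell
#!/usr/bin/env bash
# Unit-conversion sketch; parted prints decimal units (1 GB = 10^9 bytes).
# Assumes Ganeti's "g" size suffix is binary (1 GiB = 2^30 bytes).

gib_to_bytes() { echo $(( $1 * 1024 * 1024 * 1024 )); }

# Decimal GB with one decimal place, in the style of parted's "53.7GB".
bytes_to_gb() { awk -v b="$1" 'BEGIN { printf "%.1fGB\n", b / 1e9 }'; }

# Binary GiB with one decimal place.
bytes_to_gib() { awk -v b="$1" 'BEGIN { printf "%.1fGiB\n", b / 1073741824 }'; }

echo "new disk (50 GiB): $(bytes_to_gb "$(gib_to_bytes 50)")"   # parted showed 53.7GB
echo "old disk (16 GiB): $(bytes_to_gb "$(gib_to_bytes 16)")"   # parted showed 17.2GB
echo "new root (52.6GB): $(bytes_to_gib 52600000000)"
echo "old root (16.2GB): $(bytes_to_gib 16200000000)"
```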

Mentioned in SAL (#wikimedia-operations) [2025-02-18T15:41:12Z] <tappof> performing grafana failback (grafana1002 is becoming the new active host) T385282