The current setup of the LVM volumes on the namenodes is not optimal:
elukey@an-master1001:~$ df -h
Filesystem                            Size  Used Avail Use% Mounted on
/dev/md0                               46G   31G   13G  72% /
/dev/mapper/an--master1001--vg-lvol0  173G  8.0G  165G   5% /var/lib/hadoop/name

elukey@an-master1001:~$ sudo lvs
  LV    VG               Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lvol0 an-master1001-vg -wi-ao---- 175.95g

elukey@an-master1001:~$ sudo pvs
  PV       VG               Fmt  Attr PSize   PFree
  /dev/md2 an-master1001-vg lvm2 a--  175.95g    0
elukey@an-master1002:~$ df -h
Filesystem                               Size  Used Avail Use% Mounted on
/dev/md0                                  46G   19G   25G  44% /
/dev/mapper/an--master1002--vg-backup    138G  119G   20G  87% /srv
/dev/mapper/an--master1002--vg-namenode   35G  8.1G   27G  24% /var/lib/hadoop/name

elukey@an-master1002:~$ sudo lvs
  LV       VG               Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  backup   an-master1002-vg -wi-ao---- 140.75g
  namenode an-master1002-vg -wi-ao----  35.19g

elukey@an-master1002:~$ sudo pvs
  PV       VG               Fmt  Attr PSize   PFree
  /dev/md2 an-master1002-vg lvm2 a--  175.95g    0
On an-master1001 we don't really use the LVM volume, and on an-master1002 we still use the vg-backup volume, which shouldn't be needed anymore (it is an LVM snapshot from an-coord1001, created before we had the MySQL replication to db1108). Moreover, we store the /var/log/hadoop-hdfs/* logs on the root partition, which is not great since that partition is tiny.
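Before removing anything, it would be worth confirming that the backup volume is really unused; a minimal (non-destructive) check on an-master1002 could be:

sudo lsof +D /srv     # anything still holding files open under /srv?
sudo du -sh /srv/*    # what is actually stored there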
What we should do is:
- resize/remove the LVM volumes/partitions that are not needed (see the sketch after this list).
- think about having /var/log/hadoop-hdfs on an LVM volume, and increase the logging retention (hdfs-audit.log, hdfs-namenode.log, etc.).
- verify the partman config of these nodes and how it will change with the Buster migration (the SRE team has standardized a lot of partman recipes; the default is now to have data under /srv).
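As a starting point, a rough sketch of the LVM rework on an-master1002 could look like the following. This assumes the backup LV can really be dropped and the filesystems are ext4; the 50G size and the hadoop-logs name are arbitrary placeholders, and the real change should of course go through puppet/partman rather than be done by hand:

sudo umount /srv
sudo lvremove an-master1002-vg/backup                   # frees ~140G in the VG
sudo lvcreate -L 50G -n hadoop-logs an-master1002-vg    # dedicated LV for HDFS logs
sudo mkfs.ext4 /dev/an-master1002-vg/hadoop-logs
sudo mkdir -p /mnt/hadoop-logs
sudo mount /dev/an-master1002-vg/hadoop-logs /mnt/hadoop-logs
sudo rsync -a /var/log/hadoop-hdfs/ /mnt/hadoop-logs/   # copy the existing logs over
# then: stop the HDFS daemons, remount the LV on /var/log/hadoop-hdfs,
# add it to /etc/fstab, and restart.

For the retention part, the knobs in the stock Hadoop log4j.properties are hadoop.log.maxfilesize / hadoop.log.maxbackupindex for the daemon log and hdfs.audit.log.maxfilesize / hdfs.audit.log.maxbackupindex for hdfs-audit.log (our puppetized config may use different names/values).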