Low root disk space on multiple eqsin cp nodes
Closed, ResolvedPublic

Description

We are running low on disk space on the root filesystem of multiple cache nodes in eqsin:

cp5009.eqsin.wmnet: /dev/md0       ext4  9.1G  7.5G  1.2G  87% /
cp5008.eqsin.wmnet: /dev/md0       ext4  9.1G  7.6G  1.1G  88% /
cp5011.eqsin.wmnet: /dev/md0       ext4  9.1G  7.8G  926M  90% /
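
The per-host check behind figures like these can be sketched in shell. This is a minimal sketch, not the monitoring actually used on these hosts: the function names `root_usage_pct` and `usage_over` and the 85% threshold are assumptions chosen for illustration.

```shell
# Print the Use% of the root filesystem as a bare integer (e.g. 87).
# df -P gives POSIX output: line 2 holds the figures, field 5 is Use%.
root_usage_pct() {
    df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when a usage value (with or without a trailing %) is at or
# above the threshold given as the second argument.
usage_over() {
    pct="${1%\%}"
    [ "$pct" -ge "$2" ]
}

# 85 is an assumed threshold, not one taken from this task.
if usage_over "$(root_usage_pct)" 85; then
    echo "root filesystem is running low: $(root_usage_pct)% used"
fi
```

With that threshold, all three hosts listed above (87%, 88%, 90%) would be flagged.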

A significant share of that space is taken by logs and the apt cache:

root@cp5008:/var# du -sh * | grep G
1.7G    cache
1.7G    log

The bulk of those logs is produced by trafficserver-tls, and rsyslog stores them on disk in both /var/log/messages and /var/log/user.log.

root@cp5008:/var/log# awk '{print $5}' /var/log/user.log | sort | uniq -c | sort -n | tail -3
   1285 puppet-agent-cronjob:
  28652 prometheus-trafficserver-exporter[30565]:
1488736 trafficserver-tls[18826]:
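
The tally above groups on the full fifth field, PID included, so a daemon that restarted would show up as several entries. A variant that strips the `[pid]:` suffix before counting might look like the following sketch. The function name `count_by_program` is hypothetical, and the script assumes the traditional rsyslog file format in which field 5 is `program[pid]:`.

```shell
# Count syslog lines per program, ignoring the "[pid]:" suffix.
# Assumes field 5 is the syslog tag, e.g. "trafficserver-tls[18826]:".
count_by_program() {
    awk '{
             tag = $5
             sub(/(\[[0-9]+\])?:$/, "", tag)   # drop optional [pid] and the colon
             count[tag]++
         }
         END { for (t in count) printf "%d %s\n", count[t], t }' "$1" | sort -n
}
```

For example, `count_by_program /var/log/user.log | tail -3` lists the three noisiest programs, with the largest last.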

Considering that trafficserver-tls logs are already handled by the systemd journal, we should not also persist them to disk through rsyslog, and certainly not duplicate them across multiple files.

Event Timeline

ema triaged this task as High priority.Sep 3 2021, 8:44 AM

Mentioned in SAL (#wikimedia-operations) [2021-09-03T08:45:45Z] <ema> cp-eqsin: clean apt cache to free up some space T290305

Change 717311 had a related patch set uploaded (by Ema; author: Ema):

[operations/puppet@production] rsyslog: stop saving trafficserver-tls logs to disk

https://gerrit.wikimedia.org/r/717311

Change 719052 had a related patch set uploaded (by Ema; author: Ema):

[operations/puppet@production] rsyslog: stop saving trafficserver logs to disk

https://gerrit.wikimedia.org/r/719052

Change 719052 merged by Ema:

[operations/puppet@production] rsyslog: stop saving trafficserver logs to disk

https://gerrit.wikimedia.org/r/719052

ema claimed this task.

trafficserver-tls is no longer writing to local syslog, and all eqsin hosts now have at least ~30% available disk space. Closing.