
MX: increasing disk space
Closed, Resolved, Public

Description

Presently the MXes have only a 20G root filesystem, which is normally about half utilized.

Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        19G   11G  7.2G  60% /

Today a significant spike in deferred mail caused the exim mainlog to grow quite large (5+ GB) and nearly fill the disk.

Let's consider how best to add space to these hosts, and also how to prevent large logs from filling the root filesystem, which would interrupt mail flow.
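One way to catch this condition before it interrupts mail flow is a periodic check on both the root filesystem and the mainlog itself. The following is a minimal Python sketch, not existing tooling; the function names and the 80% / 20% thresholds are hypothetical. It computes Use% the way GNU df does, as used / (used + available) rounded up, so blocks reserved for root are left out of the denominator.

```python
import os
import shutil


def df_use_percent(used: int, avail: int) -> int:
    """Hypothetical helper: Use% as GNU df reports it.

    used / (used + avail), rounded up, in integer arithmetic.
    Reserved blocks are excluded, so this can exceed used / size.
    """
    if used + avail == 0:
        return 0
    return -(-100 * used // (used + avail))  # ceiling division


def check_log_pressure(log_path: str, max_use_percent: int = 80,
                       max_log_fraction: float = 0.2) -> list:
    """Hypothetical check: return warnings if the filesystem holding
    log_path is too full, or if the log alone is too large a share of it.
    Both thresholds are illustrative assumptions."""
    usage = shutil.disk_usage(os.path.dirname(log_path) or ".")
    # On Unix, usage.free is the space available to unprivileged users,
    # which corresponds to df's "Avail" column.
    pct = df_use_percent(usage.used, usage.free)
    warnings = []
    if pct >= max_use_percent:
        warnings.append(f"filesystem {pct}% full (threshold {max_use_percent}%)")
    if os.path.exists(log_path):
        size = os.path.getsize(log_path)
        if size > max_log_fraction * usage.total:
            warnings.append(f"{log_path} is {size} bytes, over "
                            f"{max_log_fraction:.0%} of the filesystem")
    return warnings
```

For example, `df_use_percent(15, 3)` gives 84, in line with the 84% that df reports for 15G used and 3.0G available. Run periodically against `/var/log/exim4/mainlog`, a non-empty result would be the cue to alert or rotate early.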

Event Timeline

herron renamed this task from MX: increasing disks space to MX: increasing disk space. Apr 6 2022, 4:47 PM

We could add a second disk to the mx* VMs and move /var to it, but this sounds more like something to factor in for the new VMs running the new setup? (The immediate disk consumption by logs from today's event can probably just be mitigated with a manual logrotate run?)

I rotated the log file and then compressed it on another host for this specific incident, but it was cumbersome. I think we should definitely embiggen the disks for the new Postfix-based hosts. I am less sure whether it is worth the effort for these hosts.
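Compressing a multi-gigabyte rotated log elsewhere mostly comes down to streaming it, so memory stays flat and only the destination filesystem needs room for the output. A minimal Python sketch, assuming the rotated file (e.g. `mainlog.1`) is readable from wherever this runs; how it was moved to the other host is not stated in the task, and `gzip_stream` is a hypothetical name.

```python
import gzip
import shutil


def gzip_stream(src_path: str, dst_path: str, chunk_size: int = 1 << 20) -> None:
    """Hypothetical helper: compress src_path into dst_path chunk by chunk.

    Memory use is bounded by chunk_size regardless of the log's size;
    dst_path can sit on a different filesystem from src_path.
    """
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

From a shell, `gzip -c mainlog.1 > /other/fs/mainlog.1.gz` does the same thing; the point in either form is that the compressed output does not have to land on the nearly full root filesystem.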

We've just had a repeat, and again I think the exim mainlog is too big for logrotate to succeed :-/

mvernon@mx1001:~$ ls -lsh /var/log/exim4/mainlog
3.9G -rw-r----- 1 Debian-exim adm 3.9G Apr  8 12:56 /var/log/exim4/mainlog
mvernon@mx1001:~$ df -lh /var/log/exim4/
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        19G   15G  3.0G  84% /
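The numbers above (a 3.9G log with 3.0G available) show why the rotation strategy matters when free space is smaller than the log. The following Python sketch compares two logrotate approaches; whether the exim4 logrotate configuration uses either of them is not stated here, the compression ratio is an assumed input rather than a measurement of exim logs, and `rotation_headroom` is a hypothetical name.

```python
GIB = 1024 ** 3


def rotation_headroom(log_bytes: int, avail_bytes: int,
                      compress_ratio: float) -> dict:
    """Hypothetical estimate of whether two rotation strategies fit.

    compress_ratio is compressed size / original size (an assumption).
    Returns the extra space each strategy needs at its peak, and whether
    that fits in avail_bytes.
    """
    gz_bytes = int(log_bytes * compress_ratio)
    # Rename, then gzip: the .gz is written while the renamed original
    # still exists, so the compressed copy is the only extra space.
    rename_peak = gz_bytes
    # copytruncate: a full uncompressed copy exists before the live file
    # is truncated, so the peak is the whole log size (compression of the
    # copy happens after the truncation frees the original's blocks).
    copytruncate_peak = max(log_bytes, gz_bytes)
    return {
        "rename_then_compress": rename_peak <= avail_bytes,
        "copytruncate_then_compress": copytruncate_peak <= avail_bytes,
    }
```

With the figures from mx1001 and an assumed 10% ratio, `rotation_headroom(int(3.9 * GIB), int(3.0 * GIB), 0.1)` reports that rename-then-compress fits but copytruncate does not. This is only one plausible failure mode, not a diagnosis of what happened here.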

ok, thanks, I'll rotate it manually and plan on embiggening the existing hosts.

jhathaway triaged this task as Medium priority.

Mentioned in SAL (#wikimedia-operations) [2022-11-18T18:21:40Z] <herron> removed older exim logs to free space T305567

Mentioned in SAL (#wikimedia-operations) [2022-12-07T23:24:22Z] <mutante> mx1001 about to run out of disk again - apt-get clean, gzip /var/log/exim4/mainlog.1 find -mtime +31 -delete in /var/log/exim4 - deleting old logs to prevent mail server running out of disk - it was alerting in Icinga but same as conf* - monitoring works, alerting does not T305567
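The `find -mtime +31 -delete` cleanup above relies on find's day rounding: the file's age is truncated to whole days before comparing, so `+31` only matches files at least 32 days old. A minimal Python sketch of the same rule for a single flat directory, with a dry-run default so the match list can be reviewed first; `delete_older_than` is a hypothetical name, and unlike find it does not descend into subdirectories.

```python
import os
import time
from typing import List, Optional

SECONDS_PER_DAY = 86400


def delete_older_than(directory: str, days: int,
                      now: Optional[float] = None,
                      dry_run: bool = True) -> List[str]:
    """Hypothetical helper: remove regular files older than `days` whole days.

    Mirrors find's -mtime +N by truncating the age to whole days before
    comparing. Only the top level of `directory` is scanned.
    """
    now = time.time() if now is None else now
    matched = []
    for entry in os.scandir(directory):
        if not entry.is_file(follow_symlinks=False):
            continue  # skip directories, symlinks, sockets, etc.
        mtime = entry.stat(follow_symlinks=False).st_mtime
        age_days = int((now - mtime) // SECONDS_PER_DAY)
        if age_days > days:
            matched.append(entry.path)
            if not dry_run:
                os.remove(entry.path)
    return sorted(matched)
```

Calling it first with the default `dry_run=True` on `/var/log/exim4` and then again with `dry_run=False` gives a chance to check that only old rotated logs are matched before anything is deleted.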

I think the priority is surprisingly low, given that this is the main prod mail server and it has almost run out of disk multiple times.

jhathaway raised the priority of this task from Medium to High. Dec 8 2022, 2:55 AM

@Dzahn I agree. Sorry for letting this one slip off my todo list; I'll take care of it tomorrow.

Bumped both MXes, mx{1001,2001}.wikimedia.org, to 50G root partitions.