MX: increasing disk space
Closed, Resolved · Public
Authored By

	herron
	Apr 6 2022, 4:47 PM

Description

Presently the MXes have only a 20G root filesystem, which is normally about half utilized.

Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        19G   11G  7.2G  60% /

Today a significant spike in deferred mail caused the exim mainlog to grow quite large (5+ GB) and nearly fill the disk.

Let's consider how best to add some space to these hosts, and also think about how to prevent large logs from filling the root filesystem, which would interrupt mail flow.
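As a starting point for the "prevent large logs from filling the root filesystem" part, here is a minimal sketch of a usage check. It is not part of any existing tooling on these hosts: the function name `usage_report` and the 80% threshold are assumptions for illustration, and the percentage is computed as used/total, which can differ slightly from the `Use%` column printed by `df`.

```python
import os
import shutil


def usage_report(path, log_path=None, warn_pct=80.0):
    """Report how full the filesystem holding `path` is.

    `log_path` is optional; if it exists, its size is included so a
    single oversized log is easy to spot. `warn_pct` is an arbitrary
    illustrative threshold, not a value taken from this task.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    used_pct = 100.0 * usage.used / usage.total
    log_bytes = 0
    if log_path and os.path.exists(log_path):
        log_bytes = os.path.getsize(log_path)
    return {
        "used_pct": used_pct,
        "free_bytes": usage.free,
        "log_bytes": log_bytes,
        "warn": used_pct >= warn_pct,
    }


if __name__ == "__main__":
    # Paths taken from the task description; adjust as needed.
    print(usage_report("/", "/var/log/exim4/mainlog"))
```

A check like this only reports; it does not rotate or delete anything.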

Event Timeline

herron created this task. Apr 6 2022, 4:47 PM

Restricted Application added a subscriber: Aklapper. Apr 6 2022, 4:47 PM

herron renamed this task from MX: increasing disks space to MX: increasing disk space. Apr 6 2022, 4:47 PM

RhinosF1 subscribed. Apr 6 2022, 5:04 PM

We could add a second disk to the mx* VMs and move /var to it, but this sounds more like something to factor in for the new VMs running the new setup? (The immediate log consumption from today's event can probably just be mitigated with a manual logrotate run?)

I rotated the log file and then compressed it on another host for this specific incident, but it was cumbersome. I think we should definitely embiggen the disks for the new Postfix-based hosts. I am less sure whether it is worth the effort for these hosts.
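For reference, a minimal sketch of the "compress it somewhere else" idea. This is not the procedure used in the incident (that was done by hand on another host); it assumes the destination path lives on a different filesystem with enough free space, and the function name `compress_to` is hypothetical.

```python
import gzip
import shutil


def compress_to(src_path, dest_path, chunk_size=1024 * 1024):
    """Stream-compress src_path into dest_path in fixed-size chunks.

    The source is never loaded into memory as a whole, and the
    compressed copy is written only to dest_path, so nothing extra
    lands on the filesystem that holds src_path.
    """
    with open(src_path, "rb") as src, gzip.open(dest_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
    return dest_path
```

The original file still has to be truncated or removed afterwards to actually free space on the root filesystem, and exim has to be made to reopen its log, which this sketch does not handle.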

We've just had a repeat, and again the exim mainlog is, I think, too big for logrotate to succeed :-/

mvernon@mx1001:~$ ls -lsh /var/log/exim4/mainlog
3.9G -rw-r----- 1 Debian-exim adm 3.9G Apr  8 12:56 /var/log/exim4/mainlog
mvernon@mx1001:~$ df -lh /var/log/exim4/
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        19G   15G  3.0G  84% /

ok, thanks, I'll rotate it manually and plan on embiggening the existing hosts.

jhathaway claimed this task. Apr 21 2022, 2:16 PM

jhathaway triaged this task as Medium priority.

Mentioned in SAL (#wikimedia-operations) [2022-11-18T18:21:40Z] <herron> removed older exim logs to free space T305567

Mentioned in SAL (#wikimedia-operations) [2022-12-07T23:24:22Z] <mutante> mx1001 about to run out of disk again - apt-get clean, gzip /var/log/exim4/mainlog.1 find -mtime +31 -delete in /var/log/exim4 - deleting old logs to prevent mail server running out of disk - it was alerting in Icinga but same as conf* - monitoring works, alerting does not T305567
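The `find -mtime +31 -delete` cleanup mentioned in the log entry above can be approximated with a dry-run sketch like the following. It is an illustration only: the helper name `old_files` is hypothetical, it looks at a single directory rather than recursing, and it uses a plain "older than N days" cutoff, which is slightly looser than `find`'s rounding of ages to whole days. It only lists candidates and deletes nothing.

```python
import os
import time


def old_files(directory, max_age_days=31, now=None):
    """Return regular files in `directory` last modified more than
    `max_age_days` days ago. Non-recursive; symlinks are not followed."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    result = []
    for entry in os.scandir(directory):
        if not entry.is_file(follow_symlinks=False):
            continue
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            result.append(entry.path)
    return sorted(result)
```

Reviewing the returned list before removing anything keeps the cleanup from touching the active log by accident.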

I think the priority is surprisingly low, given that this is the main prod mail server and it has almost run out of disk multiple times.

@Dzahn I agree, sorry for letting this one slip off my todo list, I'll take care of it tomorrow.

Ladsgroup subscribed. Dec 8 2022, 3:23 AM

Bumped both MXes, mx{1001,2001}.wikimedia.org, to 50G root partitions.

herron awarded a token. Dec 9 2022, 8:00 PM

Dzahn awarded a token. Dec 12 2022, 3:53 PM
