
"no space left on device" for codesearch9 root disk (out of inodes due to /var/log/account/pacct)
Closed, Resolved (Public)

Description

I'm seeing A LOT of files like:

./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.1.1.1.2.gz.1.1.2.gz.1.1.1.1.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.1.1.1.2.gz.2.gz.1.1.2.gz.2.gz
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.1.1.2.gz.1.1.1.1.2.gz.1.1.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.1.1.2.gz.1.1.2.gz.2.gz
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.1.1.2.gz.1.2.gz.1.1.1.2.gz.1.1.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.1.1.2.gz.1.2.gz.1.1.2.gz.2.gz.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.1.1.2.gz.1.2.gz.1.2.gz.1.1.1.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.1.1.2.gz.1.2.gz.1.2.gz.1.2.gz.1.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.1.1.2.gz.1.2.gz.2.gz.1.2.gz.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.1.1.2.gz.2.gz.1.2.gz.2.gz.1.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.1.2.gz.1.1.1.2.gz.2.gz
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.1.2.gz.1.2.gz.2.gz.1.1.1.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.1.2.gz.1.2.gz.2.gz.1.1.1.2.gz.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.1.2.gz.2.gz.1.2.gz.1.1.1.1.1.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.1.2.gz.2.gz.1.2.gz.1.4.gz.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.2.gz.1.1.1.2.gz.1.1.1.4.gz.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.2.gz.1.1.1.2.gz.1.1.2.gz.2.gz
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.2.gz.1.2.gz.1.1.2.gz.1.1.2.gz.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.2.gz.1.2.gz.1.2.gz.1.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.2.gz.2.gz.1.1.1.1.2.gz.1.2.gz.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.2.gz.2.gz.1.1.1.2.gz.1.1.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.2.gz.2.gz.1.1.2.gz.1.1.1.1.1.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.2.gz.2.gz.1.1.2.gz.1.4.gz.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.2.gz.2.gz.1.2.gz.1.2.gz.3.gz.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.2.gz.2.gz.2.gz.1.2.gz.1.3.gz.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.1.2.gz.2.gz.2.gz.2.gz.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.2.gz.1.1.1.1.1.1.2.gz.1.1.1.1.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.2.gz.1.1.1.1.1.2.gz.2.gz.1.2.gz
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.2.gz.1.1.1.1.1.2.gz.2.gz.1.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.2.gz.1.1.1.1.2.gz.1.2.gz.1.1.1.1
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.2.gz.1.1.1.2.gz.1.2.gz.1.2.gz.2.gz
./log/account/pacct.0.4.gz.2.gz.2.gz.1.2.gz.1.2.gz.1.1.1.2.gz.2.gz.1.2.gz

Around 1M of them. I had to delete them to unblock the system.

Event Timeline

We already added a logrotate config for this, but since that wasn't enough, I just turned process accounting off:

sudo /usr/sbin/accton off
Turning off process accounting.

Weird file names, though. Maybe the logrotate config had a problem? I think we can just keep accounting off.

$ cat /etc/logrotate.d/pacct
# SPDX-License-Identifier: Apache-2.0
/var/log/account/* {
    daily
    missingok
    compress
    delaycompress
    notifempty
    copytruncate
}
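A quick way to see why the `/var/log/account/*` pattern is risky: logrotate expands the wildcard roughly like a shell glob, so once rotated copies sit next to the live file, they match as well. A minimal bash sketch, run in a throwaway temp dir rather than the real log directory; the rotated file names are made up for illustration:

```shell
#!/usr/bin/env bash
# Compare what a bare wildcard and an explicit path match once rotated
# copies exist. A temp dir stands in for /var/log/account; the rotated
# file names below are illustrative only.
set -euo pipefail

demo_dir=$(mktemp -d)
touch "$demo_dir/pacct" "$demo_dir/pacct.0" "$demo_dir/pacct.1" "$demo_dir/pacct.2.gz"

# "$demo_dir/*" matches every file, including the rotated copies.
wildcard_matches=( "$demo_dir"/* )
# An explicit path names only the live accounting file.
explicit_matches=( "$demo_dir/pacct" )

wildcard_count=${#wildcard_matches[@]}
explicit_count=${#explicit_matches[@]}
# Print the matched base names (strip the temp dir prefix).
printf 'wildcard: %s\n' "${wildcard_matches[@]##*/}"
printf 'explicit: %s\n' "${explicit_matches[@]##*/}"

rm -rf -- "$demo_dir"
```

Naming the live file explicitly (or using a pattern that cannot match rotated names) keeps logrotate from treating its own output as new input.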

Mentioned in SAL (#wikimedia-cloud) [2026-02-13T21:39:05Z] <Krinkle> Manually pruning codesearch9:/var/log/account/pacct, ref T417397 T413739

Krinkle renamed this task from 'codesearch vm running out of inode again' to '"no space left on device" for codesearch9 root disk (out of inodes due to /var/log/account/pacct)'. Fri, Feb 13, 10:11 PM
Krinkle subscribed.

Ah, the curse of inodes again. This time on the root drive instead of the /srv volume.

Grafana dashboard:

[Screenshot 2026-02-13 at 21.33.39.png]

krinkle@codesearch9:/var/log$ sudo find /var/log/account/ -type f -name "pacct*" -delete

[…] It finished after 20 minutes.

[Screenshot 2026-02-13 at 22.06.19.png]

$ df -i
Filesystem       Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1       1302528  120553 1181975   10% /
/dev/sdb       10485760 5526733 4959027   53% /srv
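`df -i` only tells you which filesystem is out of inodes. To narrow it down to a directory, a minimal bash sketch, assuming GNU coreutils (for `du --inodes`); the helper name `top_inode_dirs` is hypothetical and the demo runs against a temp tree, not the root disk:

```shell
#!/usr/bin/env bash
# List the directories holding the most inodes on one filesystem.
# Assumes GNU coreutils (du --inodes). The demo uses a temp tree, not /.
set -euo pipefail

# Hypothetical helper: print the N directories under ROOT with the highest
# inode counts, largest last. -x keeps du on ROOT's filesystem, so running
# it on / would not descend into a separately mounted /srv.
top_inode_dirs() {
  local root=$1 n=${2:-10}
  du --inodes -x "$root" | sort -n | tail -n "$n"
}

demo_dir=$(mktemp -d)
mkdir "$demo_dir/account" "$demo_dir/other"
# 200 empty files in one directory stand in for the pacct flood.
for i in $(seq 1 200); do : > "$demo_dir/account/pacct.$i"; done
: > "$demo_dir/other/file"

demo_output=$(top_inode_dirs "$demo_dir" 3)
printf '%s\n' "$demo_output"
rm -rf -- "$demo_dir"
```

On the VM itself, the equivalent one-liner would be something like `sudo du --inodes -x / | sort -n | tail -n 20`, which should put /var/log/account near the bottom of the list.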

The backends are up again at https://codesearch.wmcloud.org/_health/ […]

I'm leaving this task open because […], I expect this to happen again. […]

> weird file names though. maybe the logrotate config had a problem? I think we can just keep it off though.

Yeah, it was the same issue again. Here's a sample of the files I deleted with rm -rvf instead of find -delete.

$ sudo rm -rvf account/pacct.0.4.gz.2.gz.2* 
removed 'account/pacct.0.4.gz.2.gz.2.gz'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1.1.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1.1.1.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1.1.1.1.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1.1.1.1.1.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1.1.1.1.1.1.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1.1.1.1.1.1.1.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.gz'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.gz'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.gz.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.gz'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.gz.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.gz.1.1'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3.gz'
removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.gz'

removed 'account/pacct.0.4.gz.2.gz.1.2.gz.2.gz.2.gz.2.gz.2.gz.2.gz.2.gz.1.1'
removed 'account/pacct.0.4.gz.2.gz.1.2.gz.2.gz.2.gz.2.gz.2.gz.2.gz.3.gz'
removed 'account/pacct.0.4.gz.2.gz.1.2.gz.2.gz.2.gz.2.gz.2.gz.3.gz'
removed 'account/pacct.0.4.gz.2.gz.1.2.gz.2.gz.2.gz.2.gz.2.gz.4.gz'
removed 'account/pacct.0.4.gz.2.gz.1.2.gz.2.gz.2.gz.2.gz.3.gz'
removed 'account/pacct.0.4.gz.2.gz.1.2.gz.2.gz.2.gz.2.gz.4.gz'
removed 'account/pacct.0.4.gz.2.gz.1.2.gz.2.gz.2.gz.3.gz'
removed 'account/pacct.0.4.gz.2.gz.1.2.gz.2.gz.2.gz.4.gz'
removed 'account/pacct.0.4.gz.2.gz.1.2.gz.2.gz.3.gz'
removed 'account/pacct.0.4.gz.2.gz.1.2.gz.2.gz.4.gz'
removed 'account/pacct.0.4.gz.2.gz.1.2.gz.3.gz'
removed 'account/pacct.0.4.gz.2.gz.1.2.gz.4.gz'
removed 'account/pacct.0.4.gz.2.gz.1.3.gz'
removed 'account/pacct.0.4.gz.2.gz.1.4.gz'
removed 'account/pacct.0.4.gz.2.gz.3.gz'
removed 'account/pacct.0.4.gz.2.gz.4.gz'
$ sudo rm -rvf account/pacct.0.4.gz*
removed 'account/pacct.0.4.gz'
removed 'account/pacct.0.4.gz.1'
removed 'account/pacct.0.4.gz.3.gz'
removed 'account/pacct.0.4.gz.4.gz'
$ sudo rm -rvf account/pacct.0.*
-bash: /usr/bin/sudo: Argument list too long
$ sudo rm -rvf account/pacct.0.3.gz.2.*
-bash: /usr/bin/sudo: Argument list too long
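Both errors happen before `sudo` or `rm` even start: bash expands `account/pacct.0.*` into one argument per matching file, and the combined size exceeds the kernel's limit for a single exec (`ARG_MAX`). A minimal bash sketch of the usual workaround, letting `find` do the matching so no giant argument list is built; it assumes a `find` with `-delete` (GNU findutils has it), and `delete_matching` is a hypothetical helper name. The demo runs in a temp dir:

```shell
#!/usr/bin/env bash
# Delete files matching a pattern without building a huge argument list.
# The pattern is quoted, so the shell never expands it; find matches the
# names itself and unlinks them one at a time.
set -euo pipefail

# Hypothetical helper: delete regular files directly inside DIR whose
# names match the shell-style PATTERN.
delete_matching() {
  local dir=$1 pattern=$2
  find "$dir" -maxdepth 1 -type f -name "$pattern" -delete
}

demo_dir=$(mktemp -d)
for i in $(seq 1 1000); do : > "$demo_dir/pacct.0.$i.gz"; done
: > "$demo_dir/pacct"   # live file; does not match 'pacct.0.*'

delete_matching "$demo_dir" 'pacct.0.*'

remaining=$(find "$demo_dir" -type f | wc -l | tr -d ' ')
echo "files left: $remaining"
rm -rf -- "$demo_dir"
```

On the host this corresponds to something like `sudo find /var/log/account -maxdepth 1 -type f -name 'pacct.0.*' -delete`. It avoids the argument limit, but it still unlinks every file individually, so it is not fast at the scale seen here.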

I think Daniel shut down pacct, but it gets re-enabled again. A Puppet patch is probably needed.

Change #1239441 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] codesearch: fix recursive logrotate config

https://gerrit.wikimedia.org/r/1239441

Change #1239441 merged by Dzahn:

[operations/puppet@production] codesearch: fix recursive logrotate config

https://gerrit.wikimedia.org/r/1239441

> removed 'account/pacct.0.4.gz.2.gz.2.gz.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1'

I think we see these weird file names because of a bug in the logrotate config. It uses the wildcard /var/log/account/*, which also matches logs that were already rotated, so logrotate rotates those again and the names keep growing recursively.
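To get a feel for how fast "rotating the rotated logs" blows up, here is a simplified bash model, not logrotate itself. Assumptions not taken from this task: a rotate count of 4 (the config above sets none; 4 is the usual Debian /etc/logrotate.conf default), delaycompress-style names (slot 1 plain, later slots ending in .gz), and notifempty, copytruncate and file contents are ignored. Only file names are modelled, using empty files in a temp dir:

```shell
#!/usr/bin/env bash
# Simplified, names-only model of logrotate with a bare wildcard path.
# Assumed: rotate 4, delaycompress-style names, no real compression,
# notifempty/copytruncate ignored. Runs in a temp dir.
set -euo pipefail

KEEP=4                      # assumed rotate count
sim_dir=$(mktemp -d)
: > "$sim_dir/pacct"        # the live accounting file

# One simulated run over "$sim_dir/*". The glob is expanded up front, so
# every existing file, including earlier rotated copies, gets rotated.
# Rotating F shifts F.1 -> F.2.gz -> ... -> F.KEEP.gz; in terms of names
# that only adds the first missing slot (nothing once all slots exist).
rotate_all() {
  local f i name
  local -a snapshot=( "$sim_dir"/* )
  for f in "${snapshot[@]}"; do
    for (( i = 1; i <= KEEP; i++ )); do
      if (( i == 1 )); then name="$f.1"; else name="$f.$i.gz"; fi
      if [ ! -e "$name" ]; then
        : > "$name"
        break
      fi
    done
  done
}

counts=()
for run in $(seq 1 10); do
  rotate_all
  n=$(find "$sim_dir" -type f | wc -l | tr -d ' ')
  counts+=( "$n" )
  echo "after run $run: $n files"
done

# Longest generated name, to compare with the names listed in this task.
longest=$(ls "$sim_dir" | awk '{ print length($0), $0 }' | sort -n | tail -n 1 | cut -d' ' -f2-)
echo "longest name: $longest"
rm -rf -- "$sim_dir"
```

Under these assumptions the count goes 2, 4, 8, 16, 31, ... and reaches 833 after 10 runs, growing by nearly a factor of two per run; with daily runs the model passes the ~1.3M inodes of the root disk in about three weeks. Real numbers on codesearch9 will differ, since the sketch leaves out notifempty, copytruncate and whatever produced the pacct.0 files.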

Attempted fix merged on prod puppetmaster.

Mentioned in SAL (#wikimedia-cloud) [2026-02-14T00:01:01Z] <mutante> codesearch9: systemctl restart logrotate with new config without * - T413739

Logrotate config changed and logrotate restarted. I hope this is fixed now. Currently /var/log/account is empty.

Krinkle assigned this task to Dzahn.