Page MenuHomePhabricator

log rotation stopped on mwlog for all files but "api.log"
Closed, ResolvedPublic

Description

It looks like the change in parent task has caused all but api.log rotation to stop:

root@mwlog1002:/srv/mw-log# ls -latr /srv/mw-log/archive/ | grep -v api.log | tail -10
-rw-r--r-- 1 udp2log udp2log       224309 May 16 04:39 XMP.log-20230516.gz
-rw-r--r-- 1 udp2log udp2log  10919360419 May 16 04:41 xff.log-20230516.gz
-rw-r--r-- 1 udp2log udp2log          191 May 16 07:50 LocalFile.log-20230516.gz
-rw-r--r-- 1 udp2log udp2log         2006 May 16 10:31 thumbnail.log-20230516.gz
-rw-r--r-- 1 udp2log udp2log         1301 May 16 10:31 headers-sent.log-20230516.gz
-rw-r--r-- 1 udp2log udp2log         5477 May 16 12:41 LockManager.log-20230516.gz
-rw-r--r-- 1 udp2log udp2log          609 May 16 16:39 TranslationNotifications.Jobs.log-20230516.gz
-rw-r--r-- 1 udp2log udp2log          359 May 16 17:06 ParserCache.log-20230516.gz
drwxr-xr-x 2 udp2log udp2log       593920 Jun  5 09:05 .
drwxr-xr-x 3 udp2log udp2log         4096 Jun  5 09:05 ..

Leading to very big files, e.g. xff.log is about ~2.4T

root@mwlog1002:/srv/mw-log# ls -laSr | tail -10
-rw-r--r-- 1 udp2log udp2log  118052616945 Jun  5 09:52 FileOperation.log
-rw-r--r-- 1 udp2log udp2log  123548709216 Jun  5 09:52 testwiki.log
-rw-r--r-- 1 udp2log udp2log  132459371387 Jun  5 09:46 error.log
-rw-r--r-- 1 udp2log udp2log  148220086204 Jun  5 09:52 api-feature-usage.log
-rw-r--r-- 1 udp2log udp2log  158126324457 Jun  5 09:52 rdbms.log
-rw-r--r-- 1 udp2log udp2log  394105838955 Jun  5 09:52 DuplicateParse.log
-rw-r--r-- 1 udp2log udp2log  497621878370 Jun  5 09:52 CirrusSearch.log
-rw-r--r-- 1 udp2log udp2log  583003967243 Jun  5 09:52 EventBus.log
-rw-r--r-- 1 udp2log udp2log 2575421291295 Jun  5 09:52 xff.log
-rw-r--r-- 1 udp2log udp2log 3314721579367 Jun  5 09:52 JobExecutor.log

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2023-06-05T10:06:50Z] <godog> truncate xff.log and JobExecutor.log on mwlog1002 to reclaim space - T338127

Mentioned in SAL (#wikimedia-operations) [2023-06-05T10:06:50Z] <godog> truncate xff.log and JobExecutor.log on mwlog1002 to reclaim space - T338127

To perform this I truncated the beginning of the files via fallocate (and keep say the last 100G)

fallocate --collapse-range --offset 0 --length 2.2TB xff.log
fallocate --collapse-range --offset 0 --length 2.8TB JobExecutor.log

Meaning the disk space problem on mwlog1002 has been mitigated

Also note that mwlog2002 does not regularly rotate non-api.log logs, yet doesn't suffer from "big log files" issue, suggesting the mirroring/teeing of udp2log traffic to it isn't working as expected

root@mwlog2002:~# ls -latr /srv/mw-log/archive/ | grep -v api.log | tail -10
-rw-r--r-- 1 udp2log udp2log      890613 May 16 00:24 TitleBlacklist-cache.log-20230516.gz
-rw-r--r-- 1 udp2log udp2log    40962762 May 16 00:24 resourceloader.log-20230516.gz
-rw-r--r-- 1 udp2log udp2log      918100 May 16 00:25 security.log-20230516.gz
-rw-r--r-- 1 udp2log udp2log     1147710 May 16 00:25 WikibaseQualityConstraints.log-20230516.gz
-rw-r--r-- 1 udp2log udp2log     3059099 May 16 00:25 xff.log-20230516.gz
-rw-r--r-- 1 udp2log udp2log         830 May 16 05:30 Bug58676.log-20230516.gz
-rw-r--r-- 1 udp2log udp2log         189 May 16 08:29 WMDE.log-20230516.gz
-rw-r--r-- 1 udp2log udp2log         212 May 16 13:13 php7.2-fpm.log-20230516.gz
drwxr-xr-x 2 udp2log udp2log      589824 Jun  5 12:01 .
drwxr-xr-x 3 udp2log udp2log        4096 Jun  5 12:01 ..
root@mwlog2002:/srv/mw-log# ls -laSr | tail -10
-rw-r--r-- 1 udp2log udp2log  1857160875 Jun  5 12:03 CentralAuth.log
-rw-r--r-- 1 udp2log udp2log  2370439121 Jun  5 12:04 cache-cookies.log
-rw-r--r-- 1 udp2log udp2log  4123714630 Jun  5 12:04 resourceloader.log
-rw-r--r-- 1 udp2log udp2log  4452551437 Jun  5 12:04 FileOperation.log
-rw-r--r-- 1 udp2log udp2log  5134481410 Jun  5 12:04 diff.log
-rw-r--r-- 1 udp2log udp2log  9640606904 Jun  5 12:04 DuplicateParse.log
-rw-r--r-- 1 udp2log udp2log 17551329197 Jun  5 12:04 testwiki.log
-rw-r--r-- 1 udp2log udp2log 18653399028 Jun  5 12:04 EventBus.log
-rw-r--r-- 1 udp2log udp2log 19507788536 Jun  5 12:04 rdbms.log
-rw-r--r-- 1 udp2log udp2log 73699806466 Jun  5 12:04 api-feature-usage.log

looks like logrotate is treating the glob meant to exclude api.log as a literal filename 🤦‍♂️

rotating pattern: /srv/mw-log/!(api).log  after 1 days (1000 rotations)
olddir is /srv/mw-log/archive, empty log files are not rotated, old logs are removed
considering log /srv/mw-log/!(api).log
  log /srv/mw-log/!(api).log does not exist -- skipping

Looking for alternate approaches, I ran a dry run with overlapping rules (/srv/mw-log/*.log daily, /srv/mw-log/api.log hourly) here's the output:

rotating pattern: /srv/mw-log/*.log  after 1 days (1000 rotations)
olddir is /srv/mw-log/archive, empty log files are not rotated, old logs are removed
...
considering log /srv/mw-log/api.log
  Now: 2023-06-05 13:29
  Last rotated at 2023-06-05 13:00
  log does not need rotating (log has already been rotated)

rotating pattern: /srv/mw-log/api.log  hourly (1000 rotations)
olddir is /srv/mw-log/archive, empty log files are not rotated, old logs are removed
considering log /srv/mw-log/api.log
  Now: 2023-06-05 13:32
  Last rotated at 2023-06-05 13:00
  log does not need rotating (log has already been rotated)

Afaict running a config like this would be ok. In theory since api.log is being rotated hourly it should never become old enough to hit the overlapping daily rule.

Change 927187 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] mwlog: fix mw-log logrotate glob

https://gerrit.wikimedia.org/r/927187

Change 927187 merged by Herron:

[operations/puppet@production] mwlog: fix mw-log logrotate glob

https://gerrit.wikimedia.org/r/927187

Checking back on this after rotations overnight, looks like we're not out of the woods yet. Daily rotations are happening once again, however api.log is now being rotated daily as well. Looks like we're hitting https://github.com/logrotate/logrotate/issues/38

According to this, an ignoreduplicates option was introduced in logrotate-3.21.0, and this version is available in Debian testing. I'll aim to try this approach next

3.21.0 looks promising. Here are my notes from testing on a throwaway VM. Depending on the version, and configuration order the behavior varies.

  1. Logrotate 3.18.0: Wildcard defined first, specific file below - the wildcard takes precedence, and the more specific file is ignored (no log found)
rotating pattern: /srv/mw-log/*.log  after 1 days (1000 rotations)
olddir is /srv/mw-log/archive, empty log files are not rotated, old logs are removed
considering log /srv/mw-log/AbuseFilter.log
  Now: 2023-06-06 16:28
  Last rotated at 2023-06-06 00:00
  log does not need rotating (log has been rotated at 2023-06-06 00:00, which is less than a day ago)
considering log /srv/mw-log/antispoof.log
  Now: 2023-06-06 16:28
  Last rotated at 2023-06-06 00:00
  log does not need rotating (log has been rotated at 2023-06-06 00:00, which is less than a day ago)
considering log /srv/mw-log/api-feature-usage.log
  Now: 2023-06-06 16:28
  Last rotated at 2023-06-06 00:00
  log does not need rotating (log has been rotated at 2023-06-06 00:00, which is less than a day ago)
considering log /srv/mw-log/api.log
  Now: 2023-06-06 16:28
  Last rotated at 2023-06-06 14:00
  log does not need rotating (log has been rotated at 2023-06-06 14:00, which is less than a day ago)
considering log /srv/mw-log/api-warning.log
  Now: 2023-06-06 16:28
  Last rotated at 2023-06-06 00:00
  log does not need rotating (log has been rotated at 2023-06-06 00:00, which is less than a day ago)
considering log /srv/mw-log/authentication.log
  Now: 2023-06-06 16:28
  Last rotated at 2023-06-06 00:00
  log does not need rotating (log has been rotated at 2023-06-06 00:00, which is less than a day ago)

(snip, continues through all logs matching /srv/mw-log/*.log)

rotating pattern: /srv/mw-log/api.log  hourly (1000 rotations)
olddir is /srv/mw-log/archive, empty log files are not rotated, old logs are removed
No logs found. Rotation not needed.
  1. Logrotate 3.18.0: Specific file first, wildcard below - specific file is honored, but the wildcard stops processing where they overlap, later files are ignored
rotating pattern: /srv/mw-log/api.log  hourly (1000 rotations)
olddir is /srv/mw-log/archive, empty log files are not rotated, old logs are removed
considering log /srv/mw-log/api.log
  Now: 2023-06-06 16:34
  Last rotated at 2023-06-06 14:00
  log needs rotating

rotating pattern: /srv/mw-log/*.log  after 1 days (1000 rotations)
olddir is /srv/mw-log/archive, empty log files are not rotated, old logs are removed
considering log /srv/mw-log/AbuseFilter.log
  Now: 2023-06-06 16:34
  Last rotated at 2023-06-06 00:00
  log does not need rotating (log has been rotated at 2023-06-06 00:00, which is less than a day ago)
considering log /srv/mw-log/antispoof.log
  Now: 2023-06-06 16:34
  Last rotated at 2023-06-06 00:00
  log does not need rotating (log has been rotated at 2023-06-06 00:00, which is less than a day ago)
considering log /srv/mw-log/api-feature-usage.log
  Now: 2023-06-06 16:34
  Last rotated at 2023-06-06 00:00
  log does not need rotating (log has been rotated at 2023-06-06 00:00, which is less than a day ago)

(api.log overlaps here and processing stops)
  1. Logrotate 3.21.0: Wildcard first, specific file second, ignoreduplicates - Wildcard takes precedence, specific file is ignored
considering log /srv/mw-log/antispoof.log
Creating new state
  Now: 2023-06-06 16:41
  Last rotated at 2023-06-06 16:00
  log does not need rotating (log has already been rotated)
considering log /srv/mw-log/api-feature-usage.log
Creating new state
  Now: 2023-06-06 16:41
  Last rotated at 2023-06-06 16:00
  log does not need rotating (log has already been rotated)
considering log /srv/mw-log/api-warning.log
Creating new state
  Now: 2023-06-06 16:41
  Last rotated at 2023-06-06 16:00
  log does not need rotating (log has already been rotated)
considering log /srv/mw-log/api.log
Creating new state
  Now: 2023-06-06 16:41
  Last rotated at 2023-06-06 16:00
  log does not need rotating (log has already been rotated)
considering log /srv/mw-log/authentication.log
Creating new state
  Now: 2023-06-06 16:41
  Last rotated at 2023-06-06 16:00
  log does not need rotating (log has already been rotated)
considering log /srv/mw-log/authevents.log
Creating new state
  Now: 2023-06-06 16:41
  Last rotated at 2023-06-06 16:00
  log does not need rotating (log has already been rotated)
considering log /srv/mw-log/badpass-priv.log
Creating new state
  Now: 2023-06-06 16:41
  Last rotated at 2023-06-06 16:00
  log does not need rotating (log has already been rotated)
considering log /srv/mw-log/badpass.log

rotating pattern: /srv/mw-log/api.log  hourly (1000 rotations)
olddir is /srv/mw-log/archive, empty log files are not rotated, old logs are removed
No logs found. Rotation not needed.
  1. Logrotate 3.21.0: Specific file first, wildcard second, ignoreduplicates - Specific file appears only in the hourly section, and is omitted from the wildcard
rotating pattern: /srv/mw-log/api.log  hourly (1000 rotations)
olddir is /srv/mw-log/archive, empty log files are not rotated, old logs are removed
considering log /srv/mw-log/api.log
Creating new state
  Now: 2023-06-06 16:38
  Last rotated at 2023-06-06 16:00
  log does not need rotating (log has already been rotated)

rotating pattern: /srv/mw-log/*.log  after 1 days (1000 rotations)
olddir is /srv/mw-log/archive, empty log files are not rotated, old logs are removed
considering log /srv/mw-log/antispoof.log
Creating new state
  Now: 2023-06-06 16:38
  Last rotated at 2023-06-06 16:00
  log does not need rotating (log has already been rotated)
considering log /srv/mw-log/api-feature-usage.log
Creating new state
  Now: 2023-06-06 16:38
  Last rotated at 2023-06-06 16:00
  log does not need rotating (log has already been rotated)
considering log /srv/mw-log/api-warning.log
Creating new state
  Now: 2023-06-06 16:38
  Last rotated at 2023-06-06 16:00
  log does not need rotating (log has already been rotated)
considering log /srv/mw-log/authentication.log
Creating new state
  Now: 2023-06-06 16:38
  Last rotated at 2023-06-06 16:00
  log does not need rotating (log has already been rotated)
considering log /srv/mw-log/authevents.log
Creating new state
  Now: 2023-06-06 16:38
  Last rotated at 2023-06-06 16:00
  log does not need rotating (log has already been rotated)
considering log /srv/mw-log/badpass-priv.log
Creating new state
  Now: 2023-06-06 16:38
  Last rotated at 2023-06-06 16:00
  log does not need rotating (log has already been rotated)
considering log /srv/mw-log/badpass.log
(snip, continues matching wildcard)

With this I think config 4 (upgrading the mwlog hosts to logrotate 3.21.0 and reversing the config order from more specific to least) is worth trying next.

I've already backported the package for bullseye (used it for the tests above). I'll propose a patch to pin/integrate logrotate 3.21.0 on the mwlog hosts before uploading the package itself.

Change 927745 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] aptrepo: add logrotate bullseye component

https://gerrit.wikimedia.org/r/927745

Change 927746 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] mwlog: upgrade logrotate and use ignoreduplicates

https://gerrit.wikimedia.org/r/927746

Change 927745 merged by Herron:

[operations/puppet@production] aptrepo: add logrotate bullseye component

https://gerrit.wikimedia.org/r/927745

Change 927746 merged by Herron:

[operations/puppet@production] mwlog: upgrade logrotate and use ignoreduplicates

https://gerrit.wikimedia.org/r/927746

Also note that mwlog2002 does not regularly rotate non-api.log logs, yet doesn't suffer from "big log files" issue, suggesting the mirroring/teeing of udp2log traffic to it isn't working as expected

Moving on to this issue, I think the problem is ipv6 compatibility in udp2log. Now that the hosts have ipv6 records in dns, rsyslog is resolving the "udp_tee" backends to either 127.0.0.1 (for the local udp2log instance) or the ipv6 address of the opposite mwlog host. Udp2log only binds ipv4, so the tee to the remote host isn't working. I'll upload a patch to add an ipv6 relay.

Change 928122 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] udp2log: add 6to4 relay

https://gerrit.wikimedia.org/r/928122

Change 928122 merged by Herron:

[operations/puppet@production] udp2log: add 6to4 relay

https://gerrit.wikimedia.org/r/928122

Checking in this morning, I see both /srv/mw-log/api.log hourly and /srv/mw-log/*.log daily rotations happened. And with the 6to4 relay in place I'm seeing logs arrive on both sides of the "udp tee". I think we're good here!