tools bastion accounting logs super noisy, filling /var
Closed, Resolved, Public

Description

Typical output from lastcomm:

10.64.37.10-man   F    root     __         0.00 secs Mon Jul 27 19:03
10.64.37.10-man   F    root     __         0.00 secs Mon Jul 27 19:03
10.64.37.10-man   F    root     __         0.00 secs Mon Jul 27 19:03
10.64.37.10-man   F    root     __         0.00 secs Mon Jul 27 19:03
10.64.37.10-man   F    root     __         0.00 secs Mon Jul 27 19:03
10.64.37.10-man   F    root     __         0.00 secs Mon Jul 27 19:03
10.64.37.10-man   F    root     __         0.00 secs Mon Jul 27 19:03
10.64.37.10-man   F    root     __         0.00 secs Mon Jul 27 19:03
10.64.37.10-man   F    root     __         0.00 secs Mon Jul 27 19:03
10.64.37.10-man   F    root     __         0.00 secs Mon Jul 27 19:03
10.64.37.10-man   F    root     __         0.00 secs Mon Jul 27 19:03
10.64.37.10-man   F    root     __         0.00 secs Mon Jul 27 19:03
10.64.37.10-man   F    root     __         0.00 secs Mon Jul 27 19:03
10.64.37.10-man   F    root     __         0.00 secs Mon Jul 27 19:03
10.64.37.10-man   F    root     __         0.00 secs Mon Jul 27 19:03

ad infinitum. There are more than 100,000 entries per minute (!), which explains why /var/log/account/pacct fills up so quickly. There are also some more sensible entries, such as

sshd             S     root     __         0.01 secs Mon Jul 27 19:06

The format is supposed to be:

+ command name of the process
+ flags, as recorded by the system accounting routines:
     S -- command executed by super-user
     F -- command executed after a fork but without a following exec
     C -- command run in PDP-11 compatibility mode (VAX only)
     D -- command terminated with the generation of a core file
     X -- command was terminated with the signal SIGTERM
+ the name of the user who ran the process
+ time the process exited
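
To get a quick per-minute tally of such records, something like the following could be used. It is only a sketch: the function name count_per_minute is made up here, and it assumes that the command name contains no spaces and that the last four fields are always weekday, month, day and HH:MM. The flags column may be empty, so the timestamp fields are counted from the end of the line.

```shell
# Count accounting records per command and per minute.
# Reads `lastcomm` output on stdin and prints "count<TAB>command<TAB>minute",
# with the busiest command/minute pair first.
count_per_minute() {
    awk 'NF >= 9 {
        # The timestamp is always the last four fields: weekday month day HH:MM
        minute = $(NF-3) " " $(NF-2) " " $(NF-1) " " $NF
        count[$1 "\t" minute]++
    }
    END {
        for (key in count) printf "%d\t%s\n", count[key], key
    }' | sort -rn
}
```

Usage would be along the lines of `sudo lastcomm | head -n 1000000 | count_per_minute | head`.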

Other possibly related tasks:

Event Timeline

Andrew claimed this task.
Andrew raised the priority of this task from to Needs Triage.
Andrew updated the task description.
Andrew added a project: Labs-Sprint-107.
Andrew subscribed.
valhallasw set Security to None.
valhallasw added a project: Toolforge.

OK, so besides lastcomm, there's also dump-acct to parse the pacct file, which *does* dump the parent pid. In this case:

command, version, user time, system time,  effective  time, uid, gid, memory, io, pid, ppid, time
root@tools-bastion-01:/home/valhallasw# dump-acct /var/log/account/pacct.0 | grep man | head
10.64.37.10-man |v3|     0.00|     0.00|     0.00|     0|     0|     0.00|     0.00|   10859        2|Sun Jul 26 06:30:23 2015
10.64.37.10-man |v3|     0.00|     0.00|     0.00|     0|     0|     0.00|     0.00|   10860        2|Sun Jul 26 06:30:23 2015
10.64.37.10-man |v3|     0.00|     0.00|     0.00|     0|     0|     0.00|     0.00|   10862        2|Sun Jul 26 06:30:23 2015
10.64.37.10-man |v3|     0.00|     0.00|     0.00|     0|     0|     0.00|     0.00|   10863        2|Sun Jul 26 06:30:23 2015
10.64.37.10-man |v3|     0.00|     0.00|     0.00|     0|     0|     0.00|     0.00|   10864        2|Sun Jul 26 06:30:23 2015
10.64.37.10-man |v3|     0.00|     0.00|     0.00|     0|     0|     0.00|     0.00|   10865        2|Sun Jul 26 06:30:23 2015

so the parent pid is 2, which is

root@tools-bastion-01:/home/valhallasw# ps -l 2
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY        TIME CMD
1 S     0     2     0  1  80   0 -     0 kthrea ?        681:52 [kthreadd]

which supports the idea this is related to nfs mounts.
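
To check that all of these records share the same parent, the ppid column can be tallied. This is a rough sketch under the assumption that the output looks like the sample above, where pid and ppid end up in the same '|'-delimited field; the function name ppid_tally is hypothetical.

```shell
# Tally parent pids from `dump-acct` output on stdin.
# Prints "count<TAB>ppid", most frequent parent first.
ppid_tally() {
    awk -F'|' 'NF >= 10 {
        # The second-to-last field holds "pid ppid" (or just ppid if the
        # columns are properly separated); the ppid is its last word.
        n = split($(NF-1), ids, " ")
        if (n > 0) count[ids[n]]++
    }
    END {
        for (ppid in count) printf "%d\t%s\n", count[ppid], ppid
    }' | sort -rn
}
```

For example: `dump-acct /var/log/account/pacct.0 | grep man | ppid_tally`, followed by `ps -l <ppid>` for the top entry.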

The issue has been going on for a while: /var/log/account/pacct.1 contains 24 hours of messages without interruptions.

tools-bastion-01's IP address is 10.68.17.228, the IP in the first column refers to:

root@tools-bastion-01:~# host 10.64.37.10
10.37.64.10.in-addr.arpa domain name pointer labstore.svc.eqiad.wmnet.
root@tools-bastion-01:~#

Is that a login (attempt) from that host or some NFS client application on tools-bastion-01?

It's supposed to be a command that runs on tools-bastion-01, called 10.64.37.10-man.

My hunch is that these are actual processes (their PIDs are 10859, 10860, etc) spawned by NFS somehow, but I'm not sure how to confirm that. Searching for '-man' in the kernel source did not yield any hits.

The references to kthread in the nfs module are the following:

neither of those resemble the format we see.

As for the linked stackexchange bug: their name is [123.45.78.901-ma], which suggests to me the 'ma' and 'man' are cropped process names -- their IP address looks fabricated, so the original one might have been one character longer than ours.

valhallasw@tools-bastion-01:~/linux/linux-master/fs$ man<tab><tab>
man                        mandb                      manpage-alert
manage-nfs-volumes-daemon  manhole                    manpath

neither of those resemble the format we see.

The latter *does* resemble the format we see:

	snprintf(buf, sizeof(buf), "%s-manager",
			rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR));
	task = kthread_run(nfs4_run_state_manager, clp, "%s", buf);

and 10.64.37.10-manager *exactly* fits our description.
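
The cut-off at "-man" is consistent with the kernel's task name buffer, which is 16 bytes (TASK_COMM_LEN), i.e. 15 visible characters plus the terminating NUL. A quick sanity check in the shell (the helper name comm_name is made up for this illustration):

```shell
# Truncate a thread name to the 15 characters that fit in the kernel's
# 16-byte comm buffer (the last byte is the terminating NUL).
comm_name() {
    printf '%.15s\n' "$1"
}

comm_name "10.64.37.10-manager"   # prints 10.64.37.10-man
```

An address that is one character longer would leave room for only "-ma", which matches the name in the stackexchange report.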

As for other hosts:

tools-exec-1201 has entries as well, but only roughly twice a minute:

10.64.37.10-man   F    root     __         0.00 secs Tue Jul 28 14:06
10.64.37.10-man   F    root     __         0.00 secs Tue Jul 28 14:06
10.64.37.10-man   F    root     __         0.00 secs Tue Jul 28 14:05
10.64.37.10-man   F    root     __         0.00 secs Tue Jul 28 14:05
10.64.37.10-man   F    root     __         0.00 secs Tue Jul 28 14:04

Here, too, the process dies quickly (0.00 secs runtime), so that's probably the intended behavior. The same behavior can be seen on exec-1401.

Tools-redis-02, which is light on NFS usage (I assume, at least..) has slightly fewer starts, approx one per minute.

This is also happening on tools-webgrid-lighttpd-1401 (and some other lighttpd-14xx hosts, and maybe others as well), which is a safer host to debug on. I'm going to drain it so there are no service interruptions.

The current rate of 10.64.37.10-man entries is ~50k/minute:

valhallasw@tools-webgrid-lighttpd-1401:~$ sudo lastcomm | head -n 1000000 | grep -e "Tue Jul 28 17:39" | wc -l
51386

The rate did not decrease after draining the server of jobs. Using

sysctl -w sunrpc.nfs_debug=1023 && sleep 2 && sysctl -w sunrpc.nfs_debug=0

I sampled nfs debug logs, but this only yielded

Jul 28 17:50:44 tools-webgrid-lighttpd-1401 kernel: [3416534.448253] --> nfs_put_client({3})
Jul 28 17:50:44 tools-webgrid-lighttpd-1401 kernel: [3416534.449060] --> nfs_put_client({3})
Jul 28 17:50:44 tools-webgrid-lighttpd-1401 kernel: [3416534.449897] --> nfs_put_client({3})
Jul 28 17:50:44 tools-webgrid-lighttpd-1401 kernel: [3416534.451842] --> nfs_put_client({3})

which doesn't immediately tell me anything.

Remounting NFS mounts with 'mount -o remount /data/project' et al doesn't stop the behavior.

tcpdump is slightly more informative:

valhallasw@tools-webgrid-lighttpd-1401:~$ sudo tcpdump port 2049 | grep ERROR | head -n 25
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
18:50:32.011125 IP labstore.svc.eqiad.wmnet.nfs > tools-webgrid-lighttpd-1401.tools.eqiad.wmflabs.790: Flags [P.], seq 96:144, ack 257, win 8222, options [nop,nop,TS val 714724407 ecr 854955476], length 48: NFS reply xid 191259162 reply ok 44 getattr ERROR: unk 10022
18:50:32.013059 IP labstore.svc.eqiad.wmnet.nfs > tools-webgrid-lighttpd-1401.tools.eqiad.wmflabs.790: Flags [P.], seq 192:240, ack 513, win 8222, options [nop,nop,TS val 714724407 ecr 854955477], length 48: NFS reply xid 224813594 reply ok 44 getattr ERROR: unk 10022
18:50:32.014432 IP labstore.svc.eqiad.wmnet.nfs > tools-webgrid-lighttpd-1401.tools.eqiad.wmflabs.790: Flags [P.], seq 288:336, ack 769, win 8222, options [nop,nop,TS val 714724407 ecr 854955477], length 48: NFS reply xid 258368026 reply ok 44 getattr ERROR: unk 10022

From wireshark, the information is slightly clearer. The log is filled with

84	0.040043	10.68.16.34	10.64.37.10	NFS	206	V4 Call (Reply In 85) RELEASE_LOCKOWNER
85	0.040343	10.64.37.10	10.68.16.34	NFS	114	V4 Reply (Call In 84) RELEASE_LOCKOWNER Status: NFS4ERR_STALE_CLIENTID

NFS4ERR_STALE_CLIENTID is error 10022.

The client then tries to renew the client ID:

86	0.040520	10.68.16.34	10.64.37.10	NFS	182	V4 Call (Reply In 87) RENEW CID: 0x4fe9
87	0.040828	10.64.37.10	10.68.16.34	NFS	114	V4 Reply (Call In 86) RENEW

and immediately after we get another RELEASE_LOCKOWNER cycle:

88	0.040906	10.68.16.34	10.64.37.10	NFS	206	V4 Call (Reply In 89) RELEASE_LOCKOWNER
89	0.041893	10.64.37.10	10.68.16.34	NFS	114	V4 Reply (Call In 88) RELEASE_LOCKOWNER Status: NFS4ERR_STALE_CLIENTID.

As tcpdump is safe to run, I tried the same on tools-bastion-01, but we see a /different/ error there:

18:59:38.149578 IP labstore.svc.eqiad.wmnet.nfs > tools-bastion-01.tools.eqiad.wmflabs.828: Flags [P.], seq 2352:2400, ack 5697, win 15587, options [nop,nop,TS val 714860941 ecr 856712458], length 48: NFS reply xid 3264445160 reply ok 44 getattr ERROR: unk 10011

which is NFS4ERR_EXPIRED, but it's also in reply to RELEASE_LOCKOWNER.
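
Since tcpdump only prints the numeric status ("unk 10022"), a small filter can make the capture easier to read. This is a sketch only: it knows just the two codes that showed up here, and the function names are invented for the example.

```shell
# Map the NFSv4 status codes seen in this task to their names.
nfs4_error_name() {
    case "$1" in
        10011) echo NFS4ERR_EXPIRED ;;
        10022) echo NFS4ERR_STALE_CLIENTID ;;
        *)     echo "unknown($1)" ;;
    esac
}

# Read tcpdump text output on stdin, pick out "ERROR: unk NNNNN" and
# print "count code name" for every distinct code.
tally_nfs_errors() {
    sed -n 's/.*ERROR: unk \([0-9][0-9]*\).*/\1/p' | sort | uniq -c |
    while read -r count code; do
        printf '%s %s %s\n' "$count" "$code" "$(nfs4_error_name "$code")"
    done
}
```

For example: `sudo tcpdump -c 1000 port 2049 | tally_nfs_errors`.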

Unmounting and remounting the NFS mount points seems to fix the issue, at least temporarily. Now I'm trying to see if any one particular mount is the culprit... but I doubt it.

My inspection of the packets, coupled with reading the source, points to the kernel holding advisory locks over files on NFS while the NFS server has lost track of them. There were a number of restarts of the NFS service when we split the filesystems per-project during the outage aftermath that may have caused locks to be lost[1], and the busiest instances would normally have been the most affected.

I can find no way to instruct the kernel to abandon those locks short of completely unmounting the filesystem.

[1] There is a restart grace period (90s) for recovering locks when the NFS server is restarted, but a heavily loaded system may not retry within that interval; I suspect this is what happened in this case.

I have done an unmount/remount on those tools nodes where it was possible. I've also rebooted tools-webgrid-lighttpd-1404 and tools-webgrid-lighttpd-1401 which were the two non-bastion instances with serious troubles.

I've scheduled a reboot of tools-bastion-01 for tomorrow. After that is done I think we can consider this more-or-less resolved.

The disk space on tools-webgrid-lighttpd-1406 was running low as well, so I rebooted that instance, too. Searching for instances that have a pacct.0 with a size of 1 GByte or more and have not been rebooted recently:

[tim@passepartout ~]$ pdsh -f 10 -g tools find /var/log/account -type f -name pacct.0 -size +1G -ls -exec uptime \\\;
tools-shadow-01: Permission denied (publickey).
pdsh@passepartout: tools-shadow-01: ssh exited with exit code 255
tools-webgrid-generic-1403: 1049019 4813648 -rw-r-----   1 root     adm      4929170560 Jul 29 06:32 /var/log/account/pacct.0
tools-webgrid-generic-1403:  02:03:32 up 40 days, 21:16,  0 users,  load average: 0.83, 0.84, 0.81
tools-webgrid-generic-1401: 1049109 4836756 -rw-r-----   1 root     adm      4952824384 Jul 29 06:32 /var/log/account/pacct.0
tools-webgrid-generic-1401:  02:03:32 up 40 days, 21:17,  0 users,  load average: 0.55, 0.82, 0.87
tools-webgrid-generic-1402: 1048810 4292652 -rw-r-----   1 root     adm      4395667520 Jul 29 06:31 /var/log/account/pacct.0
tools-webgrid-generic-1402:  02:03:33 up 40 days, 21:17,  0 users,  load average: 0.94, 0.82, 0.80
tools-webgrid-lighttpd-1401: 1048795 4954128 -rw-r-----   1 root     adm      5073018688 Jul 29 06:32 /var/log/account/pacct.0
tools-webgrid-lighttpd-1401:  02:03:37 up 40 days, 21:15,  0 users,  load average: 0.01, 0.03, 0.06
tools-webgrid-lighttpd-1404: 1049129 5181236 -rw-r-----   1 root     adm      5305575296 Jul 29 06:31 /var/log/account/pacct.0
tools-webgrid-lighttpd-1404:  02:03:38 up  5:51,  0 users,  load average: 0.02, 0.06, 0.07
tools-webgrid-lighttpd-1406: 1049127 4889596 -rw-r-----   1 root     adm      5006930432 Jul 29 06:31 /var/log/account/pacct.0
tools-webgrid-lighttpd-1406:  02:03:40 up 27 min,  0 users,  load average: 0.08, 0.08, 0.08
[tim@passepartout ~]$

So the reboot of tools-webgrid-lighttpd-1401 didn't seem to work :-), and I will reboot:

  • tools-webgrid-generic-1401,
  • tools-webgrid-generic-1402,
  • tools-webgrid-generic-1403, and
  • tools-webgrid-lighttpd-1401

one at a time.

Rebooted:

  • tools-webgrid-generic-1401,
  • tools-webgrid-generic-1402,
  • tools-webgrid-generic-1403, and
  • tools-webgrid-lighttpd-1401.

tools-bastion-01 is now rebooted as well.

valhallasw triaged this task as Medium priority. (Jul 31 2015, 7:42 PM)

I propose to upstream this bug, but given that it's quite a complicated issue, I think it's worth the effort to get the bug report right. I've started a quick draft at https://etherpad.wikimedia.org/p/T107052 ; please extend it with relevant information where possible.

Regarding "Remounting shares does not solve the issue", @Andrew wrote above that remounting did solve the problem.

Shouldn't it be possible to replicate the problem by:

  • acquiring a lock on a test instance,
  • blocking NFS traffic with iptables,
  • rebooting the NFS server,
  • waiting for 90 s + t, and
  • unblocking NFS traffic?

(With "NFS server" = probably another test instance, but depending on schedule the spare labstore host could be used for that.)
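
The steps above could be scripted roughly as follows. This is an untested sketch, not a verified reproduction: the lock file path, the 30 s margin for "t" and the function names are all assumptions, the NFS server restart is left as a manual step, and by default the commands are only printed (DRY_RUN=1).

```shell
# Hypothetical reproduction helper for a test instance. All values below are
# placeholders chosen for illustration.
DRY_RUN=${DRY_RUN:-1}                               # 1 = only print commands
LOCKFILE=${LOCKFILE:-/data/project/test/lockfile}   # assumed file on an NFS mount
GRACE=${GRACE:-90}                                  # server grace period in seconds
EXTRA=${EXTRA:-30}                                  # the "+ t" margin, arbitrary

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

reproduce() {
    # 1. Hold a lock on a file on NFS in the background.
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ flock $LOCKFILE sleep 3600 &"
    else
        flock "$LOCKFILE" sleep 3600 &
        sleep 1   # give flock a moment to actually take the lock
    fi
    # 2. Block outgoing NFS traffic (port 2049).
    run iptables -I OUTPUT -p tcp --dport 2049 -j DROP
    # 3. Restart the NFS server by hand, then continue.
    echo "Restart the NFS server now, then press Enter."
    if [ "$DRY_RUN" != 1 ]; then read -r _; fi
    # 4. Wait until the grace period has certainly passed.
    run sleep "$((GRACE + EXTRA))"
    # 5. Unblock NFS traffic again.
    run iptables -D OUTPUT -p tcp --dport 2049 -j DROP
}
```

Calling `reproduce` with the defaults prints the plan; running it as root with DRY_RUN=0 would execute it. Afterwards, lastcomm or tcpdump on port 2049 should show whether the RELEASE_LOCKOWNER loop has started.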

I looked for instances with pacct.0 > 100 MByte, but couldn't find any in Toolforge, so I assume there is no host left with symptoms.
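
For a single instance, the same kind of check can be done locally with find. A minimal sketch, assuming a find that understands the M size suffix (GNU find does); the function name large_pacct and its defaults are made up here.

```shell
# List process accounting files above a size threshold.
#   $1: directory to search (default /var/log/account)
#   $2: threshold in MiB    (default 100)
large_pacct() {
    dir=${1:-/var/log/account}
    min_mib=${2:-100}
    find "$dir" -type f -name 'pacct*' -size +"${min_mib}"M -exec ls -lh {} +
}
```

Run as root (the files are only readable by root and adm), e.g. `large_pacct /var/log/account 100`; no output means no file exceeds the threshold.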

This happened again on tools-exec-1403; probably a hangover from the NFS breakdown last Sunday (T110827).

For reference, current usage:

tools-worker-1022.tools.eqiad.wmflabs: 65M -- kube-proxy running iptables-save/restore periodically 
tools-worker-1028.tools.eqiad.wmflabs: 64M
tools-worker-1027.tools.eqiad.wmflabs: 63M
tools-worker-1026.tools.eqiad.wmflabs: 63M
tools-worker-1020.tools.eqiad.wmflabs: 63M
tools-worker-1015.tools.eqiad.wmflabs: 63M
tools-worker-1012.tools.eqiad.wmflabs: 63M
tools-worker-1008.tools.eqiad.wmflabs: 63M
tools-worker-1025.tools.eqiad.wmflabs: 62M
tools-worker-1023.tools.eqiad.wmflabs: 62M
tools-worker-1021.tools.eqiad.wmflabs: 62M
tools-worker-1019.tools.eqiad.wmflabs: 62M
tools-worker-1018.tools.eqiad.wmflabs: 62M
tools-worker-1017.tools.eqiad.wmflabs: 62M
tools-worker-1016.tools.eqiad.wmflabs: 62M
tools-worker-1014.tools.eqiad.wmflabs: 62M
tools-worker-1013.tools.eqiad.wmflabs: 62M
tools-worker-1011.tools.eqiad.wmflabs: 62M
tools-worker-1010.tools.eqiad.wmflabs: 62M
tools-worker-1007.tools.eqiad.wmflabs: 62M
tools-worker-1006.tools.eqiad.wmflabs: 62M
tools-worker-1005.tools.eqiad.wmflabs: 62M
tools-worker-1004.tools.eqiad.wmflabs: 62M
tools-worker-1003.tools.eqiad.wmflabs: 62M
tools-worker-1002.tools.eqiad.wmflabs: 62M
tools-worker-1001.tools.eqiad.wmflabs: 62M
tools-bastion-03.tools.eqiad.wmflabs: 61M -- kube-proxy running iptables-save/restore periodically
tools-bastion-02.tools.eqiad.wmflabs: 60M
tools-worker-1009.tools.eqiad.wmflabs: 57M
tools-proxy-04.tools.eqiad.wmflabs: 56M -- kube-proxy running iptables-save/restore periodically
tools-proxy-03.tools.eqiad.wmflabs: 56M
tools-sgecron-01.tools.eqiad.wmflabs: 46M -- all commands from user's crontabs
tools-cron-01.tools.eqiad.wmflabs: 34M
tools-exec-1416.tools.eqiad.wmflabs: 22M
tools-exec-1406.tools.eqiad.wmflabs: 22M
tools-sgewebgrid-lighttpd-0928.tools.eqiad.wmflabs: 21M
tools-sgeexec-0911.tools.eqiad.wmflabs: 20M
tools-exec-1405.tools.eqiad.wmflabs: 19M
tools-sgewebgrid-lighttpd-0925.tools.eqiad.wmflabs: 18M
tools-sgegrid-master.tools.eqiad.wmflabs: 18M
tools-sgeexec-0929.tools.eqiad.wmflabs: 18M
tools-paws-master-01.tools.eqiad.wmflabs: 18M
tools-exec-1418.tools.eqiad.wmflabs: 18M
tools-exec-1409.tools.eqiad.wmflabs: 18M
tools-sgeexec-0922.tools.eqiad.wmflabs: 16M
tools-paws-worker-1013.tools.eqiad.wmflabs: 16M
tools-paws-worker-1005.tools.eqiad.wmflabs: 16M
tools-paws-worker-1001.tools.eqiad.wmflabs: 16M
tools-exec-1417.tools.eqiad.wmflabs: 16M
tools-exec-1401.tools.eqiad.wmflabs: 16M
tools-sgewebgrid-lighttpd-0927.tools.eqiad.wmflabs: 15M
tools-sgewebgrid-lighttpd-0907.tools.eqiad.wmflabs: 15M
tools-paws-worker-1019.tools.eqiad.wmflabs: 15M
tools-paws-worker-1017.tools.eqiad.wmflabs: 15M
tools-paws-worker-1016.tools.eqiad.wmflabs: 15M
tools-paws-worker-1010.tools.eqiad.wmflabs: 15M
tools-paws-worker-1007.tools.eqiad.wmflabs: 15M
tools-paws-worker-1006.tools.eqiad.wmflabs: 15M
tools-paws-worker-1003.tools.eqiad.wmflabs: 15M
tools-paws-worker-1002.tools.eqiad.wmflabs: 15M
tools-exec-1412.tools.eqiad.wmflabs: 15M
tools-exec-1403.tools.eqiad.wmflabs: 15M
tools-exec-1402.tools.eqiad.wmflabs: 15M
tools-sgewebgrid-generic-0903.tools.eqiad.wmflabs: 14M
tools-mail-02.tools.eqiad.wmflabs: 14M
tools-exec-1429.tools.eqiad.wmflabs: 14M
tools-exec-1413.tools.eqiad.wmflabs: 14M
tools-exec-1408.tools.eqiad.wmflabs: 14M
tools-sgeexec-0937.tools.eqiad.wmflabs: 13M
tools-exec-1411.tools.eqiad.wmflabs: 13M
tools-sgeexec-0934.tools.eqiad.wmflabs: 12M
tools-sgeexec-0919.tools.eqiad.wmflabs: 12M
tools-exec-1415.tools.eqiad.wmflabs: 12M
tools-exec-1410.tools.eqiad.wmflabs: 12M
tools-sgeexec-0932.tools.eqiad.wmflabs: 11M
tools-sgeexec-0927.tools.eqiad.wmflabs: 11M
tools-sgeexec-0910.tools.eqiad.wmflabs: 11M
tools-exec-1404.tools.eqiad.wmflabs: 11M
tools-exec-1419.tools.eqiad.wmflabs: 9.9M
tools-sgeexec-0941.tools.eqiad.wmflabs: 8.6M
tools-exec-1414.tools.eqiad.wmflabs: 8.6M
tools-exec-1407.tools.eqiad.wmflabs: 8.5M
tools-sgeexec-0906.tools.eqiad.wmflabs: 8.1M
tools-sgeexec-0928.tools.eqiad.wmflabs: 8.0M
tools-sgeexec-0924.tools.eqiad.wmflabs: 8.0M
tools-webgrid-lighttpd-1414.tools.eqiad.wmflabs: 7.8M
tools-sgewebgrid-lighttpd-0915.tools.eqiad.wmflabs: 7.5M
tools-sgeexec-0921.tools.eqiad.wmflabs: 7.5M
tools-sgeexec-0916.tools.eqiad.wmflabs: 7.5M
tools-sgebastion-07.tools.eqiad.wmflabs: 7.5M
tools-sgeexec-0920.tools.eqiad.wmflabs: 7.2M
tools-sgeexec-0936.tools.eqiad.wmflabs: 7.0M
tools-exec-1420.tools.eqiad.wmflabs: 7.0M
tools-sgeexec-0931.tools.eqiad.wmflabs: 6.9M
tools-sgeexec-0905.tools.eqiad.wmflabs: 6.9M
tools-webgrid-lighttpd-1404.tools.eqiad.wmflabs: 6.5M
tools-sgeexec-0942.tools.eqiad.wmflabs: 6.5M
tools-sgeexec-0917.tools.eqiad.wmflabs: 6.5M
tools-sgeexec-0935.tools.eqiad.wmflabs: 6.4M
tools-sgeexec-0923.tools.eqiad.wmflabs: 6.3M
tools-sgeexec-0907.tools.eqiad.wmflabs: 6.3M
tools-sgebastion-08.tools.eqiad.wmflabs: 6.3M
tools-webgrid-lighttpd-1407.tools.eqiad.wmflabs: 6.2M
tools-sgeexec-0926.tools.eqiad.wmflabs: 6.2M
tools-exec-1432.tools.eqiad.wmflabs: 6.2M
tools-exec-1431.tools.eqiad.wmflabs: 6.2M
tools-exec-1430.tools.eqiad.wmflabs: 6.2M
tools-exec-1423.tools.eqiad.wmflabs: 6.1M
tools-webgrid-lighttpd-1402.tools.eqiad.wmflabs: 6.0M
tools-webgrid-generic-1403.tools.eqiad.wmflabs: 6.0M
tools-exec-1425.tools.eqiad.wmflabs: 6.0M
tools-webgrid-lighttpd-1413.tools.eqiad.wmflabs: 5.9M
tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs: 5.9M
tools-webgrid-lighttpd-1408.tools.eqiad.wmflabs: 5.9M
tools-sgewebgrid-lighttpd-0923.tools.eqiad.wmflabs: 5.9M
tools-sgewebgrid-lighttpd-0913.tools.eqiad.wmflabs: 5.9M
tools-sgeexec-0915.tools.eqiad.wmflabs: 5.9M
tools-exec-1428.tools.eqiad.wmflabs: 5.9M
tools-exec-1427.tools.eqiad.wmflabs: 5.9M
tools-exec-1426.tools.eqiad.wmflabs: 5.9M
tools-exec-1424.tools.eqiad.wmflabs: 5.9M
tools-exec-1422.tools.eqiad.wmflabs: 5.9M
tools-exec-1421.tools.eqiad.wmflabs: 5.9M
tools-webgrid-lighttpd-1418.tools.eqiad.wmflabs: 5.8M
tools-webgrid-lighttpd-1417.tools.eqiad.wmflabs: 5.8M
tools-webgrid-lighttpd-1416.tools.eqiad.wmflabs: 5.8M
tools-webgrid-lighttpd-1415.tools.eqiad.wmflabs: 5.8M
tools-webgrid-lighttpd-1412.tools.eqiad.wmflabs: 5.8M
tools-webgrid-lighttpd-1410.tools.eqiad.wmflabs: 5.8M
tools-webgrid-lighttpd-1409.tools.eqiad.wmflabs: 5.8M
tools-webgrid-lighttpd-1406.tools.eqiad.wmflabs: 5.8M
tools-webgrid-lighttpd-1405.tools.eqiad.wmflabs: 5.8M
tools-webgrid-lighttpd-1403.tools.eqiad.wmflabs: 5.8M
tools-webgrid-lighttpd-1401.tools.eqiad.wmflabs: 5.8M
tools-webgrid-generic-1404.tools.eqiad.wmflabs: 5.8M
tools-webgrid-generic-1402.tools.eqiad.wmflabs: 5.8M
tools-webgrid-generic-1401.tools.eqiad.wmflabs: 5.8M
tools-sgewebgrid-lighttpd-0921.tools.eqiad.wmflabs: 5.8M
tools-sgewebgrid-lighttpd-0903.tools.eqiad.wmflabs: 5.8M
tools-sgeexec-0933.tools.eqiad.wmflabs: 5.8M
tools-sgewebgrid-lighttpd-0922.tools.eqiad.wmflabs: 5.7M
tools-sgewebgrid-lighttpd-0916.tools.eqiad.wmflabs: 5.7M
tools-sgewebgrid-lighttpd-0912.tools.eqiad.wmflabs: 5.7M
tools-sgeexec-0939.tools.eqiad.wmflabs: 5.7M
tools-sgeexec-0930.tools.eqiad.wmflabs: 5.7M
tools-sgeexec-0925.tools.eqiad.wmflabs: 5.7M
tools-sgewebgrid-lighttpd-0926.tools.eqiad.wmflabs: 5.6M
tools-sgewebgrid-lighttpd-0924.tools.eqiad.wmflabs: 5.6M
tools-sgewebgrid-lighttpd-0920.tools.eqiad.wmflabs: 5.6M
tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs: 5.6M
tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs: 5.6M
tools-sgewebgrid-lighttpd-0917.tools.eqiad.wmflabs: 5.6M
tools-sgewebgrid-lighttpd-0914.tools.eqiad.wmflabs: 5.6M
tools-sgewebgrid-lighttpd-0911.tools.eqiad.wmflabs: 5.6M
tools-sgewebgrid-lighttpd-0910.tools.eqiad.wmflabs: 5.6M
tools-sgewebgrid-lighttpd-0909.tools.eqiad.wmflabs: 5.6M
tools-sgewebgrid-lighttpd-0908.tools.eqiad.wmflabs: 5.6M
tools-sgewebgrid-lighttpd-0906.tools.eqiad.wmflabs: 5.6M
tools-sgewebgrid-lighttpd-0905.tools.eqiad.wmflabs: 5.6M
tools-sgewebgrid-lighttpd-0904.tools.eqiad.wmflabs: 5.6M
tools-sgewebgrid-lighttpd-0902.tools.eqiad.wmflabs: 5.6M
tools-sgewebgrid-generic-0902.tools.eqiad.wmflabs: 5.6M
tools-sgeexec-0940.tools.eqiad.wmflabs: 5.6M
tools-sgeexec-0912.tools.eqiad.wmflabs: 5.6M
tools-sgeexec-0901.tools.eqiad.wmflabs: 5.6M
tools-sgeexec-0918.tools.eqiad.wmflabs: 5.5M
tools-sgeexec-0914.tools.eqiad.wmflabs: 5.5M
tools-sgeexec-0913.tools.eqiad.wmflabs: 5.5M
tools-sgeexec-0909.tools.eqiad.wmflabs: 5.5M
tools-sgeexec-0908.tools.eqiad.wmflabs: 5.5M
tools-sgeexec-0904.tools.eqiad.wmflabs: 5.5M
tools-sgeexec-0938.tools.eqiad.wmflabs: 5.3M
tools-package-builder-02.tools.eqiad.wmflabs: 4.2M
tools-redis-1002.tools.eqiad.wmflabs: 3.5M
tools-redis-1001.tools.eqiad.wmflabs: 3.5M
tools-prometheus-02.tools.eqiad.wmflabs: 3.3M
tools-sge-services-04.tools.eqiad.wmflabs: 3.2M
tools-sge-services-03.tools.eqiad.wmflabs: 3.2M
tools-docker-registry-04.tools.eqiad.wmflabs: 3.2M
tools-docker-registry-03.tools.eqiad.wmflabs: 3.1M
tools-static-12.tools.eqiad.wmflabs: 3.0M
tools-grid-master.tools.eqiad.wmflabs: 3.0M
tools-clushmaster-02.tools.eqiad.wmflabs: 2.9M
tools-checker-01.tools.eqiad.wmflabs: 2.9M
tools-static-13.tools.eqiad.wmflabs: 2.8M
tools-elastic-03.tools.eqiad.wmflabs: 2.8M
tools-elastic-01.tools.eqiad.wmflabs: 2.8M
tools-docker-builder-06.tools.eqiad.wmflabs: 2.7M
tools-k8s-etcd-03.tools.eqiad.wmflabs: 2.6M
tools-k8s-etcd-02.tools.eqiad.wmflabs: 2.6M
tools-flannel-etcd-02.tools.eqiad.wmflabs: 2.6M
tools-flannel-etcd-01.tools.eqiad.wmflabs: 2.6M
tools-prometheus-01.tools.eqiad.wmflabs: 2.2M
tools-sgegrid-shadow.tools.eqiad.wmflabs: 2.1M
tools-checker-02.tools.eqiad.wmflabs: 2.1M
tools-grid-shadow.tools.eqiad.wmflabs: 2.0M
tools-k8s-master-01.tools.eqiad.wmflabs: 1.7M
tools-elastic-02.tools.eqiad.wmflabs: 1.7M
tools-k8s-etcd-01.tools.eqiad.wmflabs: 1.6M
tools-flannel-etcd-03.tools.eqiad.wmflabs: 1.6M
tools-logs-02.tools.eqiad.wmflabs: 1.5M

It seems current usage is at acceptable levels. Feel free to re-open in case this becomes an issue again (we may have to increase log rotation frequency).