Page MenuHomePhabricator

pybal stops logging
Closed, ResolvedPublic

Description

It seems that pybal stops logging after a while. For example, at the time of this writing:

lvs1007.eqiad.wmnet: -rw-r--r-- 1 root root 0 Mar  8 06:25 /var/log/pybal.log
lvs1008.eqiad.wmnet: -rw-r--r-- 1 root root 0 Mar  8 06:25 /var/log/pybal.log
lvs1011.eqiad.wmnet: -rw-r--r-- 1 root root 0 Mar  8 06:25 /var/log/pybal.log
lvs1010.eqiad.wmnet: -rw-r--r-- 1 root root 0 Mar  8 06:25 /var/log/pybal.log
lvs1004.wikimedia.org: -rw-r----- 1 root adm 0 Mar 10 06:25 /var/log/pybal.log
lvs1005.wikimedia.org: -rw-r----- 1 root adm 0 Mar 13 06:25 /var/log/pybal.log
lvs1002.wikimedia.org: -rw-r----- 1 root adm 0 Mar  8 06:25 /var/log/pybal.log
lvs1001.wikimedia.org: -rw-r----- 1 root adm 0 Mar 10 06:25 /var/log/pybal.log
lvs1003.wikimedia.org: -rw-r----- 1 root adm 1366881 Mar 14 11:33 /var/log/pybal.log
lvs1009.eqiad.wmnet: -rw-r--r-- 1 root root 257997 Mar 14 11:30 /var/log/pybal.log
lvs1012.eqiad.wmnet: -rw-r--r-- 1 root root 272986 Mar 14 11:33 /var/log/pybal.log
lvs1006.wikimedia.org: -rw-r----- 1 root adm 1380554 Mar 14 11:33 /var/log/pybal.log

While trying to figure out what's happening, I've found that pybal's logrotate script is using reload rsyslog instead of service rsyslog rotate:

postrotate
    reload rsyslog >/dev/null 2>&1 || true
endscript

Although that's certainly wrong, it doesn't seem to be the cause of pybal's silence, which needs to be investigated further.

Event Timeline

ema triaged this task as Medium priority.Mar 14 2017, 11:47 AM

Change 342608 had a related patch set uploaded (by Ema):
[operations/puppet] pybal: use 'rsyslog rotate' in logrotate configuration

https://gerrit.wikimedia.org/r/342608

On machines where pybal doesn't log anything, the pybal process is indeed not doing any meaningful writes to standard out/err:

$ sudo strace -e trace=write -e write=1,2 -p50964 2>&1 | ts
Mar 14 15:38:45 Process 50964 attached
Mar 14 15:39:37 write(11, "x", 1)                       = 1
Mar 14 15:39:42 write(11, "x", 1)                       = 1

Change 342608 merged by Ema:
[operations/puppet] logrotate: use 'rsyslog rotate' instead of 'reload rsyslog'

https://gerrit.wikimedia.org/r/342608

ema claimed this task.

In the past we've always had a bunch of hosts constantly depooled because of various issues (eg: T148891) thus we got used to seeing frequent logs in pybal.log (host down but depooled, etc).

As it turns out, pybal hasn't logged anything for a while because it had nothing to report. I've tried briefly depooling a varnish-fe and the event was logged properly.