
wikidev people can't read /var/log/mediawiki/jobrunner.log
Closed, Resolved (Public)

Description

At least on mw1299.eqiad.wmnet, as a member of the wikidev group, I am unable to read /var/log/mediawiki/jobrunner.log, which is readable only by root and the adm group:

$ ls -l /var/log/mediawiki/jobrunner.log
-rw-r----- 1 root adm 32241612 Sep 19 15:27 /var/log/mediawiki/jobrunner.log

The jobrunner service runs as www-data and /var/log/mediawiki is drwxr-xr-x 2 www-data wikidev.
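To make the failure concrete, here is a minimal sketch of the classic Unix read check applied to the listing above. The function name `can_read` and the user name `someuser` are hypothetical, and the sketch deliberately ignores ACLs, capabilities and directory traversal (the directory itself is world-searchable, so it is not the blocker here).

```python
import stat


def can_read(mode, owner, group, user, user_groups):
    """Return True if `user` may read a file with the given mode bits.

    The first matching class (owner, then group, then other) decides.
    root bypasses the check entirely.
    """
    if user == "root":
        return True
    if user == owner:
        return bool(mode & stat.S_IRUSR)
    if group in user_groups:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


# -rw-r----- root adm, as in the listing above; "someuser" is hypothetical.
print(can_read(0o640, "root", "adm", "someuser", {"wikidev"}))      # False
# The same mode with the file group-owned by wikidev would be readable.
print(can_read(0o640, "root", "wikidev", "someuser", {"wikidev"}))  # True
```

With mode 0640 and group adm, a wikidev member falls through to the "other" class, which has no read bit.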

Puppet has:

modules/mediawiki/manifests/init.pp
# /var/log/mediawiki contains log files for the MediaWiki jobrunner
# and for various periodic jobs that are managed by cron.
file { '/var/log/mediawiki':
    ensure => directory,
    owner  => $::mediawiki::users::web,
    group  => 'wikidev',
    mode   => '0644',
}

The hosts have been switched from Trusty with upstart to Jessie with, I assume, systemd. Under upstart, the service definition passed --chuid to start-stop-daemon, with the user coming from $::mediawiki::users::web.

The systemd template uses the same puppet variable, and on mw1299.eqiad.wmnet it renders as:

/lib/systemd/system/jobrunner.service
[Unit]
Description="Mediawiki job queue runner loop"
After=hhvm.service

[Service]
EnvironmentFile=/etc/default/jobrunner
User=www-data
Group=www-data
SyslogIdentifier=jobrunner
ExecStart=/usr/bin/php /srv/deployment/jobrunner/jobrunner/redisJobRunnerService --config-file=${JOBRUNNER_CONFIG} ${DAEMON_OPTS}
Restart=always

[Install]
WantedBy=multi-user.target

systemd does spawn the service as www-data.

Event Timeline

The jobrunner service writes to stdout/stderr. Apparently with systemd that output is caught and sent to syslog, and then we have some rsyslog configuration from c461d9cf938fac6b1044f2f6dd17a305725401af:

modules/mediawiki/files/jobrunner.rsyslog.conf
# rsyslogd(8) configuration file for HHVM.
# This file is managed by Puppet.
:programname, startswith, "jobrunner" /var/log/mediawiki/jobrunner.log
:progrmname, startswith, "jobchron" /var/log/mediawiki/jobrunner.log

progrmname looks like a typo.
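A typo like this is easy to miss by eye. As a rough sketch, a small lint over the property-based filter lines could flag it. The helper `unknown_properties` is hypothetical, and `KNOWN_PROPERTIES` is only a partial list of rsyslog property names chosen for illustration, so a real check would need the full set from the rsyslog documentation.

```python
import re

# Partial list of rsyslog message property names. This is an assumption
# for illustration, not the complete set rsyslog accepts.
KNOWN_PROPERTIES = {
    "msg", "rawmsg", "hostname", "fromhost", "fromhost-ip",
    "syslogtag", "programname", "pri", "pri-text",
    "syslogfacility", "syslogfacility-text",
    "syslogseverity", "syslogseverity-text",
    "timegenerated", "timereported", "timestamp",
    "app-name", "procid", "msgid",
}

# Start of a property-based filter:  :property, compare-op, "value"
FILTER_RE = re.compile(r'^:([\w-]+)\s*,\s*(!?[\w-]+)\s*,\s*"')


def unknown_properties(conf_text):
    """Return (line_number, property) pairs for names not in KNOWN_PROPERTIES."""
    problems = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        match = FILTER_RE.match(line.strip())
        if match and match.group(1) not in KNOWN_PROPERTIES:
            problems.append((number, match.group(1)))
    return problems


conf = '''# rsyslogd(8) configuration file for HHVM.
# This file is managed by Puppet.
:programname, startswith, "jobrunner" /var/log/mediawiki/jobrunner.log
:progrmname, startswith, "jobchron" /var/log/mediawiki/jobrunner.log
'''
print(unknown_properties(conf))  # [(4, 'progrmname')]
```

Comment lines and anything that does not look like a property filter are skipped, so the sketch only reports filter lines whose property name it does not recognise.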

Not sure why jobchron is not in a standalone file.

Change 311702 had a related patch set uploaded (by Hashar):
jobrunner: fix rsyslog for jobchron service

https://gerrit.wikimedia.org/r/311702

Change 311719 had a related patch set uploaded (by Hashar):
jobrunner: refactor rsyslog conf and let wikidev read log

https://gerrit.wikimedia.org/r/311719

hashar triaged this task as High priority. Sep 20 2016, 3:18 PM

I have applied the patches to the beta cluster, and that makes the log readable by wikidev. They also fix jobchron, which was no longer being logged.

The instance is deployment-jobrunner02.deployment-prep.eqiad.wmflabs, which was set up today.

Once we get those logs enabled on production jobrunners, I will be able to investigate an ongoing issue with the jobrunner service (jobs mysteriously flagged as failing).

Change 311702 merged by Giuseppe Lavagetto:
jobrunner: fix rsyslog for jobchron service

https://gerrit.wikimedia.org/r/311702

Change 311719 merged by ArielGlenn:
jobrunner: refactor rsyslog conf and let wikidev read log

https://gerrit.wikimedia.org/r/311719

hashar added subscribers: ArielGlenn, Joe.

status

Trusty hosts are not impacted: the files are created by upstart redirecting stdout, and they are world-readable.

For Jessie, @ArielGlenn has reviewed and landed the patches, and fixed the permissions on all the production Jessie job runners.

So it is essentially solved now. What is left to do is to verify tomorrow that logrotate behaves properly and that the files are still readable by wikidev. Then we can mark this as resolved.

Thank you very much @ArielGlenn and @Joe for the preliminary review

jobchron.log did not rotate, but I believe that is because logrotate only considers the files after a couple of days. Gotta check again tomorrow.

I also noticed that on Trusty, the upstart job for jobchron does not redirect stdout/stderr, and thus the output ends up in a root-only file: /var/log/mediawiki/jobchron.log.

Change 312201 had a related patch set uploaded (by Hashar):
jobchron on trusty did not log at the proper place

https://gerrit.wikimedia.org/r/312201

Change 312201 merged by ArielGlenn:
jobchron on trusty did not log at the proper place

https://gerrit.wikimedia.org/r/312201

I have confirmed that both Trusty and Jessie properly logrotate both jobchron.log and jobrunner.log:

mw1161$ ls -1 /var/log/mediawiki/*.log{,.1}
/var/log/mediawiki/jobchron.log
/var/log/mediawiki/jobchron.log.1
/var/log/mediawiki/jobrunner.log
/var/log/mediawiki/jobrunner.log.1
mw1299$ ls -1 /var/log/mediawiki/*.log{,.1}
/var/log/mediawiki/jobchron.log
/var/log/mediawiki/jobchron.log.1
/var/log/mediawiki/jobrunner.log
/var/log/mediawiki/jobrunner.log.1
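
The listings above show that the rotated files exist. For the other half of the check (that they stay readable by the group after rotation), a sketch along these lines could be used. The helper `not_group_readable` is hypothetical, the demo runs against a temporary directory rather than a real host, and ACLs are ignored.

```python
import glob
import os
import stat
import tempfile


def not_group_readable(pattern, gid):
    """Return paths matching `pattern` that members of `gid` cannot read.

    A file passes if it is world-readable, or if it is group-owned by
    `gid` and has the group read bit set. ACLs are not considered.
    """
    failures = []
    for path in sorted(glob.glob(pattern)):
        info = os.stat(path)
        world_ok = bool(info.st_mode & stat.S_IROTH)
        group_ok = info.st_gid == gid and bool(info.st_mode & stat.S_IRGRP)
        if not (world_ok or group_ok):
            failures.append(path)
    return failures


# Demo in a scratch directory standing in for /var/log/mediawiki.
with tempfile.TemporaryDirectory() as scratch:
    log = os.path.join(scratch, "jobrunner.log")
    open(log, "w").close()
    gid = os.stat(log).st_gid  # whatever group the scratch file received

    os.chmod(log, 0o600)  # owner-only: should be reported
    print(len(not_group_readable(os.path.join(scratch, "*.log*"), gid)))  # 1

    os.chmod(log, 0o640)  # group-readable: should pass
    print(len(not_group_readable(os.path.join(scratch, "*.log*"), gid)))  # 0
```

On a real host the group id would come from something like `grp.getgrnam("wikidev").gr_gid`, and the pattern would be `/var/log/mediawiki/*.log*`.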