tmpreaper doesn't play along with PrivateTmp systemd units
Closed, Resolved · Public

Description

Sometimes the new stretch videoscalers fail the daily logrotate run with this error:

Jan 17 06:25:02 mw1259 systemd[1]: Reloading The Apache HTTP Server.
Jan 17 06:25:02 mw1259 systemd[3656]: apache2.service: Failed at step NAMESPACE spawning /usr/sbin/apachectl: No such file or directory
Jan 17 06:25:02 mw1259 systemd[1]: apache2.service: Control process exited, code=exited status=226
Jan 17 06:25:02 mw1259 systemd[1]: Reload failed for The Apache HTTP Server.

And we get this nice email:

/etc/cron.daily/logrotate:
Job for apache2.service failed because the control process exited with error code.
See "systemctl status apache2.service" and "journalctl -xe" for details.
error: error running shared postrotate script for '/var/log/apache2/*.log '
run-parts: /etc/cron.daily/logrotate exited with return code 1

Event Timeline

elukey created this task. Jan 18 2018, 10:17 AM
elukey triaged this task as Normal priority.
Restricted Application added a subscriber: Aklapper. Jan 18 2018, 10:17 AM
elukey moved this task from Backlog to Ops Backlog on the User-Elukey board. Feb 16 2018, 12:01 PM

Comparing jessie/stretch units:

elukey@mw1234:~$ sudo systemctl cat apache2
# /run/systemd/generator.late/apache2.service
# Automatically generated by systemd-sysv-generator

[Unit]
SourcePath=/etc/init.d/apache2
Description=LSB: Apache2 web server
Before=runlevel2.target runlevel3.target runlevel4.target runlevel5.target shutdown.target
After=local-fs.target remote-fs.target network-online.target systemd-journald-dev-log.socket nss-lookup.target
Wants=network-online.target
Conflicts=shutdown.target

[Service]
Type=forking
Restart=no
TimeoutSec=5min
IgnoreSIGPIPE=no
KillMode=process
GuessMainPID=no
RemainAfterExit=yes
SysVStartPriority=2
ExecStart=/etc/init.d/apache2 start
ExecStop=/etc/init.d/apache2 stop
ExecReload=/etc/init.d/apache2 reload

# /lib/systemd/system/apache2.service.d/forking.conf
[Service]
Type=forking
RemainAfterExit=no

root@mw1293:/var/log/apache2# systemctl cat apache2
# /lib/systemd/system/apache2.service
[Unit]
Description=The Apache HTTP Server
After=network.target remote-fs.target nss-lookup.target

[Service]
Type=forking
Environment=APACHE_STARTED_BY_SYSTEMD=true
ExecStart=/usr/sbin/apachectl start
ExecStop=/usr/sbin/apachectl stop
ExecReload=/usr/sbin/apachectl graceful
PrivateTmp=true
Restart=on-abort

[Install]
WantedBy=multi-user.target

From the reports I have read so far, the culprit might be PrivateTmp=true. A similar, though not identical, issue was reported to the systemd developers (https://github.com/systemd/systemd/issues/6212). We do mount /tmp separately though (no idea why).

Could it be possible that a reload does not work with PrivateTmp?

Another rather confusing thing I noticed while checking logs on the stretch videoscalers: /var/log/apache2/jobqueue-access.log.1 keeps getting updated while jobqueue-access.log stays empty, probably because the reload fails during log rotation. Anyone unaware of this issue might conclude, at first glance, that httpd is not logging anything.

I ran some tests and can confirm that PrivateTmp=true is the culprit. I haven't yet figured out why it breaks, but I'll have a closer look.

MoritzMuehlenhoff renamed this task from Sporadic logrotate issue for stretch mediawiki appservers to Apache reload fails on stretch-based app servers. Mar 29 2018, 3:26 PM

Change 423882 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Disable PrivateTmp via systemd override for stretch-based mediawiki setups

https://gerrit.wikimedia.org/r/423882

Change 425509 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Disable PrivateTmp via systemd override for stretch-based app servers

https://gerrit.wikimedia.org/r/425509

Change 425509 merged by Muehlenhoff:
[operations/puppet@production] Disable PrivateTmp via systemd override for stretch-based app servers

https://gerrit.wikimedia.org/r/425509

Change 423882 merged by Muehlenhoff:
[operations/puppet@production] Disable PrivateTmp via systemd override for video scalers

https://gerrit.wikimedia.org/r/423882
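
The changes above disable PrivateTmp through a systemd override. Their exact content is not shown in this task, so the following is only a minimal sketch of what such a drop-in could look like. The helper name write_privatetmp_override and the file name override.conf are assumptions made for illustration; the second argument lets the file be written somewhere harmless for inspection first.

```shell
# Hypothetical helper: write a drop-in that turns PrivateTmp off for a unit.
# $1 = unit name, $2 = systemd config root (defaults to /etc/systemd/system).
write_privatetmp_override() {
    unit="$1"
    root="${2:-/etc/systemd/system}"
    dropin_dir="$root/$unit.d"
    mkdir -p "$dropin_dir"
    # The drop-in only overrides the one setting; the rest of the unit is kept.
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
PrivateTmp=false
EOF
    # Print the path that was written so the caller can inspect it.
    echo "$dropin_dir/override.conf"
}
```

On a real host this would be followed by systemctl daemon-reload and a restart of the unit, and systemctl cat apache2 should then list the drop-in below the packaged unit file.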

Some initial tests suggest that amending the TMPREAPER_PROTECT_EXTRA setting shipped in tmpreaper might fix this, but I need to run some more exhaustive tests to confirm.

I've posted a summary to the Debian bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=881725#65

We could add a workaround to our tmpreaper configuration to avoid the failing apache reload, but the underlying problem remains: tmpreaper is unable to properly clean up files in a PrivateTmp-enabled namespace, so with the workaround in place it would effectively not remove those temp files at all.

Possible fixes:

  • Add support to reap tmp files in namespaced tmp directories
  • Implement a custom mechanism to remove superfluous temporary files (or wait for something like that to be implemented in systemd)
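
As a rough illustration of the custom-mechanism option, the sketch below lists stale regular files inside PrivateTmp directories without touching the directories themselves. It assumes the usual systemd layout of /tmp/systemd-private-*/tmp; the helper name stale_private_tmp_files is hypothetical, and nothing like this is confirmed to be in use.

```shell
# Hypothetical helper: print regular files older than N days that live inside
# PrivateTmp directories, leaving the directories themselves alone.
# $1 = age in days, $2 = tmp root (defaults to /tmp).
stale_private_tmp_files() {
    days="$1"
    root="${2:-/tmp}"
    # -path restricts matches to the inner "tmp" directory of each private dir;
    # -type f ensures no directory is ever listed.
    find "$root" -path "$root/systemd-private-*/tmp/*" \
        -type f -mtime +"$days" -print
}
```

The function only prints candidates, so the list can be reviewed before anything is deleted.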

MoritzMuehlenhoff renamed this task from Apache reload fails on stretch-based app servers to tmpreaper doesn't play along with PrivateTmp systemd units. Apr 27 2018, 2:18 PM
MoritzMuehlenhoff removed MoritzMuehlenhoff as the assignee of this task.
GTirloni added a subscriber: GTirloni.

We're facing the same issue on labweb100[12]. Would it be reasonable to modify tmpreaper::reap to include /tmp/systemd-private*/* as a protected pattern for all the infrastructure?

tmpreaper::reap doesn't seem to be used at all (at least in production)? I think we could either extend the tmpreaper.conf and pass it as TMPREAPER_PROTECT_EXTRA or via the daily cron (as proposed in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=881725#65)
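
A minimal sketch of the tmpreaper.conf route, assuming the file is read as a shell fragment by the daily cron job. The pattern is the one proposed in this thread, not a tested value, and the helper protect_args is hypothetical: it only shows how the patterns would map onto tmpreaper's --protect option.

```shell
# Fragment in the style of /etc/tmpreaper.conf.
# The pattern is the one proposed in this thread, not a tested value.
TMPREAPER_PROTECT_EXTRA='/tmp/systemd-private*/*'

# Hypothetical helper: turn the space-separated patterns into quoted
# "--protect PATTERN" arguments without letting the shell expand the globs.
protect_args() {
    set -f                      # disable globbing while splitting the list
    for pattern in $TMPREAPER_PROTECT_EXTRA; do
        printf "%s '%s' " --protect "$pattern"
    done
    set +f
}

# Print (do not run) a dry-run command line; --test makes tmpreaper only
# report what it would remove. The 7d age and /tmp target are example values.
echo "tmpreaper --test $(protect_args)7d /tmp"
```

Running the printed command by hand shows whether the private directories are still listed for removal before the setting is rolled out.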

Or we could investigate an alternative solution instead: tmpreaper seems dead upstream and hasn't changed in Debian for almost a decade, so there are probably a few alternatives available by now.

Joe added a comment. Tue, Feb 12, 6:55 AM

FYI, I merged a change yesterday that should fix the problem from now on.

Joe closed this task as Resolved. Tue, Feb 12, 6:58 AM
Joe claimed this task.

Change 489982 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Remove apache systemd override now that tmpreaper is fixed

https://gerrit.wikimedia.org/r/489982