Unclean stop of jobrunner service via puppet
Closed, Declined (Public)

Description

Today I tried a roll-restart of jobrunner for T123728 by running 'systemctl restart jobrunner' via salt. When puppet subsequently tried to stop jobrunner, the stop failed and triggered an icinga alert for 'systemd degraded':

root@mw2247:~# systemctl status jobrunner
● jobrunner.service - "Mediawiki job queue runner loop"
   Loaded: loaded (/lib/systemd/system/jobrunner.service; disabled)
   Active: failed (Result: exit-code) since Thu 2017-02-16 09:13:43 UTC; 4min 4s ago
  Process: 112358 ExecStart=/usr/bin/php /srv/deployment/jobrunner/jobrunner/redisJobRunnerService --config-file=${JOBRUNNER_CONFIG} ${DAEMON_OPTS} (code=exited, status=143)
 Main PID: 112358 (code=exited, status=143)

Feb 16 09:13:43 mw2247 jobrunner[112358]: [Thu Feb 16 09:13:43 2017] [hphp] [112358:7f008fff16c0:0:000026] [] LightProc...n pipe
Feb 16 09:13:43 mw2247 jobrunner[112358]: Sending SIGTERM to 6862.
Feb 16 09:13:43 mw2247 jobrunner[112358]: [Thu Feb 16 09:13:43 2017] [hphp] [112358:7f008fff16c0:0:000027] [] LightProc...n pipe
Feb 16 09:13:43 mw2247 jobrunner[112358]: [Thu Feb 16 09:13:43 2017] [hphp] [112358:7f008fff16c0:0:000028] [] LightProc...n pipe
Feb 16 09:13:43 mw2247 jobrunner[112358]: [Thu Feb 16 09:13:43 2017] [hphp] [112358:7f008fff16c0:0:000029] [] LightProc...n pipe
Feb 16 09:13:43 mw2247 jobrunner[112358]: [Thu Feb 16 09:13:43 2017] [hphp] [112358:7f008fff16c0:0:000030] [] LightProc...n pipe
Feb 16 09:13:43 mw2247 jobrunner[112358]: [Thu Feb 16 09:13:43 2017] [hphp] [112358:7f008fff16c0:0:000031] [] LightProc...n pipe
Feb 16 09:13:43 mw2247 systemd[1]: jobrunner.service: main process exited, code=exited, status=143/n/a
Feb 16 09:13:43 mw2247 systemd[1]: Stopped "Mediawiki job queue runner loop".
Feb 16 09:13:43 mw2247 systemd[1]: Unit jobrunner.service entered failed state.
Hint: Some lines were ellipsized, use -l to show in full.
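
A note on the status in the output above: by common convention an exit status of 128 + N means the process ended because of signal N, so 143 most likely means the main process handled SIGTERM (signal 15 on Linux) and then exited with that code. systemd reports this as code=exited, status=143, which it does not treat as a clean stop by default. Below is a minimal sketch for decoding such a status; the function name decode_exit_status is hypothetical and not part of jobrunner or systemd.

```shell
# Hypothetical helper: interpret a numeric exit status using the
# common convention that 128 + N means "ended because of signal N".
decode_exit_status() {
    status="$1"
    if [ "$status" -gt 128 ]; then
        # Subtract the 128 offset to recover the signal number.
        echo "signal $((status - 128))"
    else
        # Anything at or below 128 is treated as a plain exit code.
        echo "exit $status"
    fi
}
```

For example, 'decode_exit_status 143' prints "signal 15".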

Event Timeline

fgiunchedi renamed this task from 'systemctl restart jobrunner' broken via salt to Unclean stop of jobrunner service via puppet. Feb 16 2017, 9:55 AM
fgiunchedi updated the task description.

The workaround for the moment is to run 'systemctl reset-failed jobrunner', which clears the unit's failed state and restores a non-degraded systemd state.
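
The workaround can be wrapped so that it only acts when the unit is actually marked failed. This is a rough sketch, not something taken from the puppet or salt configuration: the function name reset_if_failed is hypothetical, and it assumes it is run as root on the affected host.

```shell
# Hypothetical helper: clear a unit's failed state only if systemd
# currently reports the unit as failed.
reset_if_failed() {
    unit="$1"
    # 'systemctl is-failed' exits 0 when the unit is in the failed state;
    # --quiet suppresses the printed state.
    if systemctl is-failed --quiet "$unit"; then
        systemctl reset-failed "$unit"
    fi
}
```

On a host such as mw2247 this would be called as 'reset_if_failed jobrunner'. Afterwards, 'systemctl is-system-running' can be used to check whether the overall state is still reported as degraded, assuming the installed systemd version provides that subcommand.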

Ottomata triaged this task as Medium priority. Mar 6 2017, 7:45 PM
Krinkle subscribed.

jobchron and mediawiki/services/jobrunner are no longer used in production.