
Some Trebuchet minions are not responding to salt calls when deploying jobrunner
Closed, Resolved, Public

Description

On tin, the list of deploy targets for jobrunner (Redis key "deploy:jobrunner/jobrunner:minions") was quite out of date. I cleaned up a number of targets that had been decommissioned over the last few months.
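
Removing stale entries from that key can be scripted. Below is a minimal sketch, assuming (the task does not say so) that the key is a Redis set, since it relies on `SMEMBERS`/`SREM`. `prune_minions` is a hypothetical helper name, and the list of decommissioned hosts is an input you have to supply from your own records.

```python
from typing import Iterable, Set

# Key named in this task; assumed (not confirmed) to be a Redis set.
MINIONS_KEY = "deploy:jobrunner/jobrunner:minions"


def prune_minions(client, key: str, decommissioned: Iterable[str]) -> Set[str]:
    """Remove decommissioned hosts from a deploy target set.

    `client` is anything exposing redis-py style `smembers` and `srem`
    (for example a `redis.Redis` instance). Returns the host names removed.
    """
    # redis-py returns bytes unless the client uses decode_responses=True.
    current = {m.decode() if isinstance(m, bytes) else m for m in client.smembers(key)}
    # Only touch hosts that are actually registered under the key.
    stale = current & set(decommissioned)
    if stale:
        client.srem(key, *sorted(stale))
    return stale
```

Intersecting with the current members first means a typo in the decommissioned list is simply ignored rather than sent to Redis.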

There are 7 hosts that claim to have role::mediawiki::jobrunner but that Trebuchet/salt cannot reach for some reason:

Repo: jobrunner/jobrunner
Tag: jobrunner/jobrunner-sync-20160922-083924

34/41 minions completed fetch

Details:

mw1304.eqiad.wmnet: 
        fetch status: None [started: 0 mins ago, last-return: None mins ago]
mw1306.eqiad.wmnet: 
        fetch status: None [started: 0 mins ago, last-return: None mins ago]
mw2080.codfw.wmnet: 
        fetch status: 0 [started: 0 mins ago, last-return: 838 mins ago]
mw2083.codfw.wmnet: 
        fetch status: 0 [started: 0 mins ago, last-return: 838 mins ago]
mw2084.codfw.wmnet: 
        fetch status: 0 [started: 0 mins ago, last-return: 838 mins ago]
mw2085.codfw.wmnet: 
        fetch status: 0 [started: 0 mins ago, last-return: 838 mins ago]
mw2162.codfw.wmnet: 
        fetch status: None [started: 0 mins ago, last-return: None mins ago]
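
The Details section above can be parsed to list the minions that did not answer. This is a rough sketch keyed to the exact line format shown; it assumes (my reading, not confirmed by Trebuchet docs) that a minion whose last return is older than the start of the current run, like the "838 mins ago" entries, did not respond and is reporting a stale status. `parse_details` and `unresponsive` are hypothetical names.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches detail lines such as:
#   fetch status: None [started: 0 mins ago, last-return: None mins ago]
STATUS_RE = re.compile(
    r"fetch status: (?P<status>\S+) \[started: (?P<started>\S+) mins ago, "
    r"last-return: (?P<last>\S+) mins ago\]"
)

Entry = Tuple[Optional[int], Optional[int], Optional[int]]


def _to_int(value: str) -> Optional[int]:
    # The report prints the literal string "None" for missing values.
    return None if value == "None" else int(value)


def parse_details(report: str) -> Dict[str, Entry]:
    """Map each minion to (fetch status, started mins ago, last return mins ago)."""
    result: Dict[str, Entry] = {}
    host = None
    for line in report.splitlines():
        stripped = line.strip()
        match = STATUS_RE.search(stripped)
        if match and host:
            result[host] = (
                _to_int(match["status"]),
                _to_int(match["started"]),
                _to_int(match["last"]),
            )
        elif stripped.endswith(":"):
            # Host header lines look like "mw1304.eqiad.wmnet:".
            host = stripped[:-1]
    return result


def unresponsive(details: Dict[str, Entry]) -> List[str]:
    """Minions that never returned, or whose last return predates this run."""
    return sorted(
        host
        for host, (_, started, last) in details.items()
        if last is None or started is None or last > started
    )
```

Fed the report above, `unresponsive` would flag all seven hosts, including the codfw ones whose status reads 0.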

Event Timeline

I am giving up on deploying the jobrunner update. Trebuchet as a whole is completely broken and there is ZERO way for non-root users to investigate.

So I will babysit/deploy jobrunner once it has been migrated to scap3, which is tracked in T129148.

hashar added subscribers: Volans, Joe.

So @Volans found:

NameError: global name '__pillar__' is not defined
  File "/var/cache/salt/minion/extmods/returners/deploy_redis.py", line 53, in returner

@Joe restarted the salt-minion service on one of the targets and that fixed it! :]

For reference, the complete error log and stack trace are:

2016-09-22 08:09:58,443 [salt.minion      ][ERROR   ] The return failed for job 20160922080958303913 global name '__pillar__' is not defined
2016-09-22 08:09:58,443 [salt.minion      ][ERROR   ] Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 1089, in _thread_return
    )](ret)
  File "/var/cache/salt/minion/extmods/returners/deploy_redis.py", line 53, in returner
    serv = _get_serv()
  File "/var/cache/salt/minion/extmods/returners/deploy_redis.py", line 28, in _get_serv
    deployment_config = __pillar__.get('deployment_config')
NameError: global name '__pillar__' is not defined
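
For context: Salt's loader injects "dunder" dictionaries such as `__pillar__` into the globals of the modules it loads, and `deploy_redis.py` relies on that at line 28. The sketch below is a simplified illustration of that mechanism, not Salt's actual loader, and does not claim to explain how the minions got into this state. It shows that the same lookup fails with this NameError whenever the module namespace was built without the injection. `load_module` and the pillar contents are hypothetical.

```python
import types

# Hypothetical, trimmed-down stand-in for the returner in the traceback.
RETURNER_SOURCE = '''
def _get_serv():
    # Same lookup as deploy_redis.py line 28: __pillar__ must be a module global.
    return __pillar__.get('deployment_config')

def returner(ret):
    return _get_serv()
'''


def load_module(name, source, pillar=None):
    """Build a module namespace, optionally injecting __pillar__ first."""
    module = types.ModuleType(name)
    if pillar is not None:
        # Loose analogue of what Salt's loader does for every loaded module.
        module.__dict__['__pillar__'] = pillar
    exec(source, module.__dict__)
    return module


broken = load_module('deploy_redis', RETURNER_SOURCE)
try:
    broken.returner({})
except NameError as exc:
    print('return failed:', exc)

fixed = load_module('deploy_redis', RETURNER_SOURCE,
                    pillar={'deployment_config': 'placeholder'})
print(fixed.returner({}))
```

The name lookup only happens when `returner` runs, so a module can import cleanly and still fail on every job return.
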
hashar assigned this task to Joe.

Giuseppe did all the magic restarts of the salt minions and everything is fine now! Huge thanks!

I cleaned up the list of targets in Redis and we are all set.