When I switched over the deployment server, Puppet ran:
grain-ensure set trebuchet_master mira.codfw.wmnet
This worked fine on about 60% of the hosts; on the others (regardless of OS version) it crashed the salt-minion.
This seems to be caused by some race condition; in the minion logs I find:
2016-01-25 10:10:58,656 [salt.log.setup ][ERROR ] An un-handled exception was caught by salt's global exception handler:
TypeError: string indices must be integers, not str
Traceback (most recent call last):
  File "/usr/bin/salt-minion", line 14, in <module>
    salt_minion()
  File "/usr/lib/python2.7/dist-packages/salt/scripts.py", line 57, in salt_minion
    minion.start()
  File "/usr/lib/python2.7/dist-packages/salt/__init__.py", line 264, in start
    self.minion.tune_in()
  File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 558, in tune_in
    minion['minion'].pillar_refresh()
  File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 1407, in pillar_refresh
    self.opts['environment'],
  File "/usr/lib/python2.7/dist-packages/salt/pillar/__init__.py", line 91, in compile_pillar
    ret_pillar = self.sreq.crypted_transfer_decode_dictentry(load, dictkey='pillar', tries=3, timeout=7200)
  File "/usr/lib/python2.7/dist-packages/salt/transport/__init__.py", line 243, in crypted_transfer_decode_dictentry
    aes = key.private_decrypt(ret['key'], 4)
TypeError: string indices must be integers, not str
Honestly, this doesn't give me any clue about the root cause.
It seems serious enough to warrant further investigation, though.
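One thing the last traceback frame does suggest: `ret['key']` raised "string indices must be integers", so `ret` was presumably a string rather than the dict the code expects. The sketch below only reproduces that symptom. The function `decode_dictentry` and the sample replies are hypothetical stand-ins, not Salt's actual code, and why the reply would be a string in the first place remains unknown.

```python
def decode_dictentry(ret, dictkey="pillar"):
    """Hypothetical stand-in for the failing step; expects a dict reply."""
    # The real frame calls key.private_decrypt(ret['key'], 4); here we only
    # perform the same kind of indexing so the failure mode is visible.
    return ret["key"], ret[dictkey]


# Expected shape (assumed): a dict reply indexes without trouble.
ok = decode_dictentry({"key": "k", "pillar": "p"})

# Assumed failure shape: the reply arrives as a plain string, not a dict.
try:
    decode_dictentry("some unexpected string reply")
    error = None
except TypeError as exc:
    # Same exception type as in the minion log; the exact message wording
    # differs between Python versions.
    error = exc
```

If that reading is right, logging the type and content of the reply just before it is indexed would be a cheap next step on one of the affected minions.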