
keyholder has just disarmed everywhere (train blocker)
Closed, Resolved, Public

Description

On deploy1001:

$ /usr/local/sbin/keyholder status
keyholder-agent: active
keyholder-proxy: active
- The agent has no identities.
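Both systemd units report `active`, yet the agent holds no keys, so service health alone does not tell you whether keyholder is armed. Below is a minimal sketch of a check keyed on the "has no identities" line. It assumes the status output format matches the excerpt above exactly. `is_armed` is a hypothetical helper, not part of keyholder or its Icinga check.

```python
def is_armed(status_output: str) -> bool:
    """Return False when `keyholder status` reports an empty agent.

    Hypothetical helper: assumes the output format shown in this task and
    treats the "The agent has no identities" line as the only disarmed signal.
    """
    for line in status_output.splitlines():
        if "The agent has no identities" in line:
            return False
    return True


sample = """keyholder-agent: active
keyholder-proxy: active
- The agent has no identities."""

print(is_armed(sample))  # False
```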

As a result, the train can no longer be deployed :(


Event Timeline

Krenair renamed this task from "keyholder is no more armed on deplo1001 (train blocker)" to "keyholder has just disarmed everywhere (train blocker)". May 14 2019, 1:39 PM
hashar renamed this task from "keyholder has just disarmed everywhere (train blocker)" to "keyholder is no more armed on deplo1001 (train blocker)". May 14 2019, 1:40 PM
hashar updated the task description.

May 14 14:10:50 <icinga-wm> PROBLEM - Keyholder SSH agent on netmon1002 is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. https://wikitech.wikimedia.org/wiki/Keyholder
May 14 14:12:18 <icinga-wm> PROBLEM - Keyholder SSH agent on netmon2001 is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. https://wikitech.wikimedia.org/wiki/Keyholder
May 14 14:15:32 <icinga-wm> PROBLEM - Keyholder SSH agent on cumin2001 is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. https://wikitech.wikimedia.org/wiki/Keyholder
May 14 14:16:22 <icinga-wm> RECOVERY - Keyholder SSH agent on netmon1002 is OK: OK: Keyholder is armed with all configured keys. https://wikitech.wikimedia.org/wiki/Keyholder
May 14 14:16:28 <icinga-wm> RECOVERY - Keyholder SSH agent on netmon2001 is OK: OK: Keyholder is armed with all configured keys. https://wikitech.wikimedia.org/wiki/Keyholder
May 14 14:18:16 <icinga-wm> RECOVERY - Keyholder SSH agent on cumin2001 is OK: OK: Keyholder is armed with all configured keys. https://wikitech.wikimedia.org/wiki/Keyholder

May 14 14:29:28 <icinga-wm> PROBLEM - Keyholder SSH agent on deploy1001 is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. https://wikitech.wikimedia.org/wiki/Keyholder
May 14 14:30:06 <icinga-wm> PROBLEM - Keyholder SSH agent on labpuppetmaster1002 is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. https://wikitech.wikimedia.org/wiki/Keyholder
May 14 14:31:52 <icinga-wm> PROBLEM - Keyholder SSH agent on deploy2001 is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. https://wikitech.wikimedia.org/wiki/Keyholder
May 14 14:38:04 <icinga-wm> PROBLEM - Keyholder SSH agent on cumin1001 is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. https://wikitech.wikimedia.org/wiki/Keyholder
May 14 14:39:22 <icinga-wm> PROBLEM - Keyholder SSH agent on labpuppetmaster1001 is CRITICAL: CRITICAL: Keyholder is not armed. Run keyholder arm to arm it. https://wikitech.wikimedia.org/wiki/Keyholder
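To see which hosts are still disarmed at the end of this excerpt, replay the alerts in order and keep only the latest state per host. Below is a minimal sketch that assumes the `<icinga-wm> PROBLEM|RECOVERY - Keyholder SSH agent on <host> is ...` wording shown above. `still_disarmed` is a hypothetical helper, and the log lines are abbreviated reconstructions of the alerts in this task.

```python
import re

# Assumes the icinga-wm message wording seen in this task.
ALERT_RE = re.compile(r"<icinga-wm> (PROBLEM|RECOVERY) - Keyholder SSH agent on (\S+) is")


def still_disarmed(log_lines):
    """Return hosts whose most recent keyholder alert is a PROBLEM, sorted."""
    last_state = {}
    for line in log_lines:
        match = ALERT_RE.search(line)
        if match:
            state, host = match.groups()
            last_state[host] = state  # later lines overwrite earlier ones
    return sorted(h for h, s in last_state.items() if s == "PROBLEM")


# Abbreviated copies of the alerts above, in the order they were posted.
events = [
    ("PROBLEM", "netmon1002"), ("PROBLEM", "netmon2001"), ("PROBLEM", "cumin2001"),
    ("RECOVERY", "netmon1002"), ("RECOVERY", "netmon2001"), ("RECOVERY", "cumin2001"),
    ("PROBLEM", "deploy1001"), ("PROBLEM", "labpuppetmaster1002"),
    ("PROBLEM", "deploy2001"), ("PROBLEM", "cumin1001"),
    ("PROBLEM", "labpuppetmaster1001"),
]
log = [f"<icinga-wm> {s} - Keyholder SSH agent on {h} is ..." for s, h in events]

print(still_disarmed(log))
# ['cumin1001', 'deploy1001', 'deploy2001', 'labpuppetmaster1001', 'labpuppetmaster1002']
```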

Krenair renamed this task from "keyholder is no more armed on deplo1001 (train blocker)" to "keyholder has just disarmed everywhere (train blocker)". May 14 2019, 1:40 PM

Keyholder rearmed on these hosts:

deploy*
cumin*
labpuppetmaster*
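The host list above uses shell-style globs. Below is an illustrative sketch using Python's `fnmatch` to check which of the alerting hosts those globs cover. The host names come from the Icinga alerts in this task. This is not the tooling that was actually used to re-arm, and `uncovered` is a hypothetical helper.

```python
from fnmatch import fnmatch

REARM_PATTERNS = ["deploy*", "cumin*", "labpuppetmaster*"]

# Every host that raised a keyholder alert in this task, in alert order.
ALERTED_HOSTS = [
    "netmon1002", "netmon2001", "cumin2001", "deploy1001",
    "labpuppetmaster1002", "deploy2001", "cumin1001", "labpuppetmaster1001",
]


def uncovered(hosts, patterns):
    """Return hosts (in input order) that match none of the glob patterns."""
    return [h for h in hosts if not any(fnmatch(h, p) for p in patterns)]


print(uncovered(ALERTED_HOSTS, REARM_PATTERNS))  # ['netmon1002', 'netmon2001']
```

The two netmon hosts had already sent RECOVERY alerts at 14:16, which is consistent with their absence from the re-arm list.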

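The root cause is that /usr/local/bin/ssh-agent-proxy was changed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/509929. Puppet then refreshed (restarted) keyholder-agent, and because the agent keeps the decrypted keys only in memory, the restart left it with no identities: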

May 14 13:26:21 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/File[/usr/local/bin/ssh-agent-proxy]/content)
May 14 13:26:21 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/File[/usr/local/bin/ssh-agent-proxy]/content) --- /usr/local/bin/ssh-agent-proxy#0112019-03-15 16:56:17.621580293 +0000
May 14 13:26:21 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/File[/usr/local/bin/ssh-agent-proxy]/content) +++ /tmp/puppet-file20190514-3677-1can9f3#0112019-05-14 13:26:21.031831170 +0000
May 14 13:26:21 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/File[/usr/local/bin/ssh-agent-proxy]/content) @@ -173,10 +173,12 @@
May 14 13:26:21 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/File[/usr/local/bin/ssh-agent-proxy]/content)              self.send_message(self.backend, code)
May 14 13:26:21 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/File[/usr/local/bin/ssh-agent-proxy]/content)
May 14 13:26:21 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/File[/usr/local/bin/ssh-agent-proxy]/content)          elif code == SSH2_AGENTC_SIGN_REQUEST:
May 14 13:26:21 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/File[/usr/local/bin/ssh-agent-proxy]/content) -            key_blob, *_ = self.parse_sign_request(message)
May 14 13:26:21 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/File[/usr/local/bin/ssh-agent-proxy]/content) +            # disable E999 as CI checks with python2 T184435
May 14 13:26:21 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/File[/usr/local/bin/ssh-agent-proxy]/content) +            key_blob, *_ = self.parse_sign_request(message)  # noqa: E999
May 14 13:26:21 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/File[/usr/local/bin/ssh-agent-proxy]/content)              key_digest_md5 = hashlib.md5(key_blob).hexdigest()
May 14 13:26:21 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/File[/usr/local/bin/ssh-agent-proxy]/content) +            # disable E999 as CI checks with python2 T184435
May 14 13:26:21 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/File[/usr/local/bin/ssh-agent-proxy]/content)              key_digest_sha256 = (b'SHA256' + base64.b64encode(hashlib.sha256(
May 14 13:26:21 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/File[/usr/local/bin/ssh-agent-proxy]/content) -                key_blob).digest()).rstrip(b'=')).decode('utf-8')
May 14 13:26:21 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/File[/usr/local/bin/ssh-agent-proxy]/content) +                key_blob).digest()).rstrip(b'=')).decode('utf-8')  # noqa: E999
May 14 13:26:21 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/File[/usr/local/bin/ssh-agent-proxy]/content)              user, groups = self.get_peer_credentials(self.request)
May 14 13:26:21 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/File[/usr/local/bin/ssh-agent-proxy]/content)              if groups & self.server.key_perms.get(key_digest_md5, set()).union(
May 14 13:26:21 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/File[/usr/local/bin/ssh-agent-proxy]/content)                      self.server.key_perms.get(key_digest_sha256, set())):
May 14 13:26:21 labpuppetmaster1002 puppet-agent[3677]: Computing checksum on file /usr/local/bin/ssh-agent-proxy
May 14 13:26:21 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/File[/usr/local/bin/ssh-agent-proxy]) Filebucketed /usr/local/bin/ssh-agent-proxy to puppet with sum 596fbc2b8e3eb5d309f06544cb1f061a
May 14 13:26:21 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/File[/usr/local/bin/ssh-agent-proxy]/content) content changed '{md5}596fbc2b8e3eb5d309f06544cb1f061a' to '{md5}1c953717948198d53298360a9fa6c61f'
May 14 13:26:21 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/File[/usr/local/bin/ssh-agent-proxy]) Scheduling refresh of Service[keyholder-agent]
May 14 13:26:31 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/Systemd::Service[keyholder-agent]/Service[keyholder-agent]) Triggered 'refresh' from 1 events
May 14 13:26:31 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/Systemd::Service[keyholder-proxy]/Service[keyholder-proxy]/ensure) ensure changed 'stopped' to 'running'
May 14 13:26:31 labpuppetmaster1002 puppet-agent[3677]: (/Stage[main]/Keyholder/Systemd::Service[keyholder-proxy]/Service[keyholder-proxy]) Unscheduling refresh on Service[keyholder-proxy]
May 14 13:26:32 labpuppetmaster1002 puppet-agent[3677]: Applied catalog in 21.21 seconds
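The diff itself only adds `# noqa: E999` comments. `key_blob, *_ = ...` is Python 3 starred unpacking, which a Python 2 based flake8 run cannot parse and reports as E999 (syntax error), per T184435. The surrounding lines compute two fingerprints of the key blob that the proxy looks up in `key_perms`. Below is a standalone sketch of that fingerprint computation, copied from the lines shown in the diff. The `fingerprints` function and the sample tuple are hypothetical and stand in for `parse_sign_request`.

```python
import base64
import hashlib


def fingerprints(key_blob: bytes) -> tuple:
    """Mirror the two digests computed in the patched ssh-agent-proxy.

    Returns (md5_hex, sha256_fp): a hex MD5 digest, and the literal prefix
    b'SHA256' followed by base64 of the SHA-256 digest with '=' padding
    stripped, decoded to str (exactly as in the excerpt above).
    """
    key_digest_md5 = hashlib.md5(key_blob).hexdigest()
    key_digest_sha256 = (b'SHA256' + base64.b64encode(
        hashlib.sha256(key_blob).digest()).rstrip(b'=')).decode('utf-8')
    return key_digest_md5, key_digest_sha256


# Hypothetical stand-in for the tuple returned by parse_sign_request().
# Starred unpacking keeps the first element and discards the rest; the
# Python 2 equivalent would be `key_blob = parsed[0]`.
parsed = (b"hypothetical-key-blob", b"data-to-sign", 0)
key_blob, *_ = parsed

md5_hex, sha256_fp = fingerprints(key_blob)
print(md5_hex)
print(sha256_fp)
```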

Confirmed that this change is responsible for the failure; the other re-arms were done by me.

fgiunchedi claimed this task.

Resolving because things are back to normal; feel free to reopen if we're missing something.

Thank you for the quick fix!

As jbond stated, this was caused by renaming the ssh-agent-proxy source: https://gerrit.wikimedia.org/r/c/operations/puppet/+/509929/6/modules/keyholder/manifests/init.pp