
acme-chief-cert-sync failing on tools-acme-chief-01
Closed, Resolved (Public)

Description

Happened to notice this while poking at Puppet failures on Prometheus: acme-chief is failing to sync certs to the secondary host on Toolforge:

Aug 08 09:00:01 tools-acme-chief-01 systemd[1]: Started Sync acme-chief certificates.
Aug 08 09:00:01 tools-acme-chief-01 acme-chief-certs-sync[15539]: Could not create directory '/nonexistent/.ssh'.
Aug 08 09:00:01 tools-acme-chief-01 acme-chief-certs-sync[15539]: acme-chief@tools-acme-chief-02.tools.eqiad.wmflabs: Permission denied (publickey,hostbased).
Aug 08 09:00:01 tools-acme-chief-01 acme-chief-certs-sync[15539]: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
Aug 08 09:00:01 tools-acme-chief-01 acme-chief-certs-sync[15539]: rsync error: unexplained error (code 255) at io.c(235) [sender=3.1.3]
Aug 08 09:00:01 tools-acme-chief-01 systemd[1]: acme-chief-certs-sync.service: Main process exited, code=exited, status=255/EXCEPTION

Event Timeline

Keyholder is failing to use the authdns_acmechief SSH key, which hasn't been generated in the Toolforge private Puppet repository. The name suggests that the key is used in wikiprod for handling the challenges on gdnsd. Is the same key used for cert sync too?

aborrero added subscribers: aborrero, Vgutierrez, rook, Andrew.

If this is preventing cert renewal, we have until Mon, 28 Mar 2022 19:28:16 GMT before the toolforge.org cert expires (see T301117)

Nope, it doesn't stop certs from being renewed. tools-acme-chief-01 needs an SSH key that grants access to tools-acme-chief-02 in order to provide HA.

The SSH key is managed by keyholder, so maybe the key is there but keyholder needs to be armed (keyholder arm) on that instance? More details are available on Wikitech: https://wikitech.wikimedia.org/wiki/Keyholder
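
One way to check whether the key is visible through keyholder is to list identities on the proxy socket with ssh-add -l and look at its exit status. The sketch below is only an illustration: describe_agent_rc is a hypothetical helper name, and the invocation in the comment assumes the acme-chief user is allowed to query /run/keyholder/proxy.sock. The exit status meanings (0, 1, 2) are those documented for ssh-add itself.

```shell
# Map the exit status of `ssh-add -l` to a short diagnosis.
# describe_agent_rc is a hypothetical helper, not part of keyholder.
describe_agent_rc() {
    case "$1" in
        0) echo "agent reachable, keys loaded" ;;
        1) echo "agent reachable, no keys visible" ;;
        2) echo "agent unreachable" ;;
        *) echo "unexpected exit status $1" ;;
    esac
}

# Assumed usage on the affected host, run as the service user:
#   sudo -u acme-chief env SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh-add -l
#   describe_agent_rc "$?"
```

If this reports that no keys are visible, arming keyholder as suggested above would be the next thing to try.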

Mentioned in SAL (#wikimedia-cloud) [2022-02-07T17:37:22Z] <taavi> generated authdns_acmechief ssh key and stored password in a text file in local labs/private repository (T288406)

I generated that SSH key; now it's failing with:

Feb 07 17:35:49 tools-acme-chief-01 systemd[1]: Started Sync acme-chief certificates.
Feb 07 17:35:49 tools-acme-chief-01 acme-chief-certs-sync[1123]: Could not create directory '/nonexistent/.ssh'.
Feb 07 17:35:49 tools-acme-chief-01 acme-chief-certs-sync[1123]: Connection closed by 172.16.0.18 port 22
Feb 07 17:35:49 tools-acme-chief-01 systemd[1]: acme-chief-certs-sync.service: Main process exited, code=exited, status=255/EXCEPTION
Feb 07 17:35:49 tools-acme-chief-01 acme-chief-certs-sync[1123]: rsync: connection unexpectedly closed (0 bytes received so far) [sender]
Feb 07 17:35:49 tools-acme-chief-01 acme-chief-certs-sync[1123]: rsync error: unexplained error (code 255) at io.c(235) [sender=3.1.3]
Feb 07 17:35:49 tools-acme-chief-01 systemd[1]: acme-chief-certs-sync.service: Failed with result 'exit-code'.

@Majavah you can debug the keyholder/SSH setup using SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -vv 172.16.0.18

taavi@tools-acme-chief-01:~ $ sudo sudo -u acme-chief SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -vv 172.16.0.18
OpenSSH_7.9p1 Debian-10+deb10u2, OpenSSL 1.1.1d  10 Sep 2019
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug2: resolve_canonicalize: hostname 172.16.0.18 is address
debug2: ssh_connect_direct
debug1: Connecting to 172.16.0.18 [172.16.0.18] port 22.
debug1: Connection established.
debug1: SELinux support disabled
Could not create directory '/nonexistent/.ssh'.
debug1: identity file /nonexistent/.ssh/id_rsa type -1
debug1: identity file /nonexistent/.ssh/id_rsa-cert type -1
debug1: identity file /nonexistent/.ssh/id_dsa type -1
debug1: identity file /nonexistent/.ssh/id_dsa-cert type -1
debug1: identity file /nonexistent/.ssh/id_ecdsa type -1
debug1: identity file /nonexistent/.ssh/id_ecdsa-cert type -1
debug1: identity file /nonexistent/.ssh/id_ed25519 type -1
debug1: identity file /nonexistent/.ssh/id_ed25519-cert type -1
debug1: identity file /nonexistent/.ssh/id_xmss type -1
debug1: identity file /nonexistent/.ssh/id_xmss-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_7.9p1 Debian-10+deb10u2
debug1: Remote protocol version 2.0, remote software version OpenSSH_7.9p1 Debian-10+deb10u2
debug1: match: OpenSSH_7.9p1 Debian-10+deb10u2 pat OpenSSH* compat 0x04000000
debug2: fd 3 setting O_NONBLOCK
debug1: Authenticating to 172.16.0.18:22 as 'acme-chief'
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug2: local client KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1,ext-info-c
debug2: host key algorithms: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,rsa-sha2-512-cert-v01@openssh.com,rsa-sha2-256-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,rsa-sha2-512,rsa-sha2-256,ssh-rsa,ssh-ed25519-cert-v01@openssh.com,ssh-ed25519
debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
debug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: compression ctos: none,zlib@openssh.com,zlib
debug2: compression stoc: none,zlib@openssh.com,zlib
debug2: languages ctos: 
debug2: languages stoc: 
debug2: first_kex_follows 0 
debug2: reserved 0 
debug2: peer server KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1
debug2: host key algorithms: rsa-sha2-512,rsa-sha2-256,ssh-rsa,ecdsa-sha2-nistp256,ssh-ed25519
debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
debug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: compression ctos: none,zlib@openssh.com
debug2: compression stoc: none,zlib@openssh.com
debug2: languages ctos: 
debug2: languages stoc: 
debug2: first_kex_follows 0 
debug2: reserved 0 
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: ecdsa-sha2-nistp256
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ecdsa-sha2-nistp256 SHA256:+bkavHRPGw4Ap+97g7mvKHTraR05DmYz1k7BvWuXTdw
debug1: Host '172.16.0.18' is known and matches the ECDSA host key.
debug1: Found key in /etc/ssh/ssh_known_hosts:4
debug2: set_newkeys: mode 1
debug1: rekey after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug2: set_newkeys: mode 0
debug1: rekey after 134217728 blocks
debug1: Will attempt key: root@tools-puppetmaster-02 ED25519 SHA256:94BqF5FP9A0oclS4Mxx4ONUqhmIODtbJ2Bg1rnJklKE agent
debug1: Will attempt key: /nonexistent/.ssh/id_rsa 
debug1: Will attempt key: /nonexistent/.ssh/id_dsa 
debug1: Will attempt key: /nonexistent/.ssh/id_ecdsa 
debug1: Will attempt key: /nonexistent/.ssh/id_ed25519 
debug1: Will attempt key: /nonexistent/.ssh/id_xmss 
debug2: pubkey_prepare: done
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,ssh-rsa,rsa-sha2-256,rsa-sha2-512,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521>
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,hostbased
debug1: Next authentication method: publickey
debug1: Offering public key: root@tools-puppetmaster-02 ED25519 SHA256:94BqF5FP9A0oclS4Mxx4ONUqhmIODtbJ2Bg1rnJklKE agent
debug2: we sent a publickey packet, wait for reply
debug1: Server accepts key: root@tools-puppetmaster-02 ED25519 SHA256:94BqF5FP9A0oclS4Mxx4ONUqhmIODtbJ2Bg1rnJklKE agent
Connection closed by 172.16.0.18 port 22
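
The trace above is long, and the relevant part is short: the only key actually offered is the agent key labelled root@tools-puppetmaster-02, the server accepts it, and the connection is closed right afterwards. That ordering suggests (but does not prove) that the rejection happens after public key acceptance rather than during key lookup. A small filter can make this easier to see; ssh_auth_summary is a hypothetical helper name, and the pattern list is only a guess at which lines matter.

```shell
# Keep only the lines of `ssh -vv` output that describe public key
# authentication and the final outcome of the connection attempt.
# ssh_auth_summary is a hypothetical helper; it reads from stdin.
ssh_auth_summary() {
    grep -E 'Will attempt key|Offering public key|Server accepts key|Authentications that can continue|Permission denied|Connection closed'
}

# Assumed usage (ssh writes its debug output to stderr, hence 2>&1):
#   sudo -u acme-chief env SSH_AUTH_SOCK=/run/keyholder/proxy.sock \
#       ssh -vv 172.16.0.18 2>&1 | ssh_auth_summary
```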

Change 961334 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] P:toolforge::instance: decrease priority of access rule

https://gerrit.wikimedia.org/r/961334

Change 961334 merged by Majavah:

[operations/puppet@production] P:toolforge::instance: decrease priority of access rule

https://gerrit.wikimedia.org/r/961334