Page MenuHomePhabricator

toolsbeta remote crontab is broken
Closed, ResolvedPublic

Description

Not sure when it happened or why yet, but remote crontab has stopped working on toolsbeta.

toolsbeta.test@toolsbeta-sgebastion-04:~$ crontab -l
Authentication failed.
/usr/local/bin/crontab: unable to execute remote crontab command

It's useful for testing, so we should fix it.

Event Timeline

Authentication here should happen via ssh's HostbasedAuthentication.

What the client side sees when attempting auth:

$ ssh toolsbeta-sgebastion-04.toolsbeta.eqiad.wmflabs
$ sudo become test
$ /usr/bin/ssh -vvv $(cat /etc/toollabs-cronhost) /usr/bin/crontab -l
OpenSSH_7.4p1 Debian-10+deb9u7, OpenSSL 1.0.2u  20 Dec 2019
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 20: Applying options for *
debug2: resolving "toolsbeta-sgecron-01.toolsbeta.eqiad.wmflabs" port 22
debug2: ssh_connect_direct: needpriv 0
debug1: Connecting to toolsbeta-sgecron-01.toolsbeta.eqiad.wmflabs [172.16.1.196] port 22.
debug1: Connection established.
debug1: key_load_private_cert: Permission denied
debug1: key_load_private_cert: Permission denied
debug1: key_load_private_cert: Permission denied
debug1: key_load_private_cert: Permission denied
debug1: key_load_private_type: Permission denied
debug1: key_load_private_type: Permission denied
debug1: key_load_private_type: Permission denied
debug1: key_load_private_type: Permission denied
debug1: key_load_cert: No such file or directory
debug1: key_load_cert: No such file or directory
debug1: key_load_cert: No such file or directory
debug1: key_load_cert: No such file or directory
debug1: key_load_public: No such file or directory
debug1: identity file /data/project/test/.ssh/id_rsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /data/project/test/.ssh/id_rsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /data/project/test/.ssh/id_dsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /data/project/test/.ssh/id_dsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /data/project/test/.ssh/id_ecdsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /data/project/test/.ssh/id_ecdsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /data/project/test/.ssh/id_ed25519 type -1
debug1: key_load_public: No such file or directory
debug1: identity file /data/project/test/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u7
debug1: Remote protocol version 2.0, remote software version OpenSSH_7.4p1 Debian-10+deb9u6
debug1: match: OpenSSH_7.4p1 Debian-10+deb9u6 pat OpenSSH* compat 0x04000000
debug2: fd 3 setting O_NONBLOCK
debug1: Authenticating to toolsbeta-sgecron-01.toolsbeta.eqiad.wmflabs:22 as 'toolsbeta.test'
debug3: hostkeys_foreach: reading file "/data/project/test/.ssh/known_hosts"
debug3: hostkeys_foreach: reading file "/etc/ssh/ssh_known_hosts"
debug3: record_hostkey: found key type ECDSA in file /etc/ssh/ssh_known_hosts:13
debug3: load_hostkeys: loaded 1 keys from toolsbeta-sgecron-01.toolsbeta.eqiad.wmflabs
debug3: order_hostkeyalgs: prefer hostkeyalgs: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
debug3: send packet: type 20
debug1: SSH2_MSG_KEXINIT sent
debug3: receive packet: type 20
debug1: SSH2_MSG_KEXINIT received
debug2: local client KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1,ext-info-c
debug2: host key algorithms: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-ed25519-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ssh-ed25519,rsa-sha2-512,rsa-sha2-256,ssh-rsa
debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,aes128-cbc,aes192-cbc,aes256-cbc
debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,aes128-cbc,aes192-cbc,aes256-cbc
debug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: compression ctos: none,zlib@openssh.com,zlib
debug2: compression stoc: none,zlib@openssh.com,zlib
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug2: peer server KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1
debug2: host key algorithms: ssh-rsa,rsa-sha2-512,rsa-sha2-256,ecdsa-sha2-nistp256,ssh-ed25519
debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
debug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: compression ctos: none,zlib@openssh.com
debug2: compression stoc: none,zlib@openssh.com
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: ecdsa-sha2-nistp256
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug3: send packet: type 30
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug3: receive packet: type 31
debug1: Server host key: ecdsa-sha2-nistp256 SHA256:6eb22sS6V7GCE2FGvHbojFHFXr+hmofr+ZIwnoFGkjY
debug3: hostkeys_foreach: reading file "/data/project/test/.ssh/known_hosts"
debug3: hostkeys_foreach: reading file "/etc/ssh/ssh_known_hosts"
debug3: record_hostkey: found key type ECDSA in file /etc/ssh/ssh_known_hosts:13
debug3: load_hostkeys: loaded 1 keys from toolsbeta-sgecron-01.toolsbeta.eqiad.wmflabs
debug3: hostkeys_foreach: reading file "/data/project/test/.ssh/known_hosts"
debug3: hostkeys_foreach: reading file "/etc/ssh/ssh_known_hosts"
debug3: record_hostkey: found key type ECDSA in file /etc/ssh/ssh_known_hosts:13
debug3: load_hostkeys: loaded 1 keys from 172.16.1.196
debug1: Host 'toolsbeta-sgecron-01.toolsbeta.eqiad.wmflabs' is known and matches the ECDSA host key.
debug1: Found key in /etc/ssh/ssh_known_hosts:13
debug3: send packet: type 21
debug2: set_newkeys: mode 1
debug1: rekey after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug3: receive packet: type 21
debug1: SSH2_MSG_NEWKEYS received
debug2: set_newkeys: mode 0
debug1: rekey after 134217728 blocks
debug2: key: /data/project/test/.ssh/id_rsa ((nil))
debug2: key: /data/project/test/.ssh/id_dsa ((nil))
debug2: key: /data/project/test/.ssh/id_ecdsa ((nil))
debug2: key: /data/project/test/.ssh/id_ed25519 ((nil))
debug3: send packet: type 5
debug3: receive packet: type 7
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521>
debug3: receive packet: type 6
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug3: send packet: type 50
debug3: receive packet: type 51
debug1: Authentications that can continue: publickey,hostbased
debug3: start over, passed a different list publickey,hostbased
debug3: preferred gssapi-keyex,gssapi-with-mic,hostbased,publickey,keyboard-interactive,password
debug3: authmethod_lookup hostbased
debug3: remaining preferred: publickey,keyboard-interactive,password
debug3: authmethod_is_enabled hostbased
debug1: Next authentication method: hostbased
debug3: userauth_hostbased: trying key type ecdsa-sha2-nistp256-cert-v01@openssh.com
debug3: userauth_hostbased: trying key type ecdsa-sha2-nistp384-cert-v01@openssh.com
debug3: userauth_hostbased: trying key type ecdsa-sha2-nistp521-cert-v01@openssh.com
debug3: userauth_hostbased: trying key type ssh-ed25519-cert-v01@openssh.com
debug3: userauth_hostbased: trying key type ssh-rsa-cert-v01@openssh.com
debug3: userauth_hostbased: trying key type ecdsa-sha2-nistp256
debug1: userauth_hostbased: trying hostkey ecdsa-sha2-nistp256 SHA256:NEn7EhIHnf1HJkVwZvj89NGwlM28qROmLw8pEZNgtEU
debug2: userauth_hostbased: chost toolsbeta-sgebastion-04.toolsbeta.eqiad.wmflabs.
debug3: ssh_msg_send: type 2
debug3: ssh_msg_recv entering
debug1: permanently_drop_suid: 51595
debug3: ssh_keysign: [child] pid=27387, exec /usr/lib/openssh/ssh-keysign
debug3: send packet: type 50
debug2: we sent a hostbased packet, wait for reply
Authentication failed.

What the server side logs as a result:

/var/log/auth.log
Jul 20 19:44:10 toolsbeta-sgecron-01 sshd[18638]: Connection from 172.16.3.240 port 54326 on 172.16.1.196 port 22
Jul 20 19:44:11 toolsbeta-sgecron-01 sshd[18638]: Accepted ECDSA public key SHA256:NEn7EhIHnf1HJkVwZvj89NGwlM28qROmLw8pEZNgtEU from toolsbeta.test@toolsbeta-sgebastion-04.toolsbeta.eqiad.wmflabs
Jul 20 19:44:11 toolsbeta-sgecron-01 sshd[18638]: pam_access(sshd:account): access denied for user `toolsbeta.test' from `toolsbeta-sgebastion-04.toolsbeta.eqiad.wmflabs'
Jul 20 19:44:11 toolsbeta-sgecron-01 sshd[18638]: Failed hostbased for toolsbeta.test from 172.16.3.240 port 54326 ssh2: ECDSA SHA256:NEn7EhIHnf1HJkVwZvj89NGwlM28qROmLw8pEZNgtEU, client user "toolsbeta.test", client host "toolsbeta-sgebastion-04.toolsbeta.eqiad.wmflabs"
Jul 20 19:44:11 toolsbeta-sgecron-01 sshd[18638]: fatal: Access denied for user toolsbeta.test by PAM account configuration [preauth]

On toolsbeta-sgecron-01:

$ grep toolsbeta-sgebastion-04 /etc/ssh/ssh_known_hosts
toolsbeta-sgebastion-04.toolsbeta.eqiad.wmflabs,toolsbeta-sgebastion-04,172.16.3.240 ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBHtFwmlaC99dI8VAhKXsW4xgZFKyS3SRRVMEe4HdzQXHwPzFbbOKGiKU5hDvPm8w1obY/f4844iGTVOHEAXH7ks=

The key presented by toolsbeta-sgebastion-04 to toolsbeta-sgecron-01 is /etc/ssh/ssh_host_ecdsa_key.pub (create date 2018-11-27). This key has the fingerprint SHA256:NEn7EhIHnf1HJkVwZvj89NGwlM28qROmLw8pEZNgtEU which matches the ssh -vvv client output. It's raw value matches the /etc/ssh/ssh_known_hosts entry. The logged client hostname of "toolsbeta-sgebastion-04.toolsbeta.eqiad.wmflabs" matches with an entry in /etc/ssh/shosts.equiv on the target server.

So... the key looks right on both sides and the client hostname looks right on the server side.

In that auth.log output there is a line that looks a lot like pam denying the connection:

pam_access(sshd:account): access denied for user `toolsbeta.test' from `toolsbeta-sgebastion-04.toolsbeta.eqiad.wmflabs'

Let's check for a missing grant in /etc/security/access.conf first:

$ host toolsbeta-sgebastion-04.toolsbeta.eqiad.wmflabs
toolsbeta-sgebastion-04.toolsbeta.eqiad.wmflabs has address 172.16.3.240
$ grep 172.16.3.240 /etc/security/access.conf
+:ALL: 172.16.3.112 172.16.4.58 172.16.4.82 172.16.3.240 172.16.3.240

That looks correct too. Let's configure pam to be more verbose and see if it will tell us which auth check is failing.

Jul 20 20:37:05 toolsbeta-sgecron-01 sshd[23518]: Connection from 172.16.3.240 port 54412 on 172.16.1.196 port 22
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: Accepted ECDSA public key SHA256:NEn7EhIHnf1HJkVwZvj89NGwlM28qROmLw8pEZNgtEU from toolsbeta.test@toolsbeta-sgebastion-04.toolsbeta.eqiad.wmflabs
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): login_access: user=toolsbeta.test, from=toolsbeta-sgebastion-04.toolsbeta.eqiad.wmflabs, file=/etc/security/access.conf
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): line 1: + : ALL : LOCAL
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): list_match: list=ALL, item=toolsbeta.test
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): user_match: tok=ALL, item=toolsbeta.test
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): string_match: tok=ALL, item=toolsbeta.test
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): user_match=2, "toolsbeta.test"
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): list_match: list=LOCAL, item=toolsbeta.test
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): from_match: tok=LOCAL, item=toolsbeta-sgebastion-04.toolsbeta.eqiad.wmflabs
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): string_match: tok=LOCAL, item=toolsbeta-sgebastion-04.toolsbeta.eqiad.wmflabs
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): network_netmask_match: tok=LOCAL, item=toolsbeta-sgebastion-04.toolsbeta.eqiad.wmflabs
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): from_match=0, "toolsbeta-sgebastion-04.toolsbeta.eqiad.wmflabs"
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): line 2: +  :  clushuser  :  tools-clushmaster-02.tools.eqiad.wmflabs
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): list_match: list= clushuser , item=toolsbeta.test
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): user_match: tok=clushuser, item=toolsbeta.test
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): string_match: tok=clushuser, item=toolsbeta.test
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): user_match=0, "toolsbeta.test"
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): line 3: - : ALL EXCEPT (toolsbeta.admin) root : ALL
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): list_match: list=ALL EXCEPT (toolsbeta.admin) root, item=toolsbeta.test
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): user_match: tok=ALL, item=toolsbeta.test
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): string_match: tok=ALL, item=toolsbeta.test
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): user_match: tok=(toolsbeta.admin), item=toolsbeta.test
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): group_match: grp=, user=toolsbeta.test
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): user_match: tok=root, item=toolsbeta.test
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): string_match: tok=root, item=toolsbeta.test
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): user_match=1, "toolsbeta.test"
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): list_match: list=ALL, item=toolsbeta.test
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): from_match: tok=ALL, item=toolsbeta-sgebastion-04.toolsbeta.eqiad.wmflabs
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): string_match: tok=ALL, item=toolsbeta-sgebastion-04.toolsbeta.eqiad.wmflabs
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): from_match=2, "toolsbeta-sgebastion-04.toolsbeta.eqiad.wmflabs"
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_access(sshd:account): access denied for user `toolsbeta.test' from `toolsbeta-sgebastion-04.toolsbeta.eqiad.wmflabs'
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_ldap(sshd:account): nslcd authorisation; user=toolsbeta.test
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: pam_ldap(sshd:account): authorization succeeded
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: Failed hostbased for toolsbeta.test from 172.16.3.240 port 54412 ssh2: ECDSA SHA256:NEn7EhIHnf1HJkVwZvj89NGwlM28qROmLw8pEZNgtEU, client user "toolsbeta.test", client host "toolsbeta-sgebastion-04.toolsbeta.eqiad.wmflabs"
Jul 20 20:37:06 toolsbeta-sgecron-01 sshd[23518]: fatal: Access denied for user toolsbeta.test by PAM account configuration [preauth]

The failure is in the handling of line 3: - : ALL EXCEPT (toolsbeta.admin) root : ALL. This rule in /etc/security/access.conf says that the only toolsbeta.admin and root users can authenticate. But... why is that line there? There is no similar line in /etc/security/access.conf on tools-sgecron-01.tools.eqiad.wmflabs.

It turns out that the offending line in /etc/security/access.conf comes from /etc/security/access.conf.d/50-labs_admin_only. That rule comes from profile::toolforge::infrastructure which really should not be applied to a cron host. The file was provisioned on 2021-03-18.

Does it come back if I rm it and force a puppet run? Yes, it does. :(

Poking around in Horizon reveals that profile::toolforge::infrastructure is applied at the project puppet level. Let's see when that happened:

$ cd ~/projects/wmf/cloud/instance-puppet/toolsbeta
$ git blame _.roles
00fbad28 (andrew 2019-11-05 15:39:07 +0000 1) - role::aptly::client
00fbad28 (andrew 2019-11-05 15:39:07 +0000 2) - profile::toolforge::clush::target
c67752e0 (bstorm 2021-03-18 18:55:33 +0000 3) - profile::toolforge::infrastructure

Bam! That matches the prior timestamp too. Now I guess the question is how to add profile::toolforge::infrastructure to all the things that need it without also adding it to the bastion and cron hosts in that project.

Mentioned in SAL (#wikimedia-cloud) [2021-07-20T21:18:36Z] <bstorm> applied login_server: true to toolsbeta-sgecron-01 T287037

Bstorm claimed this task.

Works now! Anything that needs ssh needs the hiera key of login_server: true, which was not added to toolsbeta since most people don't have access there anyway.

Thanks!