
db2049 management unable to login via ssh
Closed, ResolvedPublic

Description

Noticed this while trying to debug an "ipmi sensor status" UNKNOWN in Icinga: even when supplying the management password, I'm still unable to log in to db2049's management interface:

$ ssh -v root@db2049.mgmt.codfw.wmnet -oKexAlgorithms=diffie-hellman-group14-sha1
OpenSSH_7.4p1 Debian-10+deb9u2, OpenSSL 1.0.2l  25 May 2017
debug1: Reading configuration data /home/godog/.ssh/config
debug1: /home/godog/.ssh/config line 1: Applying options for *
debug1: /home/godog/.ssh/config line 8: Deprecated option "useroaming"
debug1: /home/godog/.ssh/config line 93: Applying options for *.wmnet
debug1: /home/godog/.ssh/config line 107: Applying options for *.codfw.wmnet
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: Executing proxy command: exec ssh -i ~/.ssh/wmf_prod.pub -W db2049.mgmt.codfw.wmnet:22 -4 filippo@bast3002.wikimedia.org
debug1: identity file /home/godog/.ssh/wmf_prod.pub type 1
debug1: key_load_public: No such file or directory
debug1: identity file /home/godog/.ssh/wmf_prod.pub-cert type -1
debug1: permanently_drop_suid: 1000
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u2
debug1: Remote protocol version 2.0, remote software version mpSSH_0.2.1
debug1: no match: mpSSH_0.2.1
debug1: Authenticating to db2049.mgmt.codfw.wmnet:22 as 'root'
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: diffie-hellman-group14-sha1
debug1: kex: host key algorithm: ssh-rsa
debug1: kex: server->client cipher: aes256-ctr MAC: hmac-sha2-256 compression: none
debug1: kex: client->server cipher: aes256-ctr MAC: hmac-sha2-256 compression: none
debug1: sending SSH2_MSG_KEXDH_INIT
debug1: expecting SSH2_MSG_KEXDH_REPLY
debug1: Server host key: ssh-rsa SHA256:16SsmI8gz2VNunNLJign5n2vC9TzgYfIMPZyLob8pFA
DNS lookup error: name does not exist
debug1: Host 'db2049.mgmt.codfw.wmnet' is known and matches the RSA host key.
debug1: Found key in /home/godog/.ssh/known_hosts:223
debug1: rekey after 4294967296 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: rekey after 4294967296 blocks
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: password,publickey
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /home/godog/.ssh/wmf_prod.pub
debug1: Authentications that can continue: password,publickey
debug1: Next authentication method: password
root@db2049.mgmt.codfw.wmnet's password: 
debug1: Authentications that can continue: password,publickey
Permission denied, please try again.
root@db2049.mgmt.codfw.wmnet's password: 
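The log shows that the controller's SSH service itself was answering: it identified itself as mpSSH_0.2.1 and completed key exchange, and only password authentication failed. One way to tell "service down" apart from "login broken" without sending any credentials is to read the identification string the server sends on connect ("SSH-protoversion-softwareversion SP comments CR LF", RFC 4253 section 4.2). The sketch below is a hypothetical helper, not tooling used in this task; `fetch_ssh_ident`, `parse_ssh_ident` and the 8 KiB read cap are illustrative assumptions, and it assumes it runs from a host with direct TCP access to the management network (the session above reached it only through the bastion).

```python
import socket

MAX_PREAMBLE = 8192  # assumed cap on bytes read before giving up (not from RFC 4253)


def parse_ssh_ident(line: str) -> tuple[str, str, str]:
    """Split an RFC 4253 identification string into (protoversion, softwareversion, comments)."""
    line = line.rstrip("\r\n")
    if not line.startswith("SSH-"):
        raise ValueError(f"not an SSH identification string: {line!r}")
    # Format: "SSH-" protoversion "-" softwareversion [SP comments]
    ident, _, comments = line.partition(" ")
    parts = ident.split("-", 2)
    if len(parts) != 3:
        raise ValueError(f"malformed SSH identification string: {line!r}")
    _, proto, software = parts
    return proto, software, comments


def fetch_ssh_ident(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Connect to host:port and return the server's identification line, without authenticating."""
    buf = b""
    received = 0
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while received < MAX_PREAMBLE:
            chunk = sock.recv(256)
            if not chunk:
                break  # server closed the connection
            received += len(chunk)
            buf += chunk
            # The server may send other lines first; return the first one starting with "SSH-"
            while b"\n" in buf:
                raw, buf = buf.split(b"\n", 1)
                text = raw.decode("ascii", errors="replace").rstrip("\r")
                if text.startswith("SSH-"):
                    return text
    raise ConnectionError(f"no SSH identification string received from {host}:{port}")
```

If the controller still sent the same string as in the log, `parse_ssh_ident(fetch_ssh_ident("db2049.mgmt.codfw.wmnet"))` would presumably yield `("2.0", "mpSSH_0.2.1", "")`, which points at an authentication problem rather than a dead SSH service, as was the case here.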

cc @jcrespo @Marostegui

Details

Related Gerrit Patches:
operations/mediawiki-config (master): db-codfw.php: Depool db2049

Event Timeline

Restricted Application added a subscriber: Aklapper. Feb 16 2018, 11:43 AM

This is a slave, so if @Papaul needs to reboot it to get it fixed, we can easily depool it.

MoritzMuehlenhoff triaged this task as Medium priority.

@Marostegui can you please depool the system for me?
Thanks

Change 414699 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Depool db2049

https://gerrit.wikimedia.org/r/414699

Change 414699 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Depool db2049

https://gerrit.wikimedia.org/r/414699

Mentioned in SAL (#wikimedia-operations) [2018-02-26T16:08:46Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Repool db2049 - T187534 (duration: 00m 56s)

@Papaul db2049 is now off.

Mentioned in SAL (#wikimedia-operations) [2018-02-26T16:31:36Z] <papaul> Maintenance: removing Msw-d4-codfw for replacement:T187534

Looks like this is back to life:

root@db2049.mgmt.codfw.wmnet's password:
User:root logged-in to ILO2M245205HN.(10.193.1.99 / FE80::FE15:B4FF:FE92:E428)

iLO Standard 2.50 at  Sep 23 2016
Server Name:
Server Power: On

</>hpiLO->
Papaul closed this task as Resolved. Feb 26 2018, 5:06 PM
  • Power-drained the server
  • Reset the iLO

Server is back up.

Mentioned in SAL (#wikimedia-operations) [2018-02-27T15:20:56Z] <marostegui@tin> Synchronized wmf-config/db-codfw.php: Repool db2049 - T187534 (duration: 00m 57s)