dbstore1001 ipmi issue
Closed, Duplicate; Public

Description

Troubleshooting of dbstore1001 has been split off from the master tracking task T150160, since that task tracks dozens of systems, with a sub-task for troubleshooting each individual system.

This particular host's warranty expires on 2017-02-27, so @Cmjohnson prioritized this for immediate work.

For the record: while reinstalling dbstore1001 (T153768), which is listed in T150160#2951190 as one of the affected hosts, we ran into this issue and tried to troubleshoot it.
Together with Chris, we tried several things:

  • Cold reset of the iDRAC
  • Update the iDRAC firmware
  • Update the BIOS firmware
  • Cold reset again

Nothing worked and we were still getting:

root@neodymium:~# ipmitool -I lanplus -H dbstore1001.mgmt.eqiad.wmnet -U root -E chassis power status
Unable to read password from environment
Password:
Error: Unable to establish IPMI v2 / RMCP+ session
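The `Unable to read password from environment` line only means that `-E` found no `IPMI_PASSWORD` variable, so ipmitool fell back to prompting. It also appears in the successful run against db1092, so it is not the failure itself. A minimal Python sketch of a non-interactive invocation follows, assuming the caller supplies the password; the helper names `build_ipmitool_call` and `chassis_power_status` are made up for illustration, and the flags simply mirror the command above.

```python
import os
import subprocess


def build_ipmitool_call(host, password, user="root"):
    """Return (argv, env) for a remote chassis power status query.

    Hypothetical helper: same flags as the manual command, with the
    password passed through IPMI_PASSWORD so that -E can read it.
    """
    argv = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E",
            "chassis", "power", "status"]
    env = dict(os.environ)           # copy, do not mutate the caller's env
    env["IPMI_PASSWORD"] = password  # the variable that -E looks up
    return argv, env


def chassis_power_status(host, password):
    """Run the query and return (exit code, stdout, stderr)."""
    argv, env = build_ipmitool_call(host, password)
    result = subprocess.run(argv, env=env, capture_output=True, text=True)
    return result.returncode, result.stdout.strip(), result.stderr.strip()
```

Running `chassis_power_status` against both a working and a failing management address makes it easier to compare the two outcomes without the interactive prompt in the way.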

Some debugging showed nothing really relevant:

root@neodymium:~# ipmitool -I lanplus -H 10.65.6.64 -U root -E chassis power status -vvvvvvvvvvv
Unable to read password from environment
Password:
>>    data    : 0x8e 0x04

>> sending packet (23 bytes)
 06 00 ff 07 00 00 00 00 00 00 00 00 00 09 20 18
 c8 81 00 38 8e 04 b5
<< received packet (30 bytes)
 06 00 ff 07 00 00 00 00 00 00 00 00 00 10 81 1c
 63 20 00 38 00 01 86 1c 03 00 00 00 00 02
>> sending packet (48 bytes)
 06 00 ff 07 06 10 00 00 00 00 00 00 00 00 20 00
 00 00 00 00 a4 a3 a2 a0 00 00 00 08 01 00 00 00
 01 00 00 08 01 00 00 00 02 00 00 08 01 00 00 00
<< received packet (52 bytes)
 06 00 ff 07 06 11 00 00 00 00 00 00 00 00 24 00
 00 00 04 00 a4 a3 a2 a0 00 12 00 02 00 00 00 08
 01 00 00 00 01 00 00 08 01 00 00 00 02 00 00 08
 01 00 00 00
<<OPEN SESSION RESPONSE
<<  Message tag                        : 0x00
<<  RMCP+ status                       : no errors
<<  Maximum privilege level            : admin
<<  Console Session ID                 : 0xa0a2a3a4
<<  BMC Session ID                     : 0x02001200
<<  Negotiated authenticatin algorithm : hmac_sha1
<<  Negotiated integrity algorithm     : hmac_sha1_96
<<  Negotiated encryption algorithm    : aes_cbc_128

>> Console generated random number (16 bytes)
 3e dc ec f6 2c 0f f0 02 51 49 3f 8f 11 4a 17 41
>> sending packet (48 bytes)
 06 00 ff 07 06 12 00 00 00 00 00 00 00 00 20 00
 00 00 00 00 00 12 00 02 3e dc ec f6 2c 0f f0 02
 51 49 3f 8f 11 4a 17 41 14 00 00 04 72 6f 6f 74
<< received packet (76 bytes)
 06 00 ff 07 06 13 00 00 00 00 00 00 00 00 3c 00
 00 00 00 00 a4 a3 a2 a0 93 30 ac 7c d9 e0 dc fa
 2d 63 18 73 ca 20 37 f4 44 45 4c 4c 52 00 10 37
 80 43 b4 c0 4f 48 30 32 35 60 68 e0 1d 51 06 e7
 58 46 62 5f e5 ea 87 c1 8b f6 8e a1
<<RAKP 2 MESSAGE
<<  Message tag                   : 0x00
<<  RMCP+ status                  : no errors
<<  Console Session ID            : 0xa0a2a3a4
<<  BMC random number             : 0x9330ac7cd9e0dcfa2d631873ca2037f4
<<  BMC GUID                      : 0x44454c4c520010378043b4c04f483032
<<  Key exchange auth code [sha1] : 0x356068e01d5106e75846625fe5ea87c18bf68ea1

bmc_rand (16 bytes)
 93 30 ac 7c d9 e0 dc fa 2d 63 18 73 ca 20 37 f4
>> rakp2 mac input buffer (62 bytes)
 a4 a3 a2 a0 00 12 00 02 3e dc ec f6 2c 0f f0 02
 51 49 3f 8f 11 4a 17 41 93 30 ac 7c d9 e0 dc fa
 2d 63 18 73 ca 20 37 f4 44 45 4c 4c 52 00 10 37
 80 43 b4 c0 4f 48 30 32 14 04 72 6f 6f 74
>> rakp2 mac key (20 bytes)
 77 72 34 21 45 70 72 75 32 43 35 00 00 00 00 00
 00 00 00 00
>> rakp2 mac as computed by the remote console (20 bytes)
 14 3c 74 84 a0 5f 96 5f 08 0f 2c 2d 55 ea 3e 22
 40 17 8c 3b
Error: Unable to establish IPMI v2 / RMCP+ session
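One detail in this trace may be worth a second look: the key exchange auth code sent by the BMC in RAKP 2 (0x3560...8ea1) differs from the MAC that ipmitool computed locally (14 3c 74 ... 8c 3b), and the session setup fails right after that comparison. In IPMI v2.0 this MAC is an HMAC keyed with the user's password, so a mismatch often means the BMC holds a different password for `root` than the one typed, although a BMC-side fault cannot be ruled out from this output alone. The sketch below assumes HMAC-SHA1 (as negotiated above) and a zero-padded 20-byte key (as in the dump); it rebuilds the 62-byte input buffer from the logged fields and checks a candidate password offline. The function names are illustrative.

```python
import hashlib
import hmac

# Field values copied from the verbose ipmitool output above.
CONSOLE_SESSION_ID = bytes.fromhex("a4a3a2a0")  # 0xa0a2a3a4 in wire order
BMC_SESSION_ID = bytes.fromhex("00120002")      # 0x02001200 in wire order
CONSOLE_RANDOM = bytes.fromhex("3edcecf62c0ff00251493f8f114a1741")
BMC_RANDOM = bytes.fromhex("9330ac7cd9e0dcfa2d631873ca2037f4")
BMC_GUID = bytes.fromhex("44454c4c520010378043b4c04f483032")
ROLE = bytes([0x14])                            # role byte as sent in RAKP 1
USERNAME = b"root"
BMC_AUTH_CODE = bytes.fromhex("356068e01d5106e75846625fe5ea87c18bf68ea1")


def rakp2_mac_input():
    """Concatenate the fields in the order shown in the 62-byte dump."""
    return (CONSOLE_SESSION_ID + BMC_SESSION_ID + CONSOLE_RANDOM
            + BMC_RANDOM + BMC_GUID + ROLE
            + bytes([len(USERNAME)]) + USERNAME)


def rakp2_mac(password):
    """HMAC-SHA1 over the input buffer, keyed with the padded password."""
    key = password.ljust(20, b"\x00")  # zero-pad to 20 bytes, as in the dump
    return hmac.new(key, rakp2_mac_input(), hashlib.sha1).digest()


def password_matches_bmc(password):
    """True if this password reproduces the auth code the BMC sent."""
    return hmac.compare_digest(rakp2_mac(password), BMC_AUTH_CODE)
```

If `password_matches_bmc` returns False for the password that works on other hosts, that would suggest the credential stored on this iDRAC differs, rather than a network or cipher negotiation problem.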

The same command works for another host:

root@neodymium:~# ipmitool -I lanplus -H db1092.mgmt.eqiad.wmnet -U root -E chassis power status
Unable to read password from environment
Password:
Chassis Power is on

However, IPMI works when queried locally on dbstore1001:

root@dbstore1001:~# ipmi-chassis --get-chassis-status
System Power                        : on
Power overload                      : false
<snip>

dbstore1001 has the remote IPMI issue; a ticket has been opened with Dell to troubleshoot it. Updating the firmware to see whether that fixes the issue.