mw2140 unresponsive, mgmt not accessible
Closed, Resolved · Public

Description

mw2140 didn't come up after a reboot and the mgmt interface is inaccessible; please investigate. A similar issue happened in May 2017 (T166328), BTW. The warranty expires on 2018-01-19, so if something is broken we need to be quick to have it replaced :-)

Event Timeline

Restricted Application added a project: Operations. · Jan 12 2018, 10:05 AM
Restricted Application added a subscriber: Aklapper.
Papaul triaged this task as Medium priority. · Jan 15 2018, 6:36 AM
RobH raised the priority of this task from Medium to High. · Jan 16 2018, 4:26 PM
RobH added a subscriber: RobH.

Raising to high priority, since the warranty expires in days.

I followed the same process as in T166328. The iDRAC is back up for now. I also spoke with Dell; they will be sending me a new main board.

Hi Papaul

Thank you for contacting Dell EMC Basic Server Support.

This mail is with reference to the problem (iDRAC not responding) that was reported on your Dell PowerEdge (R420).

Service Tag : B14G842

Service request number : 959474874

Papaul added a subscriber: Papaul.

Main board replacement complete

  • Test ssh connection (racadm power commands)
  • Clear log
  • Update iDRAC firmware from version 2.21 to version 2.50
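
For reference, the post-replacement checks listed above could be scripted roughly as follows. This is only a sketch: the management hostname, the `root` login, and the choice of `clrsel` for "clear log" are assumptions that the comment above does not confirm. The helper only builds the command lines and does not execute anything.

```python
from typing import List

# Hypothetical management hostname; the task does not state the mgmt FQDN.
MGMT_HOST = "mw2140.mgmt.codfw.wmnet"


def racadm_over_ssh(host: str, subcommand: List[str], user: str = "root") -> List[str]:
    """Build (but do not run) an ssh command that invokes racadm on the iDRAC."""
    return ["ssh", f"{user}@{host}", "racadm", *subcommand]


# Assumed mapping of the checklist items to racadm subcommands.
CHECKS = [
    ["serveraction", "powerstatus"],  # power command test
    ["clrsel"],                       # clear the System Event Log (assumed to be "the log")
    ["getversion"],                   # confirm the firmware version after the update
]

if __name__ == "__main__":
    for check in CHECKS:
        # Print the command for review instead of executing it.
        print(" ".join(racadm_over_ssh(MGMT_HOST, check)))
```

Printing the commands first makes it possible to review them before running anything against the management interface.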

@MoritzMuehlenhoff it is all yours.

Thanks.

RobH added a subscriber: MoritzAccountTest.

Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts:

['mw2140.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201801191430_elukey_3742.log.
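
The log file names in these bot comments appear to follow a `YYYYMMDDHHMM_user_number.log` pattern. A minimal sketch for splitting them apart, assuming that pattern holds; the meaning of the trailing number is not stated here, so it is kept as an opaque integer:

```python
import re
from datetime import datetime
from typing import NamedTuple


class ReimageLog(NamedTuple):
    started: datetime  # timestamp embedded in the file name
    user: str          # user who launched the script
    suffix: int        # trailing number; its meaning is an open assumption


# Assumed name layout: 12 digits, user, number, ".log".
_NAME_RE = re.compile(r"^(\d{12})_([^_/]+)_(\d+)\.log$")


def parse_log_name(path: str) -> ReimageLog:
    """Parse a wmf-auto-reimage log path into its timestamp, user and suffix."""
    name = path.rsplit("/", 1)[-1]
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected log name: {name!r}")
    started = datetime.strptime(match.group(1), "%Y%m%d%H%M")
    return ReimageLog(started, match.group(2), int(match.group(3)))


if __name__ == "__main__":
    print(parse_log_name("/var/log/wmf-auto-reimage/201801191430_elukey_3742.log"))
```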

Completed auto-reimage of hosts:

['mw2140.codfw.wmnet']

Of which those FAILED:

['mw2140.codfw.wmnet']

Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts:

['mw2140.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201801191432_elukey_4916.log.

Change 405291 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] Update mw2140 MAC address after mainboard replacement

https://gerrit.wikimedia.org/r/405291

Completed auto-reimage of hosts:

['mw2140.codfw.wmnet']

Of which those FAILED:

['mw2140.codfw.wmnet']

@Papaul: ethtool shows "Link detected: no" for both network interfaces, the next time you're in the DC could you please check the cabling? (Not time-critical)
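
The link check can be automated by scanning `ethtool` output for the "Link detected" line. A small sketch; the sample text is illustrative and was not captured from mw2140:

```python
import re
from typing import Optional

# Matches the status line that ethtool prints for an interface.
_LINK_RE = re.compile(r"^\s*Link detected:\s*(yes|no)\s*$", re.MULTILINE)


def link_detected(ethtool_output: str) -> Optional[bool]:
    """Return True/False for the link state, or None if the line is absent."""
    match = _LINK_RE.search(ethtool_output)
    if match is None:
        return None
    return match.group(1) == "yes"


# Illustrative sample only, not real output from this host.
SAMPLE = """Settings for eth0:
    Speed: Unknown!
    Duplex: Unknown! (255)
    Link detected: no
"""

if __name__ == "__main__":
    print(link_detected(SAMPLE))
```

Returning `None` when the line is missing keeps "no link" distinct from "could not tell", which matters when the interface name is wrong or the command failed.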

Change 405291 merged by Elukey:
[operations/puppet@production] Update mw2140 MAC address after mainboard replacement

https://gerrit.wikimedia.org/r/405291

Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts:

['mw2140.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201801191614_elukey_18281.log.

Completed auto-reimage of hosts:

['mw2140.codfw.wmnet']

and were ALL successful.

elukey closed this task as Resolved. · Jan 19 2018, 5:30 PM

Pooled and working correctly, closing!

238482n375 removed elukey as the assignee of this task. · Jun 15 2018, 8:03 AM
238482n375 lowered the priority of this task from High to Lowest.
238482n375 moved this task from Next Up to In Code Review on the Analytics-Kanban board.
238482n375 edited subscribers, added: elukey, 238482n375; removed: Aklapper.

Aklapper assigned this task to elukey. · Jun 15 2018, 1:07 PM