
cp3053 is unreachable
Closed, Resolved, Public

Description

cp3053 went offline on Saturday the 23rd at 18:31 UTC. This looks like another occurrence of T238305.

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2019-11-25T02:59:24Z] <vgutierrez> depooling & power-cycling cp3053 - T239041

Vgutierrez triaged this task as Medium priority. Nov 25 2019, 3:03 AM
Vgutierrez updated the task description.
Vgutierrez moved this task from Triage to Hardware on the Traffic board.
Vgutierrez claimed this task.

Nothing in the logs or in the SEL (System Event Log).

Marostegui added a subscriber: Marostegui.

This host went down again:

[10:23:27]  <+icinga-wm>	PROBLEM - Host cp3053 is DOWN: PING CRITICAL - Packet loss = 100%

As discussed in last week's SRE meeting, shall we upgrade the firmware on cp3053 to the latest revision?

@RobH, since you offered to help in last Monday's SRE meeting, can you upgrade the firmware on cp3053?

It appears cp3053 is online at this time. Can I issue a depool command and shut it down for the firmware update at any time, or is further scheduling needed? The process will take about 5-15 minutes.

Confirmed with @ema that depooling via the command line and powering off is fine; moving on to flashing the firmware.

Mentioned in SAL (#wikimedia-operations) [2019-12-02T16:10:51Z] <robh> cp3035 depooling and rebooting for firmware update T239041

Mentioned in SAL (#wikimedia-operations) [2019-12-02T16:11:01Z] <robh> cp3053 depooling and rebooting for firmware update T239041

ILOM and BIOS are both updated. I've updated @ema on IRC and am handing this back to Traffic.

Mentioned in SAL (#wikimedia-operations) [2019-12-02T16:32:09Z] <ema> cp3053: repooling after firmware update T239041

The host has now been running the new firmware for one week with no issues. Closing for now; we can reopen if needed.