
cp3053 is unreachable
Closed, ResolvedPublic

Description

cp3053 went offline on Saturday the 23rd at 18:31 UTC. This looks like another occurrence of T238305.

Event Timeline

Restricted Application added a subscriber: Aklapper. Mon, Nov 25, 2:47 AM

Mentioned in SAL (#wikimedia-operations) [2019-11-25T02:59:24Z] <vgutierrez> depooling & power-cycling cp3053 - T239041

Vgutierrez triaged this task as Medium priority. Mon, Nov 25, 3:03 AM
Vgutierrez updated the task description.
Vgutierrez moved this task from Triage to Hardware on the Traffic board.

Mentioned in SAL (#wikimedia-operations) [2019-11-25T03:13:29Z] <vgutierrez> repooling cp3053 - T239041

Vgutierrez closed this task as Resolved. Mon, Nov 25, 3:14 AM
Vgutierrez claimed this task.

Nothing in the logs or in the SEL.

Marostegui reopened this task as Open. Mon, Dec 2, 10:25 AM
Marostegui added a subscriber: Marostegui.

This host went down again:

[10:23:27]  <+icinga-wm>	PROBLEM - Host cp3053 is DOWN: PING CRITICAL - Packet loss = 100%

Mentioned in SAL (#wikimedia-operations) [2019-12-02T13:47:41Z] <ema> power-cycle cp3053 T239041

As mentioned in last week's SRE meeting, should we upgrade the firmware on cp3053 to the latest revision?

@RobH As you offered help in the SRE meeting last Monday, can you upgrade the firmware on cp3053?

RobH added a comment.Mon, Dec 2, 3:32 PM

It appears cp3053 is online at this time. Can I issue a depool command and shut it down for the firmware update at any time or is further scheduling needed? The process will take about 5-15 minutes.

RobH added a comment.Mon, Dec 2, 4:10 PM

Confirmed with @ema that depooling via the command line and powering off is fine; moving on to flashing the firmware.

Mentioned in SAL (#wikimedia-operations) [2019-12-02T16:10:51Z] <robh> cp3035 depooling and rebooting for firmware update T239041

Mentioned in SAL (#wikimedia-operations) [2019-12-02T16:11:01Z] <robh> cp3053 depooling and rebooting for firmware update T239041

RobH reassigned this task from Vgutierrez to ema. Mon, Dec 2, 4:31 PM

All ILOM and BIOS firmware updated. I've updated @ema on IRC and am handing this back to Traffic.

Mentioned in SAL (#wikimedia-operations) [2019-12-02T16:32:09Z] <ema> cp3053: repooling after firmware update T239041

ema closed this task as Resolved. Mon, Dec 9, 2:07 PM

The host has now been up with the new firmware with no issues for one week. Closing for now, we can re-open if needed.