cp3053 went offline on Saturday 23rd at 18:31 UTC. This seems like another occurrence of T238305.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T238305 Servers freezing across the caching cluster
Resolved | | ema | T239041 cp3053 is unreachable
Event Timeline
Mentioned in SAL (#wikimedia-operations) [2019-11-25T02:59:24Z] <vgutierrez> depooling & power-cycling cp3053 - T239041
Mentioned in SAL (#wikimedia-operations) [2019-11-25T03:13:29Z] <vgutierrez> repooling cp3053 - T239041
This host went down again:
And [10:23:27] <+icinga-wm> PROBLEM - Host cp3053 is DOWN: PING CRITICAL - Packet loss = 100%
Mentioned in SAL (#wikimedia-operations) [2019-12-02T13:47:41Z] <ema> power-cycle cp3053 T239041
As mentioned in last week's SRE meeting, let's upgrade the firmware to the latest revision on cp3053?
@RobH As you offered help in the SRE meeting last Monday, can you upgrade the firmware on cp3053?
It appears cp3053 is online at this time. Can I issue a depool command and shut it down for the firmware update at any time or is further scheduling needed? The process will take about 5-15 minutes.
Confirmed with @ema that depooling via the command line and powering off is fine; moving on to flashing the firmware.
Mentioned in SAL (#wikimedia-operations) [2019-12-02T16:10:51Z] <robh> cp3035 depooling and rebooting for firmware update T239041
Mentioned in SAL (#wikimedia-operations) [2019-12-02T16:11:01Z] <robh> cp3053 depooling and rebooting for firmware update T239041
Mentioned in SAL (#wikimedia-operations) [2019-12-02T16:32:09Z] <ema> cp3053: repooling after firmware update T239041
The host has now been up on the new firmware for one week with no issues. Closing for now; we can re-open if needed.