
mr1-ulsfo crashed
Closed, Resolved · Public

Description

mr1-ulsfo didn't like "request system storage cleanup", which seems to have made it crash.
It doesn't reply to pings (on both the normal and OOB interfaces), and the console server is only reachable through mr1-ulsfo.
The box needs to be power cycled.

Event Timeline

ayounsi created this task. May 10 2017, 6:14 PM
Restricted Application added a project: Operations. May 10 2017, 6:14 PM
Restricted Application added a subscriber: Aklapper.
RobH added a comment. May 10 2017, 6:16 PM

I emailed support to reboot it via power cable removal:

Support,

In remotely administering our mr1-ulsfo Juniper SRX100 device, it locked up and is unresponsive to our attempts to connect to it. We would like it powercycled via power cable removal (pull the power cable to reboot it).

It is not a full rack mount device, but is mounted on a shelf in the rack. It has a single power feed via a power adapter, also on the shelf.

Please simply remove power, wait a few moments, and then plug it back in.

Thanks!

RobH reassigned this task from RobH to ayounsi. May 10 2017, 8:10 PM
RobH added a subscriber: RobH.

UnitedLayer support rebooted this for us, and now @ayounsi is working on recovery.

ayounsi added a subscriber: faidon. May 10 2017, 8:53 PM

Its internal storage is corrupted. @faidon re-did the steps listed on https://phabricator.wikimedia.org/T127295
and I restored the last working configuration based on rancid and jnt.
I also ran "request system configuration rescue save", just in case.

Opened case 2017-0510-0720 with Juniper to hopefully get a replacement unit.

RMA# R200124729

RobH added a comment. May 11 2017, 10:35 PM

Juniper emailed us the tracking info, and I've opened an inbound shipment ticket with UnitedLayer.

I plan to go onsite next Wednesday and swap the units.

Mentioned in SAL (#wikimedia-operations) [2017-05-18T14:32:36Z] <XioNoX> rebooting mr1-ulsfo for software upgrade - T164970

ayounsi closed this task as Resolved. May 18 2017, 2:46 PM