
Inspect and diagnose labstore1001's H800 controller
Closed, Resolved, Public


To be done when labstore1002 is the active server, as part of the switchover test.

Things that may be worthwhile to inspect in detail:

  • Firmware revision of controller and BIOS
  • Possible divergence in firmware-level configuration
  • Stability of wiring

A good load test is also likely to be necessary.

The original problem has not recurred since the last cold start, but we should do everything we can to be confident in the hardware before labstore1001 is switched back to primary.

Event Timeline

coren raised the priority of this task from to Needs Triage.
coren updated the task description.
coren subscribed.
Restricted Application added a subscriber: Aklapper.
Andrew triaged this task as High priority. Apr 11 2015, 9:28 PM
Andrew set Security to None.

@yuvipanda: No, the switchover test never took place and other concerns overrode this, and now labstore1001 is disconnected from the shelves.

At this point, I don't think we have any reason to believe the labstore1001 hardware is flaky. The hardware with known issues is: the labstore1002 H800 controller, which is known to randomly fail to pass POST, and one of the labstore2001 shelves, which is not working.

coren claimed this task.

Resolved by the switchover test to end all switchover tests: labstore1001 is now back to being the primary server.