
a8-eqiad pdu refresh (Date TBA)
Open, Normal, Public

Description

This task tracks the replacement of the ps1-eqiad and ps2-eqiad PDUs in rack A8-eqiad with new PDUs.

Potential downtime needs to be scheduled for each server, switch, and router, since this will be a live power change of the PDU towers.

The network racks currently have two separate PDU towers, which will be replaced with two new PDU towers, so this swap is easier than the majority of the row A/B PDU swaps (which have combined old PDU towers).

  • Schedule downtime for the entire list of switches, routers, and servers.
  • Wire up one of the two towers, energize it, and relocate power to it from the existing/old PDU tower (now de-energized).
  • Confirm that the entire list of switches, routers, and servers have had their power restored from the new PDU tower.
  • Once the new PDU tower is confirmed online, move on to the next steps.
  • Wire up the remaining tower, energize it, and relocate power to it from the existing/old PDU tower (now de-energized).
  • Confirm that the entire list of switches, routers, and servers have had their power restored from the new PDU tower.
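
The ordering of the steps above can be sketched as a small model in which the second tower is only touched after every device is confirmed on the first new tower. This is an illustrative sketch, not tooling used for this maintenance: the side labels "A" and "B", the function names, and the in-memory feed state are all assumptions, and the real confirmation is a manual check on each device.

```python
# Hypothetical model of the two-tower swap; the device names come from the task list.
DEVICES = [
    "cr2-eqiad", "asw2-a8-eqiad", "helium", "helium-array", "bohrium",
    "db1129", "torrelay1001", "db1117", "labstore1003",
]


def swap_tower(feeds, side):
    """Relocate every device's feed on `side` from the old tower to the new one."""
    for sides in feeds.values():
        sides[side] = "new"


def unconfirmed(feeds, side):
    """Return the devices whose feed on `side` is not yet on the new tower."""
    return [name for name, sides in feeds.items() if sides[side] != "new"]


def run_refresh(devices):
    """Swap one tower at a time, refusing to continue until all devices are confirmed."""
    # Every device starts with both feeds on the old towers ("A" and "B" are assumed labels).
    feeds = {name: {"A": "old", "B": "old"} for name in devices}
    log = []
    for side in ("A", "B"):
        swap_tower(feeds, side)
        missing = unconfirmed(feeds, side)
        if missing:
            # Gate: do not move on to the next tower with unconfirmed devices.
            raise RuntimeError(f"tower {side}: power not confirmed for {missing}")
        log.append(f"tower {side} confirmed for {len(devices)} devices")
    return feeds, log


feeds, log = run_refresh(DEVICES)
for line in log:
    print(line)
```

The point of the gate is that a device loses one feed while its tower is being swapped, so the other feed must be known good before the second swap begins.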

List of routers, switches, and servers

device | role | SRE team coordination | notes
cr2-eqiad | router | @ayounsi |
asw2-a8-eqiad | asw | @ayounsi |
helium | backup server | @akosiaris | can be done at any point in time
helium-array | backup server | @akosiaris | can be done at any point in time
bohrium | | |
db1129 | db | DBA team | @Marostegui to depool this host before the maintenance
torrelay1001 | tor relay | |
db1117 | db | DBA team | this is a passive slave on misc clusters, nothing to be done
labstore1003 (and its 3 arrays) | labstore | cloud-services-team | This is decommissioned, can be done anytime

Event Timeline

RobH updated the task description. Jul 2 2019, 7:05 PM
RobH added a subscriber: ayounsi.
akosiaris updated the task description. Jul 23 2019, 7:03 AM
akosiaris added a subscriber: akosiaris.

From the DB side, this rack is good to go

RobH moved this task from Backlog to High Priority Task on the ops-eqiad board. Jul 24 2019, 2:17 PM
RobH moved this task from High Priority Task to Blocked on the ops-eqiad board. Jul 26 2019, 1:37 PM
Marostegui updated the task description. Aug 6 2019, 8:07 AM
RobH removed RobH as the assignee of this task. Aug 14 2019, 4:52 PM
wiki_willy renamed this task from a8-eqiad pdu refresh to a8-eqiad pdu refresh (Thursday 9/19 @11am UTC). Aug 15 2019, 5:32 PM
CDanis triaged this task as Normal priority. Aug 16 2019, 1:02 PM

As this rack has one of our two most important routers, I'd like to be around for the maintenance.
11am UTC is 4am Pacific. It would be ideal if it could be pushed to at least 8am Pacific (15:00 UTC).
Otherwise, please make sure Mark or Faidon can be there.
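
The time conversion in the comment above can be checked with a short sketch. It assumes the year is 2019 (taken from the surrounding timeline entries) and that "Pacific" means the America/Los_Angeles zone, which is on daylight time (UTC-7) in mid-September. The function and variable names are illustrative only.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: "Pacific" refers to the America/Los_Angeles time zone.
PACIFIC = ZoneInfo("America/Los_Angeles")


def to_pacific(utc_dt):
    """Convert a timezone-aware UTC datetime to Pacific local time."""
    return utc_dt.astimezone(PACIFIC)


# Assumption: the maintenance date is Thursday, Sep 19 2019.
proposed = datetime(2019, 9, 19, 11, 0, tzinfo=timezone.utc)
requested = datetime(2019, 9, 19, 15, 0, tzinfo=timezone.utc)

print(to_pacific(proposed).strftime("%H:%M %Z"))   # 04:00 PDT
print(to_pacific(requested).strftime("%H:%M %Z"))  # 08:00 PDT
```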

wiki_willy renamed this task from a8-eqiad pdu refresh (Thursday 9/19 @11am UTC) to a8-eqiad pdu refresh (Date TBA). Mon, Sep 16, 5:03 PM
wiki_willy assigned this task to Cmjohnson.

Originally scheduled for Thursday 9/19, but will reschedule for a later date, since this is a network rack.

Bstorm updated the task description. Mon, Sep 16, 5:05 PM