
b4-eqiad pdu refresh (Thursday 10/24 @11am UTC)
Open, Normal, Public

Description

This task tracks replacing the ps1 and ps2 PDUs in rack B4-eqiad with new PDUs.

Each server and switch needs potential downtime scheduled, since this is a live power change of the PDU towers.
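The title fixes the start as Thursday 10/24 at 11:00 UTC. The year (2019) is inferred from the event timeline. A minimal Python sketch that turns this into an explicit start/end window, e.g. for whatever downtime tooling is used, assuming a placeholder two-hour length (the task does not state one):

```python
from datetime import datetime, timedelta, timezone

# Start time from the task title; the year 2019 is inferred from the timeline.
WINDOW_START = datetime(2019, 10, 24, 11, 0, tzinfo=timezone.utc)

# ASSUMPTION: placeholder length; the task does not specify a window duration.
WINDOW_LENGTH = timedelta(hours=2)


def downtime_window(start: datetime, length: timedelta) -> tuple[int, int]:
    """Return the (start, end) of a downtime window as Unix epoch seconds."""
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    end = start + length
    return int(start.timestamp()), int(end.timestamp())


if __name__ == "__main__":
    start_ts, end_ts = downtime_window(WINDOW_START, WINDOW_LENGTH)
    # Prints the weekday name as a sanity check on the date, then both epochs.
    print(WINDOW_START.strftime("%A"), start_ts, end_ts)
```

Requiring a timezone-aware start avoids silently interpreting "11am" in the local time of whoever runs the script.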

These racks have a single tower for the old PDU (with an A and a B side), while the new PDUs have independent A and B towers.
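Why the downtime is only "potential": assuming each move interrupts only the infeeds on the side being relocated, a host stays up as long as it has a power supply on the other side. A small Python sketch of that rule; the side each host is plugged into is not recorded in the task, so the assignments below are hypothetical:

```python
from typing import FrozenSet


def stays_powered(infeeds: FrozenSet[str], side_being_moved: str) -> bool:
    """Return True if a host keeps at least one live infeed while one side moves.

    infeeds holds the PDU sides ("A", "B") the host's power supplies plug into.
    """
    return any(side != side_being_moved for side in infeeds)


# HYPOTHETICAL examples: the task does not record which side each host uses.
dual_psu = frozenset({"A", "B"})
single_psu = frozenset({"A"})  # a single-infeed device, e.g. atlas-eqiad

print(stays_powered(dual_psu, "A"))    # True
print(stays_powered(single_psu, "A"))  # False
print(stays_powered(single_psu, "B"))  # True
```

Single-infeed devices lose power during the move of their side, which is why they need separate handling.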

  • Schedule downtime for the entire list of switches and servers.
  • Wire up one of the two new towers, energize it, and relocate power to it from the existing/old PDU tower (now de-energized).
  • Confirm that the entire list of switches, routers, and servers has had power restored from the new PDU tower.
  • Once the new PDU tower is confirmed online, move on to the next steps.
  • Wire up the remaining tower, energize it, and relocate power to it from the existing/old PDU tower (now de-energized).
  • Confirm that the entire list of switches, routers, and servers has had power restored from the new PDU tower.
  • Connect via serial and confirm the serial connection works.
  • Set up the PDU following the directions on https://wikitech.wikimedia.org/wiki/Platform-specific_documentation/ServerTech#Initial_Setup
  • Update the PDU model in puppet per T233129.
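The power steps above form a gated sequence: the second tower is only touched once every host is confirmed powered from the first. A rough Python sketch of that control flow, where `move_tower` and `is_powered` are hypothetical callbacks standing in for the manual work and the per-host check, and doing tower A first is an assumption (the task only says "one of the two towers"):

```python
from typing import Callable, Iterable, List

# ASSUMPTION: tower A is cut over first; the task does not fix the order.
TOWER_ORDER = ("A", "B")


def cut_over(hosts: Iterable[str],
             move_tower: Callable[[str], None],
             is_powered: Callable[[str], bool]) -> List[str]:
    """Cut over one new tower at a time, gating each step on a power check.

    move_tower: hypothetical callback for "wire up, energize, relocate power".
    is_powered: hypothetical callback confirming a host has power restored.
    Raises RuntimeError, before touching the next tower, if any host is
    unpowered after a move. Returns the completed towers, in order.
    """
    hosts = list(hosts)
    completed: List[str] = []
    for tower in TOWER_ORDER:
        move_tower(tower)
        unpowered = [h for h in hosts if not is_powered(h)]
        if unpowered:
            raise RuntimeError(
                f"tower {tower}: no power on {', '.join(unpowered)}; halting")
        completed.append(tower)
    return completed
```

For example, `cut_over(["elastic1050", "prometheus1004"], print, lambda host: True)` prints `A` then `B` and returns `["A", "B"]`. The serial, PDU setup, and puppet steps stay manual and are not modeled.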

List of routers, switches, and servers

| device | role | SRE team coordination | notes |
| --- | --- | --- | --- |
| atlas-eqiad | RIPE Atlas anchor | unknown | This has a single power infeed and will lose connection. Is it best to disconnect it before the work and reconnect it afterwards, rather than have it pop up and down multiple times during the work? |
| asw2-b4-eqiad | asw | @ayounsi | Ensure the asw doesn't lose all power, or the entire rack goes offline from the network. |
| ruthenium | parsoid::testing | | |
| elastic1050 | elastic system | @Gehel | |
| prometheus1004 | prometheus | @fgiunchedi | |
| cloudvirt1007 | cloudvirt host | cloud-services-team @JHedden | 21 active VMs, please handle with care |
| cloudvirt1006 | cloudvirt host | cloud-services-team @JHedden | 17 active VMs, please handle with care |
| cloudvirt1005 | cloudvirt host | cloud-services-team @JHedden | 27 active VMs, please handle with care |
| cloudvirt1019 | cloudvirt host | cloud-services-team @JHedden | 2 active VMs, please handle with care |
| cloudvirt1004 | cloudvirt host | cloud-services-team @JHedden | 19 active VMs, please handle with care |
| cloudvirt1003 | cloudvirt host | cloud-services-team @JHedden | 17 active VMs, please handle with care |
| cloudvirt1021 | cloudvirt host | cloud-services-team @JHedden | 25 active VMs, please handle with care |
| cloudvirt1016 | cloudvirt host | cloud-services-team @JHedden | 58 active VMs, please handle with care |
| cloudnet1004 | cloudvirt host | cloud-services-team @JHedden | can happen anytime, has redundant peer |
| cloudvirt1013 | cloudvirt host | cloud-services-team @JHedden | 23 active VMs, please handle with care |
| conf1005 | zookeeper/etc discovery service | | |
| phab1001 | phabricator main system | | |
| iron | | | |
| kubestage1002 | | | |
| kafka1002 | | @herron | |
| an-worker1085 | hadoop | Analytics | fine to do any time |
| maps1002 | openstreetmaps slave server | | |
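To help with per-team coordination, here is a rough Python sketch that groups the hosts in the table by coordination contact and totals the active VMs listed in the notes. The rows are transcribed from the table with notes abbreviated; keying the cloud-services rows by the person's handle alone is an editorial simplification:

```python
import re
from collections import defaultdict

# (device, coordination contact, notes) transcribed from the host table.
# Empty strings mean the table lists no contact or notes for that host.
ROWS = [
    ("atlas-eqiad", "unknown", "single power infeed, will lose connection"),
    ("asw2-b4-eqiad", "@ayounsi", "rack goes offline if asw loses all power"),
    ("ruthenium", "", ""),
    ("elastic1050", "@Gehel", ""),
    ("prometheus1004", "@fgiunchedi", ""),
    ("cloudvirt1007", "@JHedden", "21 active VMs"),
    ("cloudvirt1006", "@JHedden", "17 active VMs"),
    ("cloudvirt1005", "@JHedden", "27 active VMs"),
    ("cloudvirt1019", "@JHedden", "2 active VMs"),
    ("cloudvirt1004", "@JHedden", "19 active VMs"),
    ("cloudvirt1003", "@JHedden", "17 active VMs"),
    ("cloudvirt1021", "@JHedden", "25 active VMs"),
    ("cloudvirt1016", "@JHedden", "58 active VMs"),
    ("cloudnet1004", "@JHedden", "has redundant peer"),
    ("cloudvirt1013", "@JHedden", "23 active VMs"),
    ("conf1005", "", ""),
    ("phab1001", "", ""),
    ("iron", "", ""),
    ("kubestage1002", "", ""),
    ("kafka1002", "@herron", ""),
    ("an-worker1085", "Analytics", "fine to do any time"),
    ("maps1002", "", ""),
]

VM_RE = re.compile(r"(\d+) active VMs")


def by_contact(rows):
    """Map each coordination contact to its devices; blanks become 'unassigned'."""
    groups = defaultdict(list)
    for device, contact, _notes in rows:
        groups[contact or "unassigned"].append(device)
    return dict(groups)


def total_active_vms(rows):
    """Sum the 'N active VMs' counts found in the notes column."""
    total = 0
    for _device, _contact, notes in rows:
        match = VM_RE.search(notes)
        if match:
            total += int(match.group(1))
    return total
```

With these rows, `total_active_vms(ROWS)` returns 209, `by_contact(ROWS)["@JHedden"]` lists 10 hosts, and 6 hosts have no contact listed.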

Event Timeline

RobH created this task. (Jul 8 2019, 10:46 PM)
RobH moved this task from High Priority Task to Blocked on the ops-eqiad board. (Jul 26 2019, 1:37 PM)
wiki_willy renamed this task from "b4-eqiad pdu refresh" to "b4-eqiad pdu refresh (Thursday 10/24 @11am UTC)". (Aug 15 2019, 5:36 PM)
RobH updated the task description. (Aug 29 2019, 4:24 PM)
RobH added subscribers: ayounsi, Gehel, Nuria.
RobH updated the task description. (Aug 29 2019, 6:08 PM)
colewhite updated the task description. (Aug 30 2019, 8:58 PM)
colewhite added a subscriber: fgiunchedi.
RobH removed RobH as the assignee of this task. (Sep 6 2019, 3:35 PM)
jbond triaged this task as Normal priority. (Sep 9 2019, 9:15 AM)
elukey updated the task description. (Sep 17 2019, 6:02 AM)
elukey added a subscriber: herron.
RobH updated the task description. (Fri, Oct 11, 8:41 PM)
JHedden updated the task description. (Mon, Oct 21, 7:09 PM)
JHedden added a subscriber: JHedden.