
b7-eqiad pdu refresh (Tuesday 11/5 @11am UTC)
Open, High, Public

Description

This task tracks the replacement of ps1 and ps2 in rack B7-eqiad with new PDUs.

Each server and switch needs potential downtime scheduled, since this is a live power change of the PDU towers.

These racks have a single tower for the old PDU (with an A and a B side), while the new PDUs have independent A and B towers.
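A minimal sketch of why the downtime is only potential, assuming (this task does not say which devices have redundant supplies) that each device draws power from the A side, the B side, or both: when power is moved one side at a time, only devices with no supply on the other side lose power during that move. The function name, device names, and feed assignments below are hypothetical.

```python
from __future__ import annotations


def devices_losing_power(feeds: dict[str, set[str]], side_being_moved: str) -> set[str]:
    """Return devices that have no power supply outside the side being moved.

    `feeds` maps a device name to the PDU sides ("A", "B") its supplies
    draw from. This is a hypothetical model, not data from the rack.
    """
    return {dev for dev, sides in feeds.items() if not (sides - {side_being_moved})}


# Hypothetical example: one dual-supply host and one single-supply host.
example_feeds = {
    "dual-psu-host": {"A", "B"},
    "single-psu-host": {"A"},
}

for side in ("A", "B"):
    print(side, sorted(devices_losing_power(example_feeds, side)))
```

Under this model, a dual-supply host stays powered through both moves, while a single-supply host goes down only while its own side is being relocated.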

  • Schedule downtime for the entire list of switches and servers.
  • Wire up one of the two new towers, energize it, and relocate power to it from the existing/old PDU tower (now de-energized).
  • Confirm that the entire list of switches, routers, and servers has had power restored from the new PDU tower.
  • Once the new PDU tower is confirmed online, move on to the next steps.
  • Wire up the remaining tower, energize it, and relocate power to it from the existing/old PDU tower (now de-energized).
  • Confirm that the entire list of switches, routers, and servers has had power restored from the new PDU tower.
  • Connect via serial and confirm the serial connection works.
  • Set up the PDU following the directions at https://wikitech.wikimedia.org/wiki/Platform-specific_documentation/ServerTech#Initial_Setup
  • Update the PDU model in puppet per T233129.
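The two tower phases in these steps are gated: the second tower is only wired up once every device is confirmed on the first. A minimal sketch of that gate, assuming the set of confirmed devices for each phase comes from some external check (for example monitoring or a walk of the rack); the function names and phase labels are hypothetical.

```python
from __future__ import annotations

# Hypothetical labels for the two tower phases in the steps above.
TOWER_PHASES = ("first new tower", "second new tower")


def missing_devices(expected: set[str], confirmed: set[str]) -> set[str]:
    """Devices that should be powered from the new tower but are not confirmed."""
    return expected - confirmed


def run_phases(expected: set[str], confirmations: dict[str, set[str]]) -> list[str]:
    """Walk the tower phases in order, stopping at the first phase that
    leaves any device unconfirmed. Returns the phases that passed."""
    passed = []
    for phase in TOWER_PHASES:
        missing = missing_devices(expected, confirmations.get(phase, set()))
        if missing:
            # Do not move on to the next tower until these are resolved.
            print(f"{phase}: not confirmed for {sorted(missing)}")
            break
        passed.append(phase)
    return passed


devices = {"asw-b7-eqiad", "cp1081", "lvs1014"}
print(run_phases(devices, {"first new tower": devices,
                           "second new tower": devices - {"cp1081"}}))
```

In the usage line, the second phase is missing cp1081, so the sketch reports it and returns only the first phase as passed.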

List of routers, switches, and servers

device | role | SRE team coordination | recommended action during maintenance
asw-b7-eqiad | asw | @ayounsi | ensure this doesn't go offline, as it would take the entire rack network offline
wtp1033 | | |
wtp1032 | | |
wtp1031 | | |
kafka-main1002 | | @herron |
dbprov1002 | pb provisioning host | DBA |
cloudvirtan1005 | | |
cloudvirtan1004 | | |
an-worker1087 | | @Nuria |
an-worker1086 | | @Nuria |
cp1082 | cp system | Traffic | T227542#5355289
cp1081 | cp system | Traffic | T227542#5355289
ms-be1041 | ms-be system | fillipo | gracefully shut down the host just before rack maintenance, and power it back on after maintenance
cloudvirt1022 | cloudvirt host | cloud-services-team |
analytics1073 | | Analytics | fine to do any time
lvs1014 | lvs system | @BBlack | T227542#5355289
cloudvirt1020 | cloudvirt host | cloud-services-team |
druid1005 | | Analytics | fine to do any time
ores1003 | | |
cloudnet1003 | | cloud-services-team |
restbase-dev1005 | | |
cloudcontrol1004 | | cloud-services-team |
cloudvirt1017 | cloudvirt | cloud-services-team |
mw1318 | mw server | @Joe |
mw1317 | mw server | @Joe |
mw1316 | mw server | @Joe |
mw1315 | mw server | @Joe |
mw1314 | mw server | @Joe |
mw1313 | mw server | @Joe |
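Because downtime has to be coordinated per team, it can help to regroup the table by contact. A minimal sketch, with the device-to-contact mapping transcribed from the table above; the placeholder label for devices with no listed contact and the function name are choices made for this example only.

```python
from collections import defaultdict

NONE_LISTED = "(none listed)"

# Device -> coordinating team/contact, transcribed from the table above.
# None means the table lists no contact for that device.
COORDINATION = {
    "asw-b7-eqiad": "@ayounsi",
    "wtp1033": None, "wtp1032": None, "wtp1031": None,
    "kafka-main1002": "@herron",
    "dbprov1002": "DBA",
    "cloudvirtan1005": None, "cloudvirtan1004": None,
    "an-worker1087": "@Nuria", "an-worker1086": "@Nuria",
    "cp1082": "Traffic", "cp1081": "Traffic",
    "ms-be1041": "fillipo",
    "cloudvirt1022": "cloud-services-team",
    "analytics1073": "Analytics",
    "lvs1014": "@BBlack",
    "cloudvirt1020": "cloud-services-team",
    "druid1005": "Analytics",
    "ores1003": None,
    "cloudnet1003": "cloud-services-team",
    "restbase-dev1005": None,
    "cloudcontrol1004": "cloud-services-team",
    "cloudvirt1017": "cloud-services-team",
    "mw1318": "@Joe", "mw1317": "@Joe", "mw1316": "@Joe",
    "mw1315": "@Joe", "mw1314": "@Joe", "mw1313": "@Joe",
}


def group_by_contact(mapping):
    """Invert device -> contact into contact -> sorted list of devices."""
    groups = defaultdict(list)
    for device, contact in mapping.items():
        groups[contact or NONE_LISTED].append(device)
    return {contact: sorted(devs) for contact, devs in groups.items()}


for contact, devs in sorted(group_by_contact(COORDINATION).items()):
    print(f"{contact}: {', '.join(devs)}")
```

The "(none listed)" group makes the devices that still need an owner for downtime coordination easy to spot.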

Details

Due Date
Fri, Nov 15, 11:00 AM

Event Timeline

RobH created this task. Jul 8 2019, 10:48 PM
BBlack added a subscriber: BBlack. Jul 22 2019, 8:12 PM

lvs1014 here will need special care: Traffic should stop puppet and pybal and monitor failover to lvs1016 ahead of the work, then revert afterwards. cp1081 and cp1082 here can be depooled as normal.
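A minimal sketch of the pre-work and revert ordering described in this comment, assuming that "revert afterwards" means undoing the steps in reverse order and that the cp hosts are depooled after the lvs1014 steps; neither ordering is stated in the comment. The action strings and the function name are hypothetical labels, not real commands.

```python
# Hypothetical pre-work plan for the Traffic hosts in this rack.
# Action strings are descriptive labels only, not real commands.
PRE_WORK = [
    ("lvs1014", "stop puppet"),
    ("lvs1014", "stop pybal"),
    ("lvs1016", "monitor: confirm traffic has failed over"),
    ("cp1081", "depool"),
    ("cp1082", "depool"),
]

# How each reversible action is undone after the maintenance.
REVERTS = {
    "stop puppet": "restart puppet",
    "stop pybal": "start pybal",
    "depool": "repool",
}


def revert_plan(pre_work):
    """Undo reversible pre-work steps in reverse order; monitoring-only
    steps have no revert and are skipped."""
    return [(host, REVERTS[action])
            for host, action in reversed(pre_work)
            if action in REVERTS]


for host, action in revert_plan(PRE_WORK):
    print(host, action)
```

Reversing the order means pybal on lvs1014 is started before puppet is restarted, which mirrors stopping puppet first so it does not bring pybal back mid-maintenance.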

RobH moved this task from High Priority Task to Blocked on the ops-eqiad board. Jul 26 2019, 1:37 PM
wiki_willy renamed this task from b7-eqiad pdu refresh to b7-eqiad pdu refresh (Tuesday 11/5 @11am UTC). Aug 15 2019, 5:38 PM
RobH triaged this task as High priority. Aug 28 2019, 6:31 PM
RobH updated the task description.
RobH removed RobH as the assignee of this task. Aug 28 2019, 6:39 PM
RobH updated the task description.
RobH set Due Date to Fri, Nov 15, 12:00 AM.
RobH changed Due Date from Fri, Nov 15, 12:00 AM to Fri, Nov 15, 11:00 AM.
RobH added subscribers: ayounsi, Nuria, Joe.
elukey updated the task description. Sep 17 2019, 6:03 AM
elukey added a subscriber: herron.
RobH updated the task description. Fri, Oct 11, 8:42 PM