
b8-eqiad pdu refresh (Thursday 10/31 @11am UTC)
Open, High · Public

Description

This task tracks the replacement of ps1 and ps2 with new PDUs in rack B8-eqiad.

Each server & switch will need to have potential downtime scheduled, since this will be a live power change of the PDU towers.

These racks have a single tower for the old PDU (with an A and a B side), while the new PDUs have independent A and B towers.

  • Schedule downtime for the entire list of switches and servers.
  • Wire up one of the two towers, energize it, and relocate power to it from the existing/old PDU tower (now de-energized).
  • Confirm that the entire list of switches, routers, and servers have had their power restored from the new PDU tower.
  • Once the new PDU tower is confirmed online, move on to the next steps.
  • Wire up the remaining tower, energize it, and relocate power to it from the existing/old PDU tower (now de-energized).
  • Confirm that the entire list of switches, routers, and servers have had their power restored from the new PDU tower.
  • Connect via serial and confirm that the serial connection works.
  • Set up the PDU following the directions on https://wikitech.wikimedia.org/wiki/Platform-specific_documentation/ServerTech#Initial_Setup
  • Update the PDU model in puppet per T233129.
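
The "confirm entire list" steps above amount to comparing the expected device list against whatever has been verified on the new tower. A minimal sketch of that check follows; the function names and the sample confirmation list are hypothetical and only illustrate the idea, they are not part of any existing tooling.

```python
from typing import Iterable, List


def missing_devices(expected: Iterable[str], confirmed: Iterable[str]) -> List[str]:
    """Return the expected devices not yet confirmed, in their original order."""
    confirmed_set = set(confirmed)
    return [name for name in expected if name not in confirmed_set]


def ready_for_next_tower(expected: Iterable[str], confirmed: Iterable[str]) -> bool:
    """True only when every expected device has been confirmed on the new tower."""
    return not missing_devices(expected, confirmed)


# Hypothetical example: three devices from the rack, two confirmed so far.
rack = ["asw-b8-eqiad", "ganeti1018", "db1132"]
after_first_tower = ["asw-b8-eqiad", "db1132"]

print(missing_devices(rack, after_first_tower))
print(ready_for_next_tower(rack, after_first_tower))
```

Under this sketch, the second tower would only be wired up once `ready_for_next_tower` returns `True` for the first one.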

List of routers, switches, and servers

| device | role | SRE team coordination | recommended action during maintenance |
| ----- | ----- | ----- | ----- |
| asw-b8-eqiad | asw | @ayounsi | ensure this doesn't go offline, as that would take the entire rack network offline |
| ganeti1018 | ganeti host | serviceops | needs to be emptied of VMs before |
| gerrit1001 | spare |  | fine to do at any time |
| cloudvirt1030 |  | cloud-services-team |  |
| db1132 | m2 master | DBA | This host is the m2 master, which holds some internal services; ensure it doesn't go offline. If it does, there is an automatic failover via proxies. |
| pc1008 | parsercache host | DBA | DBA to depool it |
| restbase1024 | restbase | serviceops, Services | fine to do at any time |
| an-master1002 |  | Analytics | fine to do at any time |
| dbproxy1015 | db proxy | DBA | Not in use |
| graphite1004 |  | @fgiunchedi | no action needed; if power is lost and can't be restored quickly we'll switch to codfw |
| rdb1009 | redis master | serviceops | this will need coordination? |
| notebook1003 |  |  |  |
| db1119 | db host | DBA | DBA to depool it |
| db1113 | db host | DBA | DBA to depool it |
| cloudservices1003 |  | cloud-services-team |  |
| mwmaint1002 |  |  | This is the primary mw maint system in eqiad; perhaps we should halt deployments during this time? |
| labpuppetmaster1001 |  | cloud-services-team |  |
| ores1004 | ORES | serviceops | fine to do at any time |
| wtp1036 | parsoid | serviceops | fine to do at any time |
| wtp1035 | parsoid | serviceops | fine to do at any time |
| wtp1034 | parsoid | serviceops | fine to do at any time |
| dumpsdata1001 | dumps data server | @ArielGlenn | coordinate please |
| analytics1063 |  | Analytics | fine to do at any time |
| analytics1062 |  | Analytics | fine to do at any time |
| analytics1061 |  | Analytics | fine to do at any time |
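
Since downtime has to be coordinated per team, it can help to group the device list by its coordination column. The sketch below does this for a handful of rows copied from the list above; the `DEVICES` structure and the `group_by_team` helper are hypothetical names chosen for illustration only.

```python
from collections import defaultdict

# (device, coordinating team) pairs copied from a few rows of the list above.
DEVICES = [
    ("asw-b8-eqiad", "@ayounsi"),
    ("ganeti1018", "serviceops"),
    ("db1132", "DBA"),
    ("pc1008", "DBA"),
    ("rdb1009", "serviceops"),
]


def group_by_team(devices):
    """Map each coordinating team to the devices it needs to sign off on."""
    groups = defaultdict(list)
    for name, team in devices:
        groups[team].append(name)
    return dict(groups)


by_team = group_by_team(DEVICES)
for team, names in by_team.items():
    print(f"{team}: {', '.join(names)}")
```

Each resulting group is one conversation to have before the maintenance window, for example a single request to the DBAs covering both db1132 and pc1008.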

Event Timeline

RobH moved this task from High Priority Task to Blocked on the ops-eqiad board. · Jul 26 2019, 1:37 PM
wiki_willy renamed this task from b8-eqiad pdu refresh to b8-eqiad pdu refresh (Thursday 10/31 @11am UTC). · Aug 15 2019, 5:39 PM
RobH updated the task description. · Aug 28 2019, 6:27 PM
RobH updated the task description.
RobH added subscribers: ayounsi, Nuria, ArielGlenn.
RobH added a subscriber: akosiaris.
RobH removed RobH as the assignee of this task. · Aug 28 2019, 6:30 PM
RobH triaged this task as High priority.
RobH updated the task description.
fgiunchedi updated the task description. · Sep 2 2019, 9:09 AM
fgiunchedi added a subscriber: fgiunchedi.
elukey updated the task description. · Sep 17 2019, 6:04 AM
akosiaris updated the task description. · Sep 17 2019, 6:59 AM
ArielGlenn updated the task description. · Sep 17 2019, 8:25 AM
Marostegui updated the task description. · Wed, Sep 25, 1:48 PM
Marostegui updated the task description.
Marostegui updated the task description. · Wed, Sep 25, 1:50 PM
RobH updated the task description. · Fri, Oct 11, 8:42 PM