
a5-eqiad pdu refresh
Closed, Resolved, Public

Description

This task will track the replacement of the ps1 and ps2 PDUs in rack A5-eqiad with new PDUs.

Downtime will need to be scheduled for each server and switch, since the PDU towers will be changed over with power live and any device may lose power during the swap.

These racks have a single tower for the old PDU (with an A and a B side), while the new PDUs have independent A and B towers.

  • Schedule downtime for the entire list of switches and servers.
  • Wire up one of the two new towers, energize it, and relocate power to it from the existing/old PDU tower (now de-energized).
  • Confirm that every switch, router, and server on the list has had its power restored from the new PDU tower.
  • Once the new PDU tower is confirmed online, move on to the next steps.
  • Wire up the remaining tower, energize it, and relocate power to it from the existing/old PDU tower (now de-energized).
  • Confirm that every switch, router, and server on the list has had its power restored from the new PDU tower.
  • Set up all remote configuration options for the new PDU (network, SNMP, login, etc.).
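The two-phase procedure above can be sketched as a small Python model. This is a hypothetical illustration, not the tooling used for this swap: it assumes each device draws power from side A, side B, or both, and that while one side's cords are being moved to its new tower only the other side is live. Dual-fed devices ride through each phase, while anything fed from a single side goes dark while that side is moved, which is why downtime is scheduled for the whole list. The `is_online` callback stands in for the manual "confirm power restored" step, and the host names in the usage lines are made up.

```python
from dataclasses import dataclass, field

SIDES = {"A", "B"}


@dataclass
class Device:
    name: str
    # Hypothetical: the PDU sides this device's power supplies are plugged into.
    feeds: set = field(default_factory=lambda: {"A", "B"})


def relocate_side(devices, side):
    """One phase: the old tower's `side` is de-energized and its cords are moved
    to the new, already-energized tower. Returns the devices that lose power
    meanwhile, i.e. those with no feed on the side that stays live."""
    still_live = SIDES - {side}
    return sorted(d.name for d in devices if not (d.feeds & still_live))


def run_swap(devices, order=("A", "B"), is_online=lambda name: True):
    """Run both phases in order, stopping if any device fails the post-phase check."""
    at_risk = {}
    for side in order:
        at_risk[side] = relocate_side(devices, side)
        # Gate: confirm every device is back before touching the next side.
        offline = [d.name for d in devices if not is_online(d.name)]
        if offline:
            raise RuntimeError(f"side {side}: still offline after swap: {offline}")
    return at_risk


if __name__ == "__main__":
    # Hypothetical hosts: one dual-PSU, one plugged into side A only.
    rack = [Device("dual-psu-host"), Device("single-psu-host", {"A"})]
    print(run_swap(rack))  # {'A': ['single-psu-host'], 'B': []}
```

Raising on the first failed check mirrors the "once the new PDU tower is confirmed online, move on" gate: the second tower is never touched while anything from the first phase is still down.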

List of routers, switches, and servers

| device | role | SRE team coordination | notes |
| --- | --- | --- | --- |
| asw2-a5-eqiad | asw | @ayounsi | |
| ex4300-spare1-eqiad | spare | @ayounsi | |
| lvs1009 | lvs | Traffic | |
| lvs1008 | lvs | Traffic | |
| lvs1007 | lvs | Traffic | |
| restbase1020 | | @fgiunchedi | OK with power loss |
| ores1002 | ores | @akosiaris | Fine to do at any time |
| ganeti1008 | ganeti node | @akosiaris | Will need to be emptied in advance |
| analytics1071 | analytics | Analytics | |
| wtp1030 | parsoid | serviceops | Fine to do at any time |
| wtp1029 | parsoid | serviceops | Fine to do at any time |
| wtp1028 | parsoid | serviceops | Fine to do at any time |
| db1128 | db | DBA | Fine to do any time before Thursday 25th, 05:30 AM UTC, when this host will become the Phabricator master |
| sessionstore1001 | sessionstore | serviceops, Core Platform Team | Should be fine to do at any time |
| cp1068 | cp | Traffic | |
| cp1067 | cp | Traffic | |
| cp1066 | cp | Traffic | |
| cp1065 | cp | Traffic | |
| cp1064 | cp | Traffic | |
| cp1063 | cp | Traffic | |
| cp1062 | cp | Traffic | |
| cp1061 | cp | Traffic | |
| cp1059 | cp | Traffic | |
| cp1058 | cp | Traffic | |
| dbproxy1012 | dbproxy | DBA | |
| mw1266 | mw | serviceops | Fine to do at any time outside deployment windows |
| mw1265 | mw | serviceops | Fine to do at any time outside deployment windows |
| mw1264 | mw | serviceops | Fine to do at any time outside deployment windows |
| mw1263 | mw | serviceops | Fine to do at any time outside deployment windows |
| mw1262 | mw | serviceops | Fine to do at any time outside deployment windows |
| mw1261 | mw | serviceops | Fine to do at any time outside deployment windows |
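To see who needs a downtime heads-up, the device list above can be grouped by its coordination contact. This is a minimal Python sketch, not part of the original task: the rows are transcribed from the table (the notes column is dropped), and `group_by_contact` is a hypothetical helper name.

```python
from collections import defaultdict

# (device, role, SRE team coordination) transcribed from the device table;
# restbase1020 has no role listed in the source, so its role is left empty.
DEVICES = [
    ("asw2-a5-eqiad", "asw", "@ayounsi"),
    ("ex4300-spare1-eqiad", "spare", "@ayounsi"),
    ("lvs1009", "lvs", "Traffic"),
    ("lvs1008", "lvs", "Traffic"),
    ("lvs1007", "lvs", "Traffic"),
    ("restbase1020", "", "@fgiunchedi"),
    ("ores1002", "ores", "@akosiaris"),
    ("ganeti1008", "ganeti node", "@akosiaris"),
    ("analytics1071", "analytics", "Analytics"),
    ("wtp1030", "parsoid", "serviceops"),
    ("wtp1029", "parsoid", "serviceops"),
    ("wtp1028", "parsoid", "serviceops"),
    ("db1128", "db", "DBA"),
    ("sessionstore1001", "sessionstore", "serviceops, Core Platform Team"),
    # cp1068 down to cp1058 (there is no cp1060 in the list).
    *[(f"cp{n}", "cp", "Traffic")
      for n in (1068, 1067, 1066, 1065, 1064, 1063, 1062, 1061, 1059, 1058)],
    ("dbproxy1012", "dbproxy", "DBA"),
    # mw1266 down to mw1261.
    *[(f"mw{n}", "mw", "serviceops") for n in range(1266, 1260, -1)],
]


def group_by_contact(rows):
    """Map each coordination contact to its devices, keeping table order."""
    groups = defaultdict(list)
    for device, _role, contact in rows:
        groups[contact].append(device)
    return dict(groups)


if __name__ == "__main__":
    for contact, hosts in group_by_contact(DEVICES).items():
        print(f"{contact}: {len(hosts)} host(s)")
```

Grouping on the contact string rather than the role keeps individual owners (such as @akosiaris for both ores1002 and ganeti1008) in a single notification bucket.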

Event Timeline

RobH updated the task description. Jul 3 2019, 9:45 PM
RobH added subscribers: ayounsi, akosiaris, fgiunchedi.
elukey updated the task description. Jul 16 2019, 10:05 AM
elukey added a subscriber: elukey.

A single Hadoop worker node for analytics, all good.

BBlack added a subscriber: BBlack. Jul 22 2019, 6:14 PM

All the Traffic cp and lvs nodes are being decommissioned and are not in use: T208584 T208586

RobH updated the task description. Jul 22 2019, 7:02 PM
akosiaris updated the task description. Jul 23 2019, 6:56 AM
akosiaris added a subscriber: MoritzMuehlenhoff.

sudo gnt-node migrate -f ganeti1008

From the DB side of things, this rack should be done before Thursday 30th 05:30AM UTC, as at that time db1128 will become the Phabricator master (T228243: Switchover m3 (phabricator) master db1072 to db1128).

Marostegui updated the task description. Jul 23 2019, 7:04 AM
Joe updated the task description. Jul 23 2019, 7:06 AM
Marostegui updated the task description. Jul 23 2019, 7:14 AM

Correction: Thursday 25th, not the 30th.
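The corrected date can be sanity-checked with Python's standard `datetime` module. This is a minimal sketch under two assumptions: the dates refer to July 2019 (the month of this task), and the SAL timestamp for the side B swap (2019-07-23T15:46:41Z) marks the start of the work. It confirms that the 25th was a Thursday while the 30th was a Tuesday, and shows how much time remained before the 05:30 UTC switchover.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "Thursday 25th" refers to July 2019, the month of this task.
DEADLINE = datetime(2019, 7, 25, 5, 30, tzinfo=timezone.utc)
# Assumption: the SAL entry for the side B swap marks the start of the work (UTC).
SWAP_START = datetime(2019, 7, 23, 15, 46, 41, tzinfo=timezone.utc)

# datetime.weekday() returns Monday == 0 ... Sunday == 6.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_of(day):
    """Weekday name for a given day of July 2019."""
    return WEEKDAYS[datetime(2019, 7, day).weekday()]


# Time left between the start of the swap and the db1128 switchover.
margin = DEADLINE - SWAP_START

if __name__ == "__main__":
    print(weekday_of(25), weekday_of(30))  # Thursday Tuesday
    print(margin)  # 1 day, 13:43:19
```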

fgiunchedi updated the task description. Jul 23 2019, 9:28 AM

Mentioned in SAL (#wikimedia-operations) [2019-07-23T15:46:41Z] <robh> side b of a5-eqiad swapping pdu via T227141

RobH added a comment. Jul 23 2019, 4:20 PM

Both sides are swapped, and all items appear online.

RobH closed this task as Resolved. Jul 23 2019, 7:11 PM
RobH triaged this task as Normal priority.
RobH updated the task description.
RobH removed RobH as the assignee of this task. Aug 28 2019, 6:40 PM