a5-eqiad pdu refresh
Closed, Resolved (Public)

Description

This task tracks the replacement of the ps1 and ps2 PDUs with new PDUs in rack A5-eqiad.

Each server & switch will need to have potential downtime scheduled, since this will be a live power change of the PDU towers.

These racks have a single tower for the old PDU (with an A and a B side), while the new PDUs have independent A and B towers.

  • Schedule downtime for the entire list of switches and servers.
  • Wire up one of the two new towers, energize it, and relocate power to it from one side of the existing/old PDU tower (that side is now de-energized).
  • Confirm that the entire list of switches, routers, and servers has had power restored from the new PDU tower.
  • Once the new PDU tower is confirmed online, move on to the next steps.
  • Wire up the remaining tower, energize it, and relocate power to it from the other side of the existing/old PDU tower (now de-energized).
  • Confirm that the entire list of switches, routers, and servers has had power restored from the new PDU tower.
  • Set up all remote configuration options for the new PDU (network, SNMP, login, etc.).
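
The two "confirm" steps above could be partly automated with a reachability sweep. The following is a minimal sketch, not the procedure actually used on this task: the function names `ping_once` and `unreachable` are hypothetical, and the `ping` flags assume a Linux host (`-c` for count, `-W` for timeout in seconds).

```python
import subprocess
from typing import Callable, Iterable, List


def ping_once(host: str) -> bool:
    """Return True if a single ICMP echo to `host` succeeds.

    Assumes the Linux `ping` binary is on PATH (-c count, -W timeout in seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unreachable(
    hosts: Iterable[str],
    probe: Callable[[str], bool] = ping_once,
) -> List[str]:
    """Return the hosts for which the probe reports failure, in input order."""
    return [host for host in hosts if not probe(host)]
```

Calling `unreachable([...])` with the device names from the list below (fully qualified as appropriate for the site) returns the devices that did not answer. A reply only shows that a device is up: a dual-fed server keeps running on its remaining supply, so per-feed power status still has to be checked on the PDU or on the device itself.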

List of routers, switches, and servers

| device | role | SRE team coordination | notes |
| --- | --- | --- | --- |
| asw2-a5-eqiad | asw | @ayounsi | |
| ex4300-spare1-eqiad | spare | @ayounsi | |
| lvs1009 | lvs | Traffic | |
| lvs1008 | lvs | Traffic | |
| lvs1007 | lvs | Traffic | |
| restbase1020 | | @fgiunchedi | ok with power loss |
| ores1002 | ores | @akosiaris | fine to do at any time |
| ganeti1008 | ganeti node | @akosiaris | will need to be emptied in advance |
| analytics1071 | analytics | Analytics | |
| wtp1030 | parsoid | serviceops | fine to do at any time |
| wtp1029 | parsoid | serviceops | fine to do at any time |
| wtp1028 | parsoid | serviceops | fine to do at any time |
| db1128 | db | DBA | Fine to do any time before Thursday 25th, as at 05:30AM UTC this host will become phabricator master |
| sessionstore1001 | sessionstore | serviceops Platform Engineering | should be fine to do at any time |
| cp1068 | cp | Traffic | |
| cp1067 | cp | Traffic | |
| cp1066 | cp | Traffic | |
| cp1065 | cp | Traffic | |
| cp1064 | cp | Traffic | |
| cp1063 | cp | Traffic | |
| cp1062 | cp | Traffic | |
| cp1061 | cp | Traffic | |
| cp1059 | cp | Traffic | |
| cp1058 | cp | Traffic | |
| dbproxy1012 | dbproxy | DBA | |
| mw1266 | mw | serviceops | fine to do at any time out of deployment windows |
| mw1265 | mw | serviceops | fine to do at any time out of deployment windows |
| mw1264 | mw | serviceops | fine to do at any time out of deployment windows |
| mw1263 | mw | serviceops | fine to do at any time out of deployment windows |
| mw1262 | mw | serviceops | fine to do at any time out of deployment windows |
| mw1261 | mw | serviceops | fine to do at any time out of deployment windows |

Event Timeline

elukey subscribed.

A single Hadoop worker node for analytics, all good.

All the traffic cp and lvs nodes are decoms and not in use: T208584 T208586

akosiaris added a subscriber: MoritzMuehlenhoff.

sudo gnt-node migrate -f ganeti1008

From the DB side of things, this rack should be done before Thursday 30th 05:30AM UTC, as at that time db1128 will become phabricator master T228243: Switchover m3 (phabricator) master db1072 to db1128

Correction: Thursday 25th

Mentioned in SAL (#wikimedia-operations) [2019-07-23T15:46:41Z] <robh> side b of a5-eqiad swapping pdu via T227141

Both sides are swapped, and all items appear online.

RobH triaged this task as Medium priority.
RobH updated the task description.
RobH removed RobH as the assignee of this task. (Aug 28 2019, 6:40 PM)