This task tracks the replacement of ps1 and ps2 with new PDUs in rack B8-eqiad.
Each server and switch will need potential downtime scheduled, since this will be a live power change of the PDU towers.
This rack has a single tower for the old PDU (with an A and a B side), while the new PDUs have independent A and B towers.
- - Schedule downtime for the entire list of switches and servers.
- - Wire up one of the two new towers, energize it, and relocate power to it from the existing/old PDU tower (now de-energized).
- - Confirm the entire list of switches, routers, and servers has had its power restored from the new PDU tower.
- - Once the new PDU tower is confirmed online, move on to the next steps.
- - Wire up the remaining tower, energize it, and relocate the remaining power to it from the existing/old PDU tower (now de-energized).
- - Confirm the entire list of switches, routers, and servers has had its power restored from the new PDU tower.
- - Connect via serial and confirm the serial connection works.
- - Set up the PDU following the directions at https://wikitech.wikimedia.org/wiki/Platform-specific_documentation/ServerTech#Initial_Setup
- - Update the PDU model in Puppet per T233129.
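The downtime step above can be sketched with a small shell helper that prints one downtime command per affected host. This is a hypothetical sketch only: the `sre.hosts.downtime` cookbook name, its `--hours`/`-r` flags, the six-hour window, and the `.eqiad.wmnet` suffix are assumptions about the available tooling, not something specified in this task; downtime may instead be scheduled directly in Icinga.

```shell
#!/bin/sh
# Hypothetical helper: emit one downtime command per host in the rack.
# Cookbook name, flags, window length, and domain suffix are assumptions.
gen_downtime() {
    for h in "$@"; do
        printf "cookbook sre.hosts.downtime --hours 6 -r 'B8-eqiad PDU swap' %s.eqiad.wmnet\n" "$h"
    done
}

# Example: the database hosts from the list of affected servers.
gen_downtime db1132 db1119 db1113
```

Printing the commands first (rather than running them) lets the operator review the full host list against the table before committing to the maintenance window.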
List of routers, switches, and servers
device | role | SRE team coordination | recommended action during maintenance
asw-b8-eqiad | asw | @ayounsi | ensure this doesn't go offline, as it would take the entire rack's network offline
ganeti1018 | ganeti host | serviceops | needs to be emptied of VMs beforehand
gerrit1001 | spare | | fine to do at any time
cloudvirt1030 | hypervisor | cloud-services-team | lots of VMs, please handle with care
db1132 | m2 master | DBA | m2 master, which holds some internal services; ensure it doesn't go offline (if it does, there is an automatic failover via proxies)
pc1008 | parsercache host | DBA | DBA to depool it
restbase1024 | restbase | serviceops, Services | fine to do at any time
an-master1002 | | Analytics | fine to do at any time
dbproxy1015 | db proxy | DBA | not in use
graphite1004 | | @fgiunchedi | no action needed; if power is lost and can't be restored quickly, we'll switch to codfw
rdb1009 | redis master | serviceops | will this need coordination?
notebook1003 | | |
db1119 | db host | DBA | DBA to depool it
db1113 | db host | DBA | DBA to depool it
cloudservices1003 | DNS | cloud-services-team | fine to do at any time
mwmaint1002 | | | this is the primary MW maintenance system in eqiad; perhaps we should halt deployments during this window?
labpuppetmaster1001 | spare | cloud-services-team | good to go; host is being decommissioned
ores1004 | ORES | serviceops | fine to do at any time
wtp1036 | parsoid | serviceops | fine to do at any time
wtp1035 | parsoid | serviceops | fine to do at any time
wtp1034 | parsoid | serviceops | fine to do at any time
dumpsdata1001 | dumps data server | @ArielGlenn | please coordinate
analytics1063 | | Analytics | fine to do at any time
analytics1062 | | Analytics | fine to do at any time
analytics1061 | | Analytics | fine to do at any time