This task tracks the replacement of the existing ps1 and ps2 with new PDUs in rack [[ https://netbox.wikimedia.org/dcim/racks/15/ | B7-eqiad ]].
Every server and switch needs potential downtime scheduled, since this is a live power change of the PDU towers.
This rack has a single tower for the old PDU (with an A and a B side), while the new PDUs have independent A and B towers.
[] - schedule downtime for every switch, router, and server listed below.
[] - before work starts, silence all icinga alerts until 8PM GMT the same day.
[] - wire up one of the two new towers, energize it, and move power to it from the old PDU tower (now de-energized).
[] - confirm that every listed switch, router, and server has had its power restored from the new PDU tower.
[] - once the new PDU tower is confirmed online, move on to the next steps.
[] - wire up the remaining tower, energize it, and move power to it from the old PDU tower (now de-energized).
[] - confirm that every listed switch, router, and server has had its power restored from the new PDU tower.
[] - connect via serial and confirm the serial connection works.
[] - set up the PDU following the directions at https://wikitech.wikimedia.org/wiki/Platform-specific_documentation/ServerTech#Initial_Setup
[] - update the PDU model in Puppet per T233129.
== List of routers, switches, and servers ==
| device | role | SRE team coordination | recommended action during maintenance |
| asw-b7-eqiad | asw | @ayounsi | ensure this does not go offline, as that would take the entire rack's network offline |
| wtp1033 | | | |
| wtp1032 | | | |
| wtp1031 | | | |
| kafka-main1002 | | @herron | To avoid alert noise from adjacent kafka-main hosts, also schedule icinga downtime for the "Kafka Broker Under Replicated Partitions" service on kafka-main100[123]. Gracefully shut down the server before maintenance, and ensure it is powered back up when maintenance is complete. |
| dbprov1002 | db provisioning/backup generation host | #dba | Really nothing to do, but @jcrespo will keep an eye on it |
| cloudvirtan1005 | | | |
| cloudvirtan1004 | | | |
| an-worker1087 | | @nuria | |
| an-worker1086 | | @nuria | |
| cp1082 | cp system | #traffic | T227542#5355289 |
| cp1081 | cp system | #traffic | T227542#5355289 |
| ms-be1041 | ms-be system | fillipo | gracefully shut down the host just before rack maintenance, and power it back on after maintenance |
| cloudvirt1022 | cloudvirt host | #cloud-services-team | @JHedden: no running VMs, can happen anytime |
| analytics1073 | | #analytics | fine to do any time |
| lvs1014 | lvs system | @bblack | T227542#5355289 |
| cloudvirt1020 | cloudvirt host | #cloud-services-team | @JHedden: has running VMs, please handle with care |
| druid1005 | | #analytics | fine to do any time |
| ores1003 | | | |
| cloudnet1003 | | #cloud-services-team | @JHedden: active, but it has a redundant peer |
| restbase-dev1005 | | | |
| cloudcontrol1004 | | #cloud-services-team | @JHedden: active, but it has a redundant peer |
| cloudvirt1017 | cloudvirt host | #cloud-services-team | @JHedden: has a large number of running VMs, please handle with care |
| mw1318 | mw server | @joe | |
| mw1317 | mw server | @joe | |
| mw1316 | mw server | @joe | |
| mw1315 | mw server | @joe | |
| mw1314 | mw server | @joe | |
| mw1313 | mw server | @joe | |
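For the downtime step, a minimal Python sketch that pulls the host list out of the device table, assuming the table is copied verbatim into a string. `device_names` and `expand_brackets` are hypothetical helpers, not part of any existing tooling. `expand_brackets` only handles a single bracket group of individual characters, as in `kafka-main100[123]`. The resulting lists would still need to be passed to whatever downtime mechanism is actually used.

```python
import re


# Hypothetical helper (not existing tooling): expand a single bracket group
# of individual characters, e.g. "kafka-main100[123]" ->
# ["kafka-main1001", "kafka-main1002", "kafka-main1003"].
def expand_brackets(pattern: str) -> list[str]:
    match = re.fullmatch(r"(.*)\[([^\]]+)\](.*)", pattern)
    if match is None:
        return [pattern]  # no bracket group: a plain hostname
    prefix, chars, suffix = match.groups()
    return [f"{prefix}{c}{suffix}" for c in chars]


# Hypothetical helper: return the first column (device name) of each row of a
# Remarkup table, skipping blank lines and the "device" header row.
def device_names(table: str) -> list[str]:
    names = []
    for line in table.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if not cells[0] or cells[0] == "device":
            continue
        names.append(cells[0])
    return names


if __name__ == "__main__":
    # Excerpt of the device table; paste the full table in practice.
    table = """\
| device | role | SRE team coordination | recommended action during maintenance |
| asw-b7-eqiad | asw | @ayounsi | ensure this does not go offline |
| kafka-main1002 | | @herron | graceful shutdown |
| mw1313 | mw server | @joe | |
"""
    hosts = device_names(table)
    # The adjacent kafka-main brokers only need the Kafka service downtime,
    # so keep them in a separate list.
    kafka_peers = expand_brackets("kafka-main100[123]")
    print(hosts)
    print(kafka_peers)
```

The kafka-main peers are kept separate because the table only asks for the "Kafka Broker Under Replicated Partitions" service downtime on them, not a full host downtime.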