This task tracks the replacement of ps1 and ps2 with new PDUs in rack B7-eqiad.
Each server & switch will need to have potential downtime scheduled, since this will be a live power change of the PDU towers.
These racks have a single tower for the old PDU (with an A and a B side), while the new PDUs have independent A and B towers.
- - schedule downtime for the entire list of switches and servers.
- - before work starts, silence all icinga alerts until 8PM GMT same day
- - Wire up one of the two towers, energize, and relocate power to it from existing/old pdu tower (now de-energized).
- - confirm entire list of switches, routers, and servers have had their power restored from the new pdu tower
- - Once new PDU tower is confirmed online, move on to next steps.
- - Wire up remaining tower, energize, and relocate power to it from existing/old pdu tower (now de-energized).
- - confirm entire list of switches, routers, and servers have had their power restored from the new pdu tower
- - connect via serial / confirm serial connection works
- - set up PDU following directions on https://wikitech.wikimedia.org/wiki/Platform-specific_documentation/ServerTech#Initial_Setup
- - update PDU model in puppet per T233129.
- - clear icinga errors for missing ps2 input by connecting (or checking) the rj11 cable between ps1 and ps2 in b7-eqiad. Once it is connected, the icinga errors for the tower B infeed will clear up.
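The "silence until 8PM GMT same day" step above needs a downtime duration when it is scheduled. A minimal sketch of that arithmetic follows; the function name `seconds_until_8pm_gmt` is hypothetical and this is not part of any icinga tooling, it only computes the number of seconds from a given timezone-aware start time to 20:00 GMT on the same day.

```python
from datetime import datetime, time, timezone


def seconds_until_8pm_gmt(now: datetime) -> int:
    """Return the seconds from `now` until 20:00 GMT on the same GMT day.

    Hypothetical helper for illustration; `now` must be timezone-aware.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    # Normalise to GMT/UTC so "same day" is evaluated in GMT.
    now = now.astimezone(timezone.utc)
    end = datetime.combine(now.date(), time(20, 0), tzinfo=timezone.utc)
    if end <= now:
        raise ValueError("it is already past 20:00 GMT today")
    return int((end - now).total_seconds())


# Example with an arbitrary start time of 14:00 GMT: six hours remain.
start = datetime(2020, 1, 1, 14, 0, tzinfo=timezone.utc)
print(seconds_until_8pm_gmt(start))  # 21600
```

The resulting value can be used as the duration wherever the downtime is actually scheduled; how that is done is outside this sketch.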
List of routers, switches, and servers
| device | role | SRE team coordination | recommended action during maintenance |
| asw-b7-eqiad | asw | @ayounsi | ensure this doesn't go offline as it will take entire rack network offline |
| wtp1033 | |||
| wtp1032 | |||
| wtp1031 | |||
| kafka-main1002 | | @herron | To avoid alert noise from adjacent kafka-main hosts, schedule icinga downtime for the "Kafka Broker Under Replicated Partitions" service on kafka-main100[123] as well. Perform a graceful shutdown of the server before maintenance, and ensure it is powered up when completed. |
| dbprov1002 | db provisioning/backup generation host | DBA | Really nothing to do, but @jcrespo will keep an eye on it |
| cloudvirtan1005 | |||
| cloudvirtan1004 | |||
| an-worker1087 | | @Nuria | |
| an-worker1086 | | @Nuria | |
| cp1082 | cp system | Traffic | T227542#5355289 |
| cp1081 | cp system | Traffic | T227542#5355289 |
| ms-be1041 | ms-be system | fillipo | gracefully shut down the host just before rack maintenance, and power it back online post-maintenance. |
| cloudvirt1022 | cloudvirt host | cloud-services-team | @JHedden No running VMs, can happen anytime |
| analytics1073 | | Analytics | fine to do any time |
| lvs1014 | lvs system | @BBlack | T227542#5355289 |
| cloudvirt1020 | cloudvirt host | cloud-services-team | @JHedden has running VMs please handle with care |
| druid1005 | | Analytics | fine to do any time |
| ores1003 | |||
| cloudnet1003 | | cloud-services-team | @JHedden is active but it has a redundant peer |
| restbase-dev1005 | |||
| cloudcontrol1004 | | cloud-services-team | @JHedden is active but it has a redundant peer |
| cloudvirt1017 | cloudvirt | cloud-services-team | @JHedden has a large number of running VMs, please handle with care |
| mw1318 | mw server | @Joe | |
| mw1317 | mw server | @Joe | |
| mw1316 | mw server | @Joe | |
| mw1315 | mw server | @Joe | |
| mw1314 | mw server | @Joe | |
| mw1313 | mw server | @Joe | |
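Some rows in the table above ask for a graceful shutdown before the power change. As a rough sketch, those hosts could be pulled out of the table text as shown below. The helper names `parse_row` and `hosts_needing_shutdown` are hypothetical, and the sketch assumes each row keeps its cells in the four header columns, with the recommended action in the fourth.

```python
from typing import Iterable, List

COLUMNS = 4  # device, role, SRE team coordination, recommended action


def parse_row(line: str) -> List[str]:
    """Split one pipe-delimited table row into exactly COLUMNS cells."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    # Pad short rows (missing trailing cells) with empty strings.
    cells += [""] * (COLUMNS - len(cells))
    return cells[:COLUMNS]


def hosts_needing_shutdown(rows: Iterable[str]) -> List[str]:
    """Return devices whose recommended action mentions a shutdown."""
    hosts = []
    for line in rows:
        device, _role, _team, action = parse_row(line)
        text = action.lower()
        if "shutdown" in text or "shut down" in text:
            hosts.append(device)
    return hosts


# A few rows copied from the table for illustration.
SAMPLE_ROWS = [
    "| ms-be1041 | ms-be system | fillipo | gracefully shutdown the host "
    "just before rack maintenance, and power it back online post-maintenance. |",
    "| cp1082 | cp system | Traffic | T227542#5355289 |",
    "| mw1313 | mw server | @Joe |",
]

print(hosts_needing_shutdown(SAMPLE_ROWS))  # ['ms-be1041']
```

The match is a plain substring check, so the result is only a starting point and should be read against the actual notes in the table.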