This task tracks the replacement of the ps1 and ps2 PDUs with new PDUs in rack B4-eqiad.
Each server and switch will need potential downtime scheduled, since this is a live power change of the PDU towers.
This rack has a single tower for the old PDU (with an A and a B side), while the new PDUs have independent A and B towers.
- [ ] Schedule downtime for the entire list of switches and servers.
- [ ] Wire up one of the two new towers, energize it, and relocate power to it from the existing/old PDU tower (now de-energized).
- [ ] Confirm the entire list of switches, routers, and servers has had power restored from the new PDU tower.
- [ ] Once the new PDU tower is confirmed online, move on to the next steps.
- [ ] Wire up the remaining tower, energize it, and relocate power to it from the existing/old PDU tower (now de-energized).
- [ ] Confirm the entire list of switches, routers, and servers has had power restored from the new PDU tower.
- [ ] Connect via serial and confirm the serial connection works.
- [ ] Set up the PDU following the directions on https://wikitech.wikimedia.org/wiki/Platform-specific_documentation/ServerTech#Initial_Setup
- [ ] Update the PDU model in Puppet per T233129.
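The "confirm power restored" steps above amount to probing every device in the rack after each tower move. A minimal sketch of that check follows; the host list is a subset of the table in this task, and the `probe` callable is a placeholder for a real check (e.g. a ping or an SNMP query against the PDU), not any actual WMF tooling API.

```python
from typing import Callable, Iterable, List

def unreachable(hosts: Iterable[str], probe: Callable[[str], bool]) -> List[str]:
    """Return the hosts that fail the probe after a power move."""
    return [h for h in hosts if not probe(h)]

# Subset of the rack B4-eqiad device list from this task.
HOSTS = ["asw2-b4-eqiad", "ruthenium", "elastic1050", "prometheus1004"]

def fake_probe(host: str) -> bool:
    # Stand-in for a real reachability check such as `ping -c1 <host>`;
    # here everything except ruthenium is pretended to be "up".
    return host != "ruthenium"

print(unreachable(HOSTS, fake_probe))  # -> ['ruthenium']
```

Running this after each tower is energized, and only proceeding when the list is empty, matches the "confirm, then move on" gating in the checklist.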
List of routers, switches, and servers:

| device | role | SRE team coordination | notes |
| --- | --- | --- | --- |
| atlas-eqiad | RIPE Atlas anchor | unknown | single power infeed, so it will lose connection. Is it best to disconnect it before the work and reconnect afterwards, rather than have it flap multiple times during the work? |
| asw2-b4-eqiad | asw | @ayounsi | ensure the asw doesn't lose all power, or the entire rack goes offline from the network |
| ruthenium | parsoid::testing | | |
| elastic1050 | elastic system | @Gehel | |
| prometheus1004 | prometheus | @fgiunchedi | |
| cloudvirt1007 | cloudvirt host | cloud-services-team | @JHedden 21 active VMs, please handle with care |
| cloudvirt1006 | cloudvirt host | cloud-services-team | @JHedden 17 active VMs, please handle with care |
| cloudvirt1005 | cloudvirt host | cloud-services-team | @JHedden 27 active VMs, please handle with care |
| cloudvirt1019 | cloudvirt host | cloud-services-team | @JHedden 2 active VMs, please handle with care |
| cloudvirt1004 | cloudvirt host | cloud-services-team | @JHedden 19 active VMs, please handle with care |
| cloudvirt1003 | cloudvirt host | cloud-services-team | @JHedden 17 active VMs, please handle with care |
| cloudvirt1021 | cloudvirt host | cloud-services-team | @JHedden 25 active VMs, please handle with care |
| cloudvirt1016 | cloudvirt host | cloud-services-team | @JHedden 58 active VMs, please handle with care |
| cloudnet1004 | cloudnet host | cloud-services-team | @JHedden can happen anytime, has redundant peer |
| cloudvirt1013 | cloudvirt host | cloud-services-team | @JHedden 23 active VMs, please handle with care |
| conf1005 | zookeeper/etcd discovery service | | |
| phab1001 | phabricator main system | | |
| iron | | | |
| kubestage1002 | | | |
| kafka1002 | | @herron | |
| an-worker1085 | hadoop | Analytics | fine to do any time |
| maps1002 | openstreetmaps slave server | | |
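Since several of the rows above name a coordinating team or contact, the table can be grouped by that column to draft the per-team downtime notifications. This is only a sketch: the rows below are a sample in the pipe-separated layout used above, not a complete inventory.

```python
from collections import defaultdict

# Sample rows in the "device | role | coordination | notes" layout above.
ROWS = [
    "cloudvirt1007 | cloudvirt host | cloud-services-team | @JHedden 21 active VMs",
    "elastic1050 | elastic system | @Gehel |",
    "prometheus1004 | prometheus | @fgiunchedi |",
]

def by_team(rows):
    """Map each coordination contact/team to the devices needing notice."""
    groups = defaultdict(list)
    for row in rows:
        cells = [c.strip() for c in row.split("|")]
        device = cells[0]
        team = cells[2] if len(cells) > 2 else ""
        if team:
            groups[team].append(device)
    return dict(groups)

print(by_team(ROWS))
# -> {'cloud-services-team': ['cloudvirt1007'], '@Gehel': ['elastic1050'],
#     '@fgiunchedi': ['prometheus1004']}
```

Devices whose coordination cell is empty (e.g. iron, kubestage1002) would simply not appear in the grouping and need a manual decision.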