This task will track the swapping of the PDU tower(s) in rack A3-eqiad.
The current PDU tower is malfunctioning: a short has caused issues on both side B (wholly offline) and part of side A (one circuit group has defunct outlets).
Chris has onsite spares (2 dual-wide PDU towers with 48 ports per side) to test out and use for replacement in this rack.
== Maintenance Window Scheduling ==
Primary Date: Thursday, 2019-01-17 @ 07:00 EST (12:00 GMT)
Backup Date: Tuesday, 2019-01-22 @ 07:00 EST (12:00 GMT)
Estimated Duration: Up to 2 hours
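As a sanity check on the window times above, the following minimal sketch converts the two start times from EST to UTC and adds the estimated two-hour duration. It assumes EST is a fixed UTC-5 offset (true in January) and treats GMT and UTC as equivalent; the helper name `window_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: EST is a fixed UTC-5 offset (no daylight saving in January).
EST = timezone(timedelta(hours=-5), "EST")


def window_utc(local_start, hours=2):
    """Return (start, end) of a maintenance window in UTC.

    local_start is a naive datetime interpreted as EST; hours is the
    estimated upper bound on the window length.
    """
    start = local_start.replace(tzinfo=EST).astimezone(timezone.utc)
    return start, start + timedelta(hours=hours)


primary = window_utc(datetime(2019, 1, 17, 7, 0))
backup = window_utc(datetime(2019, 1, 22, 7, 0))

for label, (start, end) in (("Primary", primary), ("Backup", backup)):
    print(f"{label}: {start:%A %Y-%m-%d %H:%M} to {end:%H:%M} UTC")
```

With the two-hour upper bound, both windows would close by 14:00 UTC (09:00 EST).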
=== Maintenance Window Checklist ===
The following steps must be completed for this swap:
[] - All servers must be taken offline and powered down for the duration of the migration.
[] - The old PDU must be removed from the rack, the new PDU installed, and all power migrated over to it.
Side B of A3-eqiad may also have tripped its circuit breaker during the failure, which may require Equinix technicians to reset the breaker in the EQ circuit breaker box.
== Servers & Devices in A3-eqiad ==
The following items are in A3-eqiad: https://netbox.wikimedia.org/dcim/racks/3/
Servers (grouped by service owner when possible):
analytics:
analytics1052
analytics1053
analytics1054
analytics1055
analytics1056
analytics1057
analytics1059
analytics1060
cloud:
cloudservices1004
traffic:
cp1008
dba:
db1103
db1127
dbproxy1001
dbproxy1002
dbproxy1003
dbstore1003
pc1004
discovery:
elastic1030
elastic1031
elastic1032
elastic1033
elastic1034
elastic1035
misc:
ganeti1007
graphite1003
kubernetes1001
prometheus1003
radium - tor relay
rdb1005
relforge1001
services:
@robh synced with @Eevans about these. restbase1016 is already offline. The other restbase systems can be logged into via SSH and shut down cleanly just before the maintenance, then powered back up normally after the window.
restbase1010
restbase1011
restbase1016
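The restbase shutdown described above could be prepared with a small helper like the one below. This is only a sketch: it builds the SSH commands as strings and does not run them. The `eqiad.wmnet` domain suffix, the use of `sudo shutdown -h now`, and the function name `shutdown_commands` are assumptions for illustration; the actual procedure agreed with the service owner may include additional steps.

```python
import shlex

# Hosts listed under "services" in this task.
RESTBASE_HOSTS = ["restbase1010", "restbase1011", "restbase1016"]

# restbase1016 is already offline, so it needs no shutdown command.
ALREADY_OFFLINE = {"restbase1016"}

# Assumption: internal domain suffix for hosts in this rack.
DOMAIN = "eqiad.wmnet"


def shutdown_commands(hosts, offline, domain=DOMAIN):
    """Build one SSH shutdown command per host that is still online."""
    commands = []
    for host in hosts:
        if host in offline:
            continue  # nothing to do for hosts that are already down
        fqdn = f"{host}.{domain}"
        # Assumption: a plain OS-level halt is sufficient for a clean shutdown.
        commands.append(shlex.join(["ssh", fqdn, "sudo", "shutdown", "-h", "now"]))
    return commands


for command in shutdown_commands(RESTBASE_HOSTS, ALREADY_OFFLINE):
    print(command)
```

Printing the commands rather than executing them leaves the operator free to review each one and run it by hand just before the window.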