This task will track the racking, setup, and OS installation of 16 new PDU sets ordered via T249542 for installation into the racks in rows C and D. These will replace the existing PDUs and require direct coordination and scheduling by the eqiad dc opsen for work in each rack.
== Hostname / Racking / Installation Details ==
Each rack will need to have its actual services listed, and each service then considered for depool/migration of masters/etc to mitigate any issues during the PDU replacement.
Due to the overall complexity of each rack, @robh suggests that either @Jclark-ctr or @Cmjohnson handle population/management/scheduling of each rack as a sub-task off of this task. @robh will provide a template for use below in the 'per host setup checklist', but please copy that into a sub-task for each rack.
== Per host setup checklist ==
PDU upgrades are very complex. This outline/template of checklist items may require further refinement by the @ops-eqiad folks (@robh is populating the initial list and it may not cover all steps, so please review!).
<hostname#1>:
[] - receive in new PDUs on T249542
[] - create a sub-task off of T253694 to list all these steps for each rack
[] - apply asset tags to each tower (both primary and link towers) as well as hostname labels.
[] - add new PDUs into netbox with the prefix 'new-' on the name for the initial netbox entry, for example 'new-ps1-c1-eqiad' and 'new-ps2-c1-eqiad'. (Once the old PDUs are removed and have their hostnames set to their asset tags, each new PDU can be updated to remove the 'new-' prefix from its netbox hostname.)
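The temporary naming convention above can be sketched as a pair of small helpers. This is only an illustration, not an existing tool: the function names are hypothetical, and the `ps1`/`ps2` per-rack pattern is an assumption inferred from the two example names.

```python
NEW_PREFIX = "new-"  # temporary prefix used while the old PDU still holds the final name


def pdu_names(rack, site="eqiad"):
    """Final hostnames for the two PDU towers in a rack (pattern assumed from the examples)."""
    return [f"ps{n}-{rack}-{site}" for n in (1, 2)]


def temporary_name(final_name):
    """Name to use for the initial netbox entry of a new PDU."""
    if final_name.startswith(NEW_PREFIX):
        return final_name
    return NEW_PREFIX + final_name


def strip_temporary_prefix(name):
    """Name to use once the old PDU has been renamed to its asset tag."""
    if name.startswith(NEW_PREFIX):
        return name[len(NEW_PREFIX):]
    return name


if __name__ == "__main__":
    for name in pdu_names("c1"):
        print(temporary_name(name), "->", strip_temporary_prefix(temporary_name(name)))
```

Both helpers are idempotent, so running them twice against the same name does not add or remove the prefix a second time.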
The new PDUs will be mounted with one PDU per cabinet side. Also included are new offset PDU brackets. These brackets should be installed so that they push the PDU further towards the center of the rack (to avoid the horizontal adjustment bar for the vertical rails). Due to this, it is suggested that the 'link' PDU be installed first, leaving the 'primary' PDU for after (as these are often combined PDU towers being replaced). This may change at the discretion of the on-sites after review.
[] - list off every server, and its service and service owners on the task for each rack pdu upgrade. This list will need to be reviewed and the PDU work scheduled with the SRE department as a whole. Once the work has been scheduled and cleared, the rest of this checklist can continue.
[] - check the existing PDU and all connected cables. Ensure all are properly seated and all items are receiving power from both A and B sides before continuing. Anything not seated or not receiving dual power will be rebooted by continuing this checklist.
[] - install new PDU brackets for the link tower in the rack (see above note on orientation of the brackets.)
[] - install link PDU into the cabinet
[] - de-power old/existing B side power, and plug in new B side link PDU
[] - migrate all B side power connections to new link PDU
[] - Note all B side power connections and enter them into netbox for every single power port used.
[] - When relocating power cables, please try to ensure that the A and B sides use the same port. If server bast1001 plugs into port 5 on tower B, please also have it plug into port 5 on tower A.
[] - audit all B side connections to ensure all devices are receiving full power on the B side connection (any not receiving power will be rebooted when we move the A side connections next.)
[] - BEFORE UNPLUGGING THE A SIDE ORIGINAL TOWER: Log in to the PDU via the HTTPS interface and reset it to factory defaults!
[] - Unmount existing PDU tower and set aside (if possible) to install new PDU brackets into the rack.
[] - Install new PDU tower into the rack, and route power cable for easy cut-over.
[] - de-power old/existing A side power, and plug in new A side primary PDU
[] - migrate all A side power connections to new primary PDU
[] - Note all A side power connections and enter them into netbox for every single power port used.
[] - audit all A side connections to ensure all devices are receiving full power on the A side connection.
[] - update puppet repo file: modules/facilities/manifests/init.pp to add the sentry4 line to the PDU entry.
[] - ensure all errors clear in icinga after work completes
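The audit steps in the checklist (every device powered from both A and B sides, and using the same port number on each tower) could be checked against a hand-recorded port list with something like the sketch below. It is a minimal illustration only: the input format and the function name `audit_power` are assumptions, the hostnames other than bast1001 are placeholders, and it does not query netbox or the PDUs themselves.

```python
def audit_power(connections):
    """Return a list of problems found in a rack's recorded power connections.

    `connections` maps a hostname to {"A": port_or_None, "B": port_or_None},
    where None means no cable was recorded on that side (assumed format).
    """
    problems = []
    for device in sorted(connections):
        feeds = connections[device]
        port_a = feeds.get("A")
        port_b = feeds.get("B")
        missing = [side for side, port in (("A", port_a), ("B", port_b)) if port is None]
        if missing:
            # A device without dual power will reboot when the other side is cut over.
            problems.append(f"{device}: no power connection on side {' and '.join(missing)}")
        elif port_a != port_b:
            # The checklist asks for matching port numbers on tower A and tower B.
            problems.append(f"{device}: port mismatch (A={port_a}, B={port_b})")
    return problems


if __name__ == "__main__":
    rack = {
        "bast1001": {"A": 5, "B": 5},
        "examplehost1": {"A": 7, "B": 8},     # placeholder hostname
        "examplehost2": {"A": None, "B": 3},  # placeholder hostname
    }
    for line in audit_power(rack):
        print(line)
```

An empty result means every recorded device has both feeds on matching ports; anything listed should be fixed before the next side is de-powered.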