This task will track the racking, setup, and configuration of 16 new PDU sets ordered via T249542 for installation into the racks in rows C and D. These will replace the existing PDUs, and the work in each rack requires direct coordination and scheduling by the eqiad DC ops team.
Hostname / Racking / Installation Details
Each rack needs its running services listed, and each service must then be considered for depooling, master migration, or similar steps to mitigate any issues during the PDU replacement.
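A minimal sketch of how a rack's service list could be organized before scheduling, assuming a hypothetical inventory of (host, service, owner, is_master) records gathered by hand. The `HostRecord` type, `plan_mitigations()` helper, example hosts and team names are all illustrative, and the "masters get migrated, everything else gets depooled" rule is a simplifying assumption rather than an agreed policy:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HostRecord:
    # Hypothetical inventory record; the field names are illustrative only.
    host: str
    service: str
    owner: str
    is_master: bool = False


def plan_mitigations(records):
    """Group a rack's hosts by service and suggest a mitigation for each.

    Assumed rule (a simplification): a service with a master in the rack
    needs its master migrated; any other service is a depool candidate.
    """
    by_service = defaultdict(list)
    for rec in records:
        by_service[rec.service].append(rec)

    plan = {}
    for service, recs in sorted(by_service.items()):
        plan[service] = {
            "hosts": sorted(r.host for r in recs),
            "owners": sorted({r.owner for r in recs}),
            "action": "migrate master" if any(r.is_master for r in recs) else "depool",
        }
    return plan


if __name__ == "__main__":
    # Hypothetical hosts for illustration, not a real rack inventory.
    rack = [
        HostRecord("db-example-1", "mariadb", "team-a", is_master=True),
        HostRecord("db-example-2", "mariadb", "team-a"),
        HostRecord("app-example-1", "appserver", "team-b"),
    ]
    for service, info in plan_mitigations(rack).items():
        hosts = ", ".join(info["hosts"])
        owners = ", ".join(info["owners"])
        print(f"{service}: {info['action']} ({hosts}; owners: {owners})")
```

The output is one line per service, which can be pasted into the rack's sub-task for review by the service owners.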
Due to the overall complexity of each rack, robh suggests that either @Jclark-ctr or @Cmjohnson handle the population, management, and scheduling of each rack as a sub-task of this task. A template is provided below under 'per host setup checklist'; please copy it into a sub-task for each rack.
Schedule
Updated schedule:
- Sept 8, Tuesday 12pm-4pm UTC - racks D3 and D4 (https://phabricator.wikimedia.org/T261452)
- Sept 14, Monday 12pm-4pm UTC - racks D5 and D6 (https://phabricator.wikimedia.org/T261453)
- Sept 15, Tuesday 12pm-4pm UTC - racks C4, C5, C2, and C3 (https://phabricator.wikimedia.org/T261456 & https://phabricator.wikimedia.org/T261455)
- Sept 16, Wednesday 12pm-4pm UTC - racks C6, C7, D7, and D8 (https://phabricator.wikimedia.org/T261457 & https://phabricator.wikimedia.org/T261454)
- Sept 17, Thursday 12pm-4pm UTC - rack C1 (Fundraising), D1, and D2 (https://phabricator.wikimedia.org/T261458 & https://phabricator.wikimedia.org/T261459)
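Because the racks are split across several windows, a quick consistency check can confirm that no rack is booked twice and that every rack lies in row C or D. This is a minimal sketch with the schedule transcribed by hand; the `SCHEDULE` dictionary and the helper functions are illustrative and not part of any existing tooling:

```python
from collections import Counter

# Rack windows transcribed from the schedule above (each 12pm-4pm UTC).
SCHEDULE = {
    "Sept 8": ["D3", "D4"],
    "Sept 14": ["D5", "D6"],
    "Sept 15": ["C4", "C5", "C2", "C3"],
    "Sept 16": ["C6", "C7", "D7", "D8"],
    "Sept 17": ["C1", "D1", "D2"],
}


def check_schedule(schedule):
    """Return (racks booked more than once, racks outside rows C and D)."""
    counts = Counter(rack for racks in schedule.values() for rack in racks)
    duplicates = sorted(rack for rack, n in counts.items() if n > 1)
    wrong_row = sorted(rack for rack in counts if rack[0] not in ("C", "D"))
    return duplicates, wrong_row


def racks_by_row(schedule):
    """Map each row letter to its scheduled racks, ordered by rack number."""
    rows = {}
    for racks in schedule.values():
        for rack in racks:
            rows.setdefault(rack[0], []).append(rack)
    return {
        row: sorted(racks, key=lambda r: int(r[1:]))
        for row, racks in sorted(rows.items())
    }


if __name__ == "__main__":
    print(check_schedule(SCHEDULE))
    for row, racks in racks_by_row(SCHEDULE).items():
        print(row, racks)
```

With the schedule as transcribed, no rack is booked twice, and racks C1-C7 and D1-D8 each appear in exactly one window.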
Per host setup checklist
PDU upgrades are very complex. This checklist template may need further refinement by the @ops-eqiad folks: robh populated the initial list and may not have covered all steps, so please review!
<hostname#1>:
- receive the new PDUs on T249542
- create a sub-task of T253694 listing all of these steps for each rack
- apply asset tags to each tower (both primary and link towers) as well as hostname labels.
- add the new PDUs to netbox with the hostname prefixed by 'new-' for the initial netbox entry (example: 'new-ps1-c1-eqiad', 'new-ps2-c1-eqiad'). Once the old PDUs are removed and have their hostnames set to their asset tags, update each new PDU to remove the 'new-' prefix from its netbox hostname.
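The temporary 'new-' naming step can be sketched as two small helpers. This assumes the hostname pattern `ps<number>-<rack>-<site>` inferred from the examples above; `staging_name()`, `final_name()` and the regular expression are hypothetical illustrations, not part of netbox or any existing script:

```python
import re

# Assumed hostname pattern, inferred from the examples above
# (e.g. 'ps1-c1-eqiad'): ps<number>-<rack>-<site>, with an optional 'new-' prefix.
PDU_NAME = re.compile(r"^(new-)?ps\d+-[a-z]\d+-[a-z]+$")
PREFIX = "new-"


def staging_name(number, rack, site="eqiad"):
    """Hostname for the initial netbox entry of a new PDU."""
    return f"{PREFIX}ps{number}-{rack.lower()}-{site}"


def final_name(name):
    """Drop the 'new-' prefix once the old PDU has been renamed to its asset tag."""
    if not PDU_NAME.match(name):
        raise ValueError(f"unexpected PDU hostname: {name!r}")
    return name[len(PREFIX):] if name.startswith(PREFIX) else name


if __name__ == "__main__":
    for number in (1, 2):
        new = staging_name(number, "C1")
        print(new, "->", final_name(new))
```

Validating the name before stripping the prefix helps catch typos in a hostname before it is saved back to netbox.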
The new PDUs will be mounted with one PDU per cabinet side. New offset PDU brackets are also included; install them so that they push the PDU further towards the center of the rack (to avoid the horizontal adjustment bar for the vertical rails). For this reason, we suggest installing the 'link' PDU first and the 'primary' PDU afterwards, since the old towers being replaced are often combined PDU towers. The on-site engineers may change this order at their discretion after review.
- list every server in the rack, along with its service and service owners, on the task for that rack's PDU upgrade. This list must be reviewed and the PDU work scheduled with the SRE department as a whole. Once the work has been scheduled and cleared, the rest of this checklist can continue.
- check the existing PDU and all connected cables. Ensure everything is properly seated and that every device is receiving power from both the A and B sides before continuing. Anything not seated or not receiving dual power will lose power and reboot as this checklist continues.
- install the new PDU brackets for the link tower in the rack (see the note above on bracket orientation.)
- install the link PDU into the cabinet
- de-power the old/existing B side power and plug in the new B side link PDU
- migrate all B side power connections to the new link PDU
- record all B side power connections in netbox, for every power port used.
- When relocating power cables, try to ensure that the A and B sides use the same port. For example, if server bast1001 plugs into port 5 on tower B, also plug it into port 5 on tower A.
- audit all B side connections to ensure all devices are receiving full power on the B side connection (any device not receiving B side power will reboot when the A side connections are moved next.)
- BEFORE UNPLUGGING THE ORIGINAL A SIDE TOWER: log in to the PDU via the HTTPS interface and reset it to factory defaults!
- unmount the existing PDU tower and, if possible, set it aside so the new PDU brackets can be installed in the rack.
- install the new PDU tower into the rack and route its power cable for easy cut-over.
- de-power the old/existing A side power and plug in the new A side primary PDU
- migrate all A side power connections to the new primary PDU
- note the old PDU's mgmt IP from netbox, then remove the old PDU (listed under its asset tag name) from the rack in netbox, set it to offline, and remove its mgmt DNS/IP info.
- run the mgmt DNS script in netbox for the new PDU, entering the old PDU's mgmt IP in the script.
- record all A side power connections in netbox, for every power port used.
- audit all A side connections to ensure all devices are receiving full power on the A side connection.
- connect a serial console to the new PDU and ensure the serial connection is functional
- set up the network configuration of the new PDU via serial
- set up the remaining PDU configuration via the HTTPS interface
- update the puppet repo file modules/facilities/manifests/init.pp to add the sentry4 line to the PDU entry.
- update librenms to reflect the new PDU. (It is unclear whether you must delete the old device and add the new one, or whether the entry updates once the new PDU is wholly online; so far this has only been done by removing the old device and adding the new one.)
- update the IP address entries in netbox; for now, just leave the IP tied to the old PDU's netbox entry (rob will change this to a more detailed entry later).
- after the work completes, ensure all errors clear in icinga and netbox
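The power audits in the checklist (confirming dual power before each side is cut over, and keeping A and B outlet numbers aligned) could be tracked with a small script like the one below. It is a sketch only: the `PowerState` record and `audit()` function are hypothetical, and the data would be entered by hand from the on-site audit rather than read from the PDUs or netbox. The bast1001 entry mirrors the port 5 example above; the second device is made up:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PowerState:
    # Hypothetical per-device record, filled in by hand during the audit.
    device: str
    a_port: Optional[int]  # outlet on the A side tower (None if not plugged in)
    b_port: Optional[int]  # outlet on the B side tower (None if not plugged in)
    a_powered: bool
    b_powered: bool


def audit(devices: List[PowerState], side_to_move: str) -> Dict[str, List[str]]:
    """Flag problems before de-powering one side ("A" or "B").

    at_risk: devices with no power on the side that stays up; they will reboot.
    port_mismatch: devices plugged into different outlet numbers on A and B
    (only meaningful once both sides are on the new towers).
    """
    if side_to_move not in ("A", "B"):
        raise ValueError("side_to_move must be 'A' or 'B'")
    at_risk, mismatch = [], []
    for d in devices:
        stays_up = d.b_powered if side_to_move == "A" else d.a_powered
        if not stays_up:
            at_risk.append(d.device)
        if d.a_port is not None and d.b_port is not None and d.a_port != d.b_port:
            mismatch.append(d.device)
    return {"at_risk": sorted(at_risk), "port_mismatch": sorted(mismatch)}


if __name__ == "__main__":
    rack = [
        PowerState("bast1001", a_port=5, b_port=5, a_powered=True, b_powered=True),
        # Hypothetical device with a dead B side feed and misaligned outlets.
        PowerState("example-host", a_port=3, b_port=4, a_powered=True, b_powered=False),
    ]
    print(audit(rack, side_to_move="A"))
```

An empty `at_risk` list is the condition to meet before de-powering a side; a non-empty list means those devices should be fixed or their owners warned first.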