
Tue, Sept 8 PDU Upgrade 12pm-4pm UTC- Racks D3 and D4
Closed, Resolved · Public · Request

Description

<ps1-d3-eqiad & ps2-d3-eqiad>:

  • - apply asset tags to each tower (both primary and link towers) as well as hostname labels.
  • - netbox updated
  • - check the existing PDU and all connected cables. Ensure all are properly seated and all items are receiving power from both A and B sides before continuing. Anything not seated or not receiving dual power will be rebooted by continuing this checklist.
  • - install new PDU brackets for the link tower in the rack (see above note on orientation of the brackets.)
  • - install link PDU into the cabinet
  • - de-power old/existing B side power, and plug in new B side link PDU
  • - migrate all B side power connections to new link PDU
  • - Note all B side power connections, input into netbox for every single power port used.
  • - When relocating power cables, please try to ensure that the A and B sides use the same port. If server bast1001 plugs into port 5 on tower B, please also have it plug into port 5 on tower A.
  • - audit all B side connections to ensure all devices are receiving full power on the B side connection (any not receiving power will be rebooted when we move the A side connections next.)
  • - BEFORE UNPLUGGING THE A SIDE ORIGINAL TOWER: Log in to the PDU via the HTTPS interface and reset it to factory defaults!
  • - Unmount existing PDU tower and set aside (if possible) to install new PDU brackets into the rack.
  • - Install new PDU tower into the rack, and route power cable for easy cut-over.
  • - de-power old/existing A side power, and plug in new A side link PDU
  • - migrate all A side power connections to new link PDU
  • - Note all A side power connections, input into netbox for every single power port used.
  • - audit all A side connections to ensure all devices are receiving full power on the A side connection.
  • - connect serial to new PDU, ensure serial connection is functional
  • - (Rob) setup network configuration of new PDU via serial
  • - (Rob) setup remaining pdu configuration via https interface
  • - (Rob) update puppet repo file: modules/facilities/manifests/init.pp to add the sentry4 line to the PDU entry.
  • - (Rob) Update librenms to reflect new PDU. (It is unclear whether you must delete the old device and add the new one, or whether the entry will update once the new PDU is wholly online; so far this has only been done by removing the old device and adding the new one.)
  • - (Rob) Update IP address entries in netbox; for now, just leave the IP tied to the old PDU netbox entry.
  • - ensure all errors clear in icinga and netbox after work completes
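
The port-matching rule in the checklist above (a device should use the same port number on towers A and B) can be checked quickly before the connections are entered into netbox. The following is only a minimal sketch, assuming the connections were noted as plain host-to-port mappings; the function name `find_port_mismatches`, the port numbers, and the sample data are hypothetical and are not part of netbox or any PDU tooling.

```python
def find_port_mismatches(a_side, b_side):
    """Return {host: (a_port, b_port)} for hosts whose A and B ports differ.

    A host that is missing from one side is reported with None for that side.
    """
    mismatches = {}
    for host in sorted(set(a_side) | set(b_side)):
        a_port = a_side.get(host)
        b_port = b_side.get(host)
        if a_port != b_port:
            mismatches[host] = (a_port, b_port)
    return mismatches


# Hypothetical sample data: host -> outlet number on each tower.
a_side = {"bast1001": 5, "db1106": 7, "mw1363": 9}
b_side = {"bast1001": 5, "db1106": 8}

print(find_port_mismatches(a_side, b_side))
# {'db1106': (7, 8), 'mw1363': (9, None)}
```

Any host listed in the result either sits on different port numbers or is missing a connection on one tower, so it deserves a second look before the cut-over continues.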

<ps1-d4-eqiad & ps2-d4-eqiad>:

  • - apply asset tags to each tower (both primary and link towers) as well as hostname labels.
  • - netbox updated
  • - check the existing PDU and all connected cables. Ensure all are properly seated and all items are receiving power from both A and B sides before continuing. Anything not seated or not receiving dual power will be rebooted by continuing this checklist.
  • - install new PDU brackets for the link tower in the rack (see above note on orientation of the brackets.)
  • - install link PDU into the cabinet
  • - de-power old/existing B side power, and plug in new B side link PDU
  • - migrate all B side power connections to new link PDU
  • - Note all B side power connections, input into netbox for every single power port used.
  • - When relocating power cables, please try to ensure that the A and B sides use the same port. If server bast1001 plugs into port 5 on tower B, please also have it plug into port 5 on tower A.
  • - audit all B side connections to ensure all devices are receiving full power on the B side connection (any not receiving power will be rebooted when we move the A side connections next.)
  • - BEFORE UNPLUGGING THE A SIDE ORIGINAL TOWER: Log in to the PDU via the HTTPS interface and reset it to factory defaults!
  • - Unmount existing PDU tower and set aside (if possible) to install new PDU brackets into the rack.
  • - Install new PDU tower into the rack, and route power cable for easy cut-over.
  • - de-power old/existing A side power, and plug in new A side link PDU
  • - migrate all A side power connections to new link PDU
  • - Note all A side power connections, input into netbox for every single power port used.
  • - audit all A side connections to ensure all devices are receiving full power on the A side connection.
  • - connect serial to new PDU, ensure serial connection is functional
  • - (Rob) setup network configuration of new PDU via serial
  • - (Rob) setup remaining pdu configuration via https interface
  • - (Rob) update puppet repo file: modules/facilities/manifests/init.pp to add the sentry4 line to the PDU entry.
  • - (Rob) Update librenms to reflect new PDU. (It is unclear whether you must delete the old device and add the new one, or whether the entry will update once the new PDU is wholly online; so far this has only been done by removing the old device and adding the new one.)
  • - (Rob) Update IP address entries in netbox; for now, just leave the IP tied to the old PDU netbox entry.
  • - ensure all errors clear in icinga and netbox after work completes
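
The audit steps in the checklist above come down to one question: which devices would lose all power if one side were de-powered right now? A minimal sketch of that check follows, assuming the observed feed status has been written down per host; the function name `at_risk_when_depowering` and the sample status data are hypothetical and are not read from any real PDU or monitoring API.

```python
def at_risk_when_depowering(feeds, side):
    """List hosts that would have no power left if `side` ('A' or 'B') is cut.

    `feeds` maps host -> set of sides currently delivering power to it.
    """
    if side not in ("A", "B"):
        raise ValueError("side must be 'A' or 'B'")
    # A host is at risk when no live feed remains after removing `side`.
    return sorted(host for host, live in feeds.items() if not (live - {side}))


# Hypothetical audit notes: host -> sides on which power was confirmed.
feeds = {
    "conf1006": {"A", "B"},
    "mc1033": {"A"},
    "db1114": {"B"},
}

print(at_risk_when_depowering(feeds, "A"))  # ['mc1033']
print(at_risk_when_depowering(feeds, "B"))  # ['db1114']
```

An empty result for the side about to be de-powered is the condition the checklist asks for before moving on.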

Event Timeline

Restricted Application added a project: Operations. Aug 27 2020, 8:22 PM
wiki_willy updated the task description. Aug 27 2020, 8:22 PM

List of hosts (and racks) from this maintenance window:

an-test-coord1001 D3
aqs1009 D3
asw2-d3-eqiad D3
auth1002 D3
cablemgmt-wmf5286 D3
db1106 D3
db1140 D3
dbproxy1017 D3
elastic1062 D3
elastic1063 D3
eventlog1002 D3
ganeti1019 D3
kubernetes1004 D3
kubernetes1013 D3
maps1004 D3
msw-d3-eqiad D3
mw1363 D3
mw1364 D3
mw1365 D3
ores1007 D3
pc1010 D3
ps1-d3-eqiad D3
ps2-d3-eqiad D3
rdb1006 D3
restbase1018 D3
restbase1025 D3
scb1004 D3
sessionstore1003 D3
sretest1001 D3
stat1006 D3
thorium D3
wdqs1005 D3
wmf4579 D3
wtp1043 D3
wtp1044 D3
wtp1045 D3
analytics1038 D4
analytics1039 D4
analytics1040 D4
analytics1041 D4
aqs1006 D4
asw2-d4-eqiad D4
asw3-d4-eqiad D4
cablemgmt-wmf5287 D4
conf1006 D4
db1114 D4
druid1003 D4
elastic1064 D4
labweb1002 D4
mc1033 D4
mc1034 D4
mc1035 D4
mc1036 D4
msw-d4-eqiad D4
ores1008 D4
ps1-d4-eqiad D4
ps2-d4-eqiad D4
puppetmaster1002 D4
restbase1030 D4
snapshot1007 D4
wtp1046 D4
wtp1047 D4
wtp1048 D4
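
Because the list above is one `hostname rack` pair per line, it can be grouped by rack with a few lines of code, for example to notify service owners rack by rack. This is a minimal sketch using a short excerpt of the list; the function name `group_by_rack` is hypothetical.

```python
from collections import defaultdict


def group_by_rack(text):
    """Group 'hostname rack' lines into {rack: [hostnames]}."""
    racks = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        host, rack = parts
        racks[rack].append(host)
    return dict(racks)


# Short excerpt of the host list above.
excerpt = """\
db1106 D3
db1140 D3
thorium D3
db1114 D4
conf1006 D4
"""

print(group_by_rack(excerpt))
# {'D3': ['db1106', 'db1140', 'thorium'], 'D4': ['db1114', 'conf1006']}
```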

wiki_willy renamed this task from "Tue, Sept 8 PDU Upgrade - Racks D3 and D4" to "Tue, Sept 8 PDU Upgrade 12pm-4pm UTC- Racks D3 and D4". Aug 27 2020, 8:24 PM
Marostegui moved this task from Triage to Next on the DBA board.

Mentioned in SAL (#wikimedia-operations) [2020-09-08T06:14:43Z] <marostegui> Stop MySQL on db1106 for PDU maintenance T261452

Starting maintenance. We do not expect any outages. We will be disconnecting the PDUs in about 1 hour.

Jclark-ctr updated the task description. Tue, Sep 8, 3:33 PM
Jclark-ctr updated the task description.
Marostegui moved this task from Next to Done on the DBA board. Wed, Sep 9, 11:01 AM
Marostegui added a subscriber: Marostegui.

Is there anything pending here that might require power changes?

RobH updated the task description. Wed, Sep 9, 5:49 PM

Change 626744 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] updating ps1-d[34]-eqiad

https://gerrit.wikimedia.org/r/626744

Change 626744 merged by RobH:
[operations/puppet@production] updating ps1-d[34]-eqiad

https://gerrit.wikimedia.org/r/626744

RobH updated the task description. Fri, Sep 11, 8:40 PM
RobH reassigned this task from Jclark-ctr to Cmjohnson. Mon, Sep 14, 4:13 PM
RobH added subscribers: Jclark-ctr, RobH.

It appears all the steps for the onsite engineers were done, but it's unclear. If there are any pending steps for these, please complete them and then resolve this task!

Cmjohnson closed this task as Resolved. Thu, Sep 17, 4:09 PM
Cmjohnson updated the task description.