Wed, Sept 16 PDU Upgrade 12pm-4pm UTC - Racks C6 and C7
Closed, Resolved · Public · Request

Description

ps1-c6-eqiad & ps2-c6-eqiad:

  • - apply asset tags to each tower (both primary and link towers) as well as hostname labels.
  • - netbox updated
  • - check the existing PDU and all connected cables. Ensure all are properly seated and all items are receiving power from both A and B sides before continuing. Anything not seated or not receiving dual power will be rebooted by continuing this checklist.
  • - install new PDU brackets for the link tower in the rack (see above note on orientation of the brackets.)
  • - install link PDU into the cabinet
  • - de-power old/existing B side power, and plug in new B side link PDU
  • - migrate all B side power connections to new link PDU
  • - Note all B side power connections, input into netbox for every single power port used.
  • - When relocating power cables, please try to ensure that the A and B sides use the same port. If server bast1001 plugs into port 5 on tower B, please also have it plug into port 5 on tower A.
  • - audit all B side connections to ensure all devices are receiving full power on the B side connection (any not receiving power will be rebooted when we move the A side connections next.)
  • - BEFORE UNPLUGGING THE A SIDE ORIGINAL TOWER: Log in to the PDU via the HTTPS interface and reset it to factory defaults!
  • - Unmount existing PDU tower and set aside (if possible) to install new PDU brackets into the rack.
  • - Install new PDU tower into the rack, and route power cable for easy cut-over.
  • - de-power old/existing A side power, and plug in new A side link PDU
  • - migrate all A side power connections to new link PDU
  • - Note all A side power connections, input into netbox for every single power port used.
  • - audit all A side connections to ensure all devices are receiving full power on the A side connection.
  • - connect serial to new PDU, ensure serial connection is functional
  • - (Rob) setup network configuration of new PDU via serial
  • - (Rob) setup remaining pdu configuration via https interface
  • - (Rob) update puppet repo file: modules/facilities/manifests/init.pp to add the sentry4 line to the PDU entry.
  • - (Rob) Update librenms to reflect new PDU. (Unclear if you must delete the old and add new, or if the new will update when it's wholly online; so far only done via removing old and adding new device.)
  • - (Rob) Update IP address entries in netbox; for now, just leave the IP tied to the old PDU netbox entry.
  • - ensure all errors clear in icinga and netbox after work completes
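
The A/B audit steps above (every device on both feeds, same port number on tower A and tower B) can be sketched as a small check. This is a minimal illustration, not the tooling used for this work: the function name `audit_power_ports`, the tuple layout, and all port numbers other than the bast1001 example are assumptions, and the data would have to be exported from netbox by hand or by some other means.

```python
from collections import defaultdict


def audit_power_ports(connections):
    """Return {device: [problems]} for devices that break the A/B rules.

    connections: iterable of (device, tower, port) tuples, where tower
    is "A" or "B". The layout is hypothetical, not a netbox format.
    """
    feeds = defaultdict(dict)
    for device, tower, port in connections:
        feeds[device][tower] = port

    problems = {}
    for device, towers in sorted(feeds.items()):
        # A device missing either feed would lose power during the cut-over.
        issues = [f"no {side} side connection"
                  for side in ("A", "B") if side not in towers]
        # Only compare port numbers when both feeds are present.
        if not issues and towers["A"] != towers["B"]:
            issues.append(f"port mismatch: A={towers['A']} B={towers['B']}")
        if issues:
            problems[device] = issues
    return problems


# Hypothetical port assignments, for illustration only.
example = [
    ("bast1001", "A", 5), ("bast1001", "B", 5),  # same port on both towers
    ("db1121", "A", 7), ("db1121", "B", 8),      # ports differ between towers
    ("mw1319", "B", 12),                         # A side not connected
]

for device, issues in audit_power_ports(example).items():
    print(device, "->", "; ".join(issues))
```

Running the check after the B side migration and again after the A side migration would flag anything that needs re-seating before the next tower is de-powered.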

ps1-c7-eqiad & ps2-c7-eqiad:

  • - apply asset tags to each tower (both primary and link towers) as well as hostname labels.
  • - netbox updated
  • - check the existing PDU and all connected cables. Ensure all are properly seated and all items are receiving power from both A and B sides before continuing. Anything not seated or not receiving dual power will be rebooted by continuing this checklist.
  • - install new PDU brackets for the link tower in the rack (see above note on orientation of the brackets.)
  • - install link PDU into the cabinet
  • - de-power old/existing B side power, and plug in new B side link PDU
  • - migrate all B side power connections to new link PDU
  • - Note all B side power connections, input into netbox for every single power port used.
  • - When relocating power cables, please try to ensure that the A and B sides use the same port. If server bast1001 plugs into port 5 on tower B, please also have it plug into port 5 on tower A.
  • - audit all B side connections to ensure all devices are receiving full power on the B side connection (any not receiving power will be rebooted when we move the A side connections next.)
  • - BEFORE UNPLUGGING THE A SIDE ORIGINAL TOWER: Log in to the PDU via the HTTPS interface and reset it to factory defaults!
  • - Unmount existing PDU tower and set aside (if possible) to install new PDU brackets into the rack.
  • - Install new PDU tower into the rack, and route power cable for easy cut-over.
  • - de-power old/existing A side power, and plug in new A side link PDU
  • - migrate all A side power connections to new link PDU
  • - Note all A side power connections, input into netbox for every single power port used.
  • - audit all A side connections to ensure all devices are receiving full power on the A side connection.
  • - connect serial to new PDU, ensure serial connection is functional
  • - (Rob) setup network configuration of new PDU via serial
  • - (Rob) setup remaining pdu configuration via https interface
  • - (Rob) update puppet repo file: modules/facilities/manifests/init.pp to add the sentry4 line to the PDU entry.
  • - (Rob) Update librenms to reflect new PDU. (Unclear if you must delete the old and add new, or if the new will update when it's wholly online; so far only done via removing old and adding new device.)
  • - (Rob) Update IP address entries in netbox; for now, just leave the IP tied to the old PDU netbox entry.
  • - ensure all errors clear in icinga and netbox after work completes

Event Timeline

Restricted Application added a project: Operations. · Thu, Aug 27, 8:39 PM

List of hostnames in racks C6 and C7 below:

alert1001 C6
asw2-c6-eqiad C6
asw-c6-eqiad C6
bast1002 C6
cablemgmt-wmf5283 C6
db1121 C6
db1134 C6
db1147 C6
ganeti1011 C6
msw-c6-eqiad C6
mw1319 C6
mw1320 C6
mw1321 C6
mw1322 C6
mw1323 C6
mw1324 C6
mw1325 C6
mw1326 C6
mw1327 C6
mw1328 C6
mw1329 C6
mw1330 C6
mw1331 C6
mw1332 C6
mw1333 C6
mw1334 C6
mw1335 C6
mw1336 C6
mw1337 C6
mw1338 C6
mw1339 C6
mw1340 C6
mw1341 C6
mw1342 C6
mw1343 C6
mw1344 C6
mw1345 C6
mw1346 C6
mw1347 C6
mw1348 C6
ps1-c6-eqiad C6
wdqs1010 C6
analytics1075 C7
an-scheduler1001 C7
an-worker1091 C7
an-worker1109 C7
an-worker1110 C7
asw2-c7-eqiad C7
asw-c7-eqiad C7
backup1002 C7
backup1002-array C7
cablemgmt-wmf5278 C7
conf1002 C7
cp1085 C7
cp1086 C7
dbprov1003 C7
dumpsdata1003 C7
elastic1051 C7
elastic1052 C7
francium C7
kafka-main1003 C7
lvs1015 C7
mc-gp1002 C7
ms-be1034 C7
ms-be1035 C7
ms-be1036 C7
ms-be1042 C7
ms-fe1008 C7
msw-c7-eqiad C7
polonium C7
ps1-c7-eqiad C7
scb1003 C7
wtp1040 C7
wtp1041 C7
wtp1042 C7
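
For planning per-rack downtime, the flat `hostname rack` listing above can be grouped by rack. The snippet below is a sketch only: `group_by_rack` is a hypothetical helper, and the sample is a short excerpt of the list rather than the full inventory.

```python
def group_by_rack(listing):
    """Group 'hostname rack' lines into {rack: [hostnames]}."""
    racks = {}
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        host, rack = parts
        racks.setdefault(rack, []).append(host)
    return racks


# Excerpt of the listing above, for illustration.
sample = """alert1001 C6
db1121 C6
ps1-c6-eqiad C6
conf1002 C7
lvs1015 C7
"""

for rack, hosts in group_by_rack(sample).items():
    print(rack, len(hosts), ", ".join(hosts))
```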

Marostegui moved this task from Triage to Next on the DBA board.
RobH updated the task description. · Fri, Sep 11, 8:57 PM

Change 627737 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Disable notifications on masters

https://gerrit.wikimedia.org/r/627737

Change 627737 merged by Marostegui:
[operations/puppet@production] mariadb: Disable notifications on masters

https://gerrit.wikimedia.org/r/627737

Mentioned in SAL (#wikimedia-operations) [2020-09-16T08:52:41Z] <marostegui> Stop mysql on db1121, db1123, db1093 and db1109 for PDU work T261454 T261457

mysql stopped on db1121 (sanitarium master)

Change 627864 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] pdu upgrade in c6/c7

https://gerrit.wikimedia.org/r/627864

Change 627864 merged by RobH:
[operations/puppet@production] pdu upgrade in c6/c7

https://gerrit.wikimedia.org/r/627864

RobH updated the task description. · Wed, Sep 16, 3:10 PM

mysql stopped on db1121 (sanitarium master)

host started after PDU work is done

RobH updated the task description. · Wed, Sep 16, 4:20 PM
Cmjohnson closed this task as Resolved. · Thu, Sep 17, 4:06 PM
Cmjohnson updated the task description.