New Date - Wed, Sept 16 PDU Upgrade 12pm-4pm UTC- Racks D7 and D8
Closed, Resolved | Public | Request

Description

ps1-d7-eqiad & ps2-d7-eqiad:

  • - apply asset tags to each tower (both primary and link towers) as well as hostname labels.
  • - netbox updated
  • - check the existing PDU and all connected cables. Ensure all are properly seated and all items are receiving power from both A and B sides before continuing. Anything not seated or not receiving dual power will be rebooted by continuing this checklist.
  • - install new PDU brackets for the link tower in the rack (see above note on orientation of the brackets.)
  • - install link PDU into the cabinet
  • - de-power old/existing B side power, and plug in new B side link PDU
  • - migrate all B side power connections to new link PDU
  • - Note all B side power connections and enter them into netbox for every single power port used.
  • - When relocating power cables, please try to ensure that the A and B sides use the same port. If server bast1001 plugs into port 5 on tower B, please also have it plug into port 5 on tower A.
  • - audit all B side connections to ensure all devices are receiving full power on the B side connection (any not receiving power will be rebooted when we move the A side connections next.)
  • - BEFORE UNPLUGGING THE A SIDE ORIGINAL TOWER: Log in to the PDU via the HTTPS interface and reset it to factory defaults!
  • - Unmount existing PDU tower and set aside (if possible) to install new PDU brackets into the rack.
  • - Install new PDU tower into the rack, and route power cable for easy cut-over.
  • - de-power old/existing A side power, and plug in new A side link PDU
  • - migrate all A side power connections to new link PDU
  • - Note all A side power connections and enter them into netbox for every single power port used.
  • - audit all A side connections to ensure all devices are receiving full power on the A side connection.
  • - connect serial to new PDU, ensure serial connection is functional
  • - (Rob) setup network configuration of new PDU via serial
  • - (Rob) setup remaining pdu configuration via https interface
  • - (Rob) update puppet repo file: modules/facilities/manifests/init.pp to add the sentry4 line to the PDU entry.
  • - (Rob) Update librenms to reflect new PDU. (Unclear if you must delete the old and add new, or if the new will update when it's wholly online; so far only done via removing the old and adding the new device.)
  • - (Rob) Update IP address entries in netbox; for now just leave the IP tied to the old PDU netbox entry.
  • - ensure all errors clear in icinga and netbox after work completes

ps1-d8-eqiad & ps2-d8-eqiad:

  • - apply asset tags to each tower (both primary and link towers) as well as hostname labels.
  • - netbox updated
  • - check the existing PDU and all connected cables. Ensure all are properly seated and all items are receiving power from both A and B sides before continuing. Anything not seated or not receiving dual power will be rebooted by continuing this checklist.
  • - install new PDU brackets for the link tower in the rack (see above note on orientation of the brackets.)
  • - install link PDU into the cabinet
  • - de-power old/existing B side power, and plug in new B side link PDU
  • - migrate all B side power connections to new link PDU
  • - Note all B side power connections and enter them into netbox for every single power port used.
  • - When relocating power cables, please try to ensure that the A and B sides use the same port. If server bast1001 plugs into port 5 on tower B, please also have it plug into port 5 on tower A.
  • - audit all B side connections to ensure all devices are receiving full power on the B side connection (any not receiving power will be rebooted when we move the A side connections next.)
  • - BEFORE UNPLUGGING THE A SIDE ORIGINAL TOWER: Log in to the PDU via the HTTPS interface and reset it to factory defaults!
  • - Unmount existing PDU tower and set aside (if possible) to install new PDU brackets into the rack.
  • - Install new PDU tower into the rack, and route power cable for easy cut-over.
  • - de-power old/existing A side power, and plug in new A side link PDU
  • - migrate all A side power connections to new link PDU
  • - Note all A side power connections and enter them into netbox for every single power port used.
  • - audit all A side connections to ensure all devices are receiving full power on the A side connection.
  • - connect serial to new PDU, ensure serial connection is functional
  • - (Rob) setup network configuration of new PDU via serial
  • - (Rob) setup remaining pdu configuration via https interface
  • - (Rob) update puppet repo file: modules/facilities/manifests/init.pp to add the sentry4 line to the PDU entry.
  • - (Rob) Update librenms to reflect new PDU. (Unclear if you must delete the old and add new, or if the new will update when it's wholly online; so far only done via removing the old and adding the new device.)
  • - (Rob) Update IP address entries in netbox; for now just leave the IP tied to the old PDU netbox entry.
  • - ensure all errors clear in icinga and netbox after work completes
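
The two audit rules in the checklists above (every device fed from both the A and B towers, and each device on the same port number on both towers) could be checked from a list of recorded connections. The following is only a rough sketch: the `(device, tower, port)` tuple format and the `example-host*` names are assumptions for illustration, not an export format of netbox or the PDUs. Only bast1001 on port 5 comes from the checklist itself.

```python
from collections import defaultdict


def audit_power(connections):
    """Check recorded power connections against the two checklist rules.

    connections: iterable of (device, tower, port) tuples, where tower is
    "A" or "B". This input shape is hypothetical.

    Returns (missing_feed, mismatched_ports):
      missing_feed     - devices not plugged into both towers
      mismatched_ports - devices on both towers but on different port numbers
    """
    ports = defaultdict(dict)
    for device, tower, port in connections:
        ports[device][tower] = port

    missing_feed = sorted(d for d, p in ports.items() if set(p) != {"A", "B"})
    mismatched_ports = sorted(
        d for d, p in ports.items()
        if set(p) == {"A", "B"} and p["A"] != p["B"]
    )
    return missing_feed, mismatched_ports


# Illustrative data only; example-host1 and example-host2 are made up.
connections = [
    ("bast1001", "A", 5), ("bast1001", "B", 5),
    ("example-host1", "A", 7), ("example-host1", "B", 8),
    ("example-host2", "B", 9),
]

missing_feed, mismatched_ports = audit_power(connections)
print("single-fed devices:", missing_feed)
print("A/B port mismatch:", mismatched_ports)
```

Anything reported as single-fed would lose power when the other side is cut over, so it is worth resolving before the A side work starts.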

Event Timeline

List of hostnames in racks D7 and D8 below:

analytics1077 D7
an-presto1003 D7
an-worker1094 D7
an-worker1095 D7
an-worker1101 D7
an-worker1115 D7
an-worker1116 D7
asw2-d7-eqiad D7
cablemgmt-wmf5290 D7
cloudstore1009 D7
cloudstore1009-array1 D7
cp1089 D7
cp1090 D7
kafka-jumbo1008 D7
kafka-jumbo1009 D7
kafka-main1005 D7
logstash1029 D7
lvs1016 D7
mc-gp1003 D7
ms-be1026 D7
ms-be1037 D7
ms-be1038 D7
ms-be1039 D7
ms-be1056 D7
msw-d7-eqiad D7
ps1-d7-eqiad D7
ps2-d7-eqiad D7
thanos-be1004 D7
analytics1035 D8
analytics1036 D8
analytics1037 D8
analytics1042 D8
analytics1043 D8
analytics1044 D8
analytics1045 D8
analytics1067 D8
analytics1068 D8
analytics1069 D8
asw2-d8-eqiad D8
cablemgmt-wmf5279 D8
conf1003 D8
db1091 D8
db1092 D8
db1093 D8
db1094 D8
db1102 D8
db1109 D8
db1123 D8
db1138 D8
es1019 D8
ganeti1021 D8
ganeti1022 D8
ms-be1021 D8
msw-d8-eqiad D8
mw1383 D8
mw1384 D8
ps1-d8-eqiad D8
ps2-d8-eqiad D8
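
For anyone who wants to work with the list above programmatically (for example, to see which database hosts sit in a given rack), a minimal sketch follows. It assumes the list is pasted as plain text with one `hostname rack` pair per line, and the `group_by_rack` helper is hypothetical. The sample is a small excerpt of the hosts listed above.

```python
from collections import defaultdict


def group_by_rack(text):
    """Group 'hostname rack' lines into {rack: [hostnames]}."""
    racks = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        host, rack = parts
        racks[rack].append(host)
    return dict(racks)


# Excerpt of the list above, not the full inventory.
sample = """\
cp1089 D7
kafka-main1005 D7
db1093 D8
db1109 D8
db1123 D8
"""

racks = group_by_rack(sample)
db_hosts = [h for h in racks["D8"] if h.startswith("db")]
print(db_hosts)
```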

Please take extra care with db1123, db1093 and db1109: they are eqiad masters and lots of slaves hang from them. We might stop mysql just in case.

Apologies for the last-minute change: the upgrades for these 2x PDUs will be postponed until a later date. Both dc-ops engineers at eqiad are recovering from independent injuries (both earlier today), and will be out the rest of the week. Thanks, Willy

wiki_willy renamed this task from Thur, Sept 10 PDU Upgrade 12pm-4pm UTC- Racks D7 and D8 to New Date - Wed, Sept 16 PDU Upgrade 12pm-4pm UTC- Racks D7 and D8. Sep 11 2020, 7:36 PM
wiki_willy reassigned this task from Jclark-ctr to Cmjohnson.
wiki_willy added a subscriber: Jclark-ctr.

Change 627737 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Disable notifications on masters

https://gerrit.wikimedia.org/r/627737

Change 627737 merged by Marostegui:
[operations/puppet@production] mariadb: Disable notifications on masters

https://gerrit.wikimedia.org/r/627737

Mentioned in SAL (#wikimedia-operations) [2020-09-16T08:52:41Z] <marostegui> Stop mysql on db1121, db1123, db1093 and db1109 for PDU work T261454 T261457

mysql stopped on db1123 (s3 master), db1093 (s6 master) and db1109 (s8 master)

Hosts restarted after Chris finished these racks.

Change 627880 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] d7 d8 pdu upgrades

https://gerrit.wikimedia.org/r/627880

Change 627880 merged by RobH:
[operations/puppet@production] d7 d8 pdu upgrades

https://gerrit.wikimedia.org/r/627880

RobH removed a project: Patch-For-Review.
RobH updated the task description.
Cmjohnson updated the task description.