
(Need By:TBD) rack/setup/install rack A1 and A8 new PDUs
Closed, Resolved (Public)

Description

This task tracks the racking, setup, and configuration of the new PDUs in racks A1 and A8 (network racks). The work should not affect the devices in those racks. However, the management router will be down while we work on rack A1, so the management network will be inaccessible during that period.

If you need to depool your server, please put YES in the "Do you need to depool?" column, and make sure the server is depooled and powered down during the maintenance.

Schedule

Rack | Date       | Time                   | Comments
A1   | 2023-02-16 | 9:45am CT/2:45 pm UTC  | ~ 45 hour to complete
A8   | 2023-02-16 | 10:30am CT/3:30 pm UTC | ~ 45 hour to complete
  • Open a ticket with CY1

Per PDU setup Checklist

ps1-a1-codfw/ps2-a1-codfw

Send out a notification to let everybody know that the management network will be unavailable for the whole site. Also send a separate notification to net-ops about the ongoing maintenance.

  • Receive the new PDUs on T303460
  • Apply asset tags and hostname labels to each tower (both primary and link towers)
  • Add the new PDUs to Netbox
  • Downtime the old PDU in Icinga
  • Run "Move devices attributes" to move all settings from the old PDU to the new PDU
  • Log in to the master PDU and configure it
  • Make sure Icinga sees the new PDU
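
The checklist above can be tracked per rack with a small helper. This is only a minimal sketch: the `PduChecklist` class is hypothetical and not part of any existing tooling, the assumption that steps are completed in the listed order is ours, and the hostname pattern is inferred from the two headings (`ps1-a1-codfw/ps2-a1-codfw` and `ps1-a8-codfw/ps2-a8-codfw`).

```python
from dataclasses import dataclass, field

# Step wording follows the checklist; the tracker itself is a
# hypothetical helper, not part of any existing tooling.
CHECKLIST_STEPS = (
    "Receive the new PDUs",
    "Apply asset tags and hostname labels to each tower",
    "Add the new PDUs to Netbox",
    "Downtime the old PDU in Icinga",
    "Run 'Move devices attributes' from old PDU to new PDU",
    "Log in to the master PDU and configure it",
    "Make sure Icinga sees the new PDU",
)


@dataclass
class PduChecklist:
    rack: str
    site: str = "codfw"
    done: list = field(default_factory=list)

    def pdu_hostnames(self):
        # Naming pattern inferred from the headings (assumption).
        return [f"ps{n}-{self.rack.lower()}-{self.site}" for n in (1, 2)]

    def next_step(self):
        # Assumes steps are completed in the listed order.
        remaining = CHECKLIST_STEPS[len(self.done):]
        return remaining[0] if remaining else None

    def complete_next(self):
        step = self.next_step()
        if step is None:
            raise ValueError("checklist already complete")
        self.done.append(step)
        return step
```

For example, `PduChecklist("A1").pdu_hostnames()` returns `['ps1-a1-codfw', 'ps2-a1-codfw']`, and calling `complete_next()` seven times leaves `next_step()` returning `None`.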

List of Servers and network devices in rack A1

Servers/devices | Do you need to depool?
asw-a1-codfw    |
atlas-codfw     |
cr1-codfw       |
db2136          |
db2157          |
db2158          |
es2026          |
gitlab2002      | depool not required
kubestage2001   |
lsw-a1-codfw    |
ml-serve2005    |
mr1-codfw       |
msw1-codfw      |
msw-a1-codfw    |
scs-a1-codfw    |
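
To see at a glance which owners still need to answer, the rack A1 table can be summarized with a short script. This is an illustrative sketch only: the `classify` function and its bucket names are hypothetical, and treating any non-empty answer that does not start with "yes" as "no depool needed" is an assumption.

```python
# Answers copied from the rack A1 table; an empty string means the
# column has not been filled in yet.
RACK_A1 = {
    "asw-a1-codfw": "",
    "atlas-codfw": "",
    "cr1-codfw": "",
    "db2136": "",
    "db2157": "",
    "db2158": "",
    "es2026": "",
    "gitlab2002": "depool not required",
    "kubestage2001": "",
    "lsw-a1-codfw": "",
    "ml-serve2005": "",
    "mr1-codfw": "",
    "msw1-codfw": "",
    "msw-a1-codfw": "",
    "scs-a1-codfw": "",
}


def classify(answers):
    """Split hosts into 'depool', 'no_depool', and 'unanswered' buckets."""
    buckets = {"depool": [], "no_depool": [], "unanswered": []}
    for host, answer in answers.items():
        text = answer.strip().lower()
        if not text:
            buckets["unanswered"].append(host)
        elif text.startswith("yes"):
            buckets["depool"].append(host)
        else:
            # Assumption: any other answer means no depool is needed.
            buckets["no_depool"].append(host)
    return buckets
```

With the table as it stands, `classify(RACK_A1)["no_depool"]` is `['gitlab2002']` and the remaining 14 entries are unanswered.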

ps1-a8-codfw/ps2-a8-codfw

Send out a notification to let everybody know that the management network will be unavailable for the whole site. Also send a separate notification to net-ops about the ongoing maintenance.

  • Receive the new PDUs on T303460
  • Apply asset tags and hostname labels to each tower (both primary and link towers)
  • Add the new PDUs to Netbox
  • Downtime the old PDU in Icinga
  • Run "Move devices attributes" to move all settings from the old PDU to the new PDU
  • Log in to the master PDU and configure it
  • Make sure Icinga sees the new PDU

List of Servers and network devices in rack A8

Servers/devices | Do you need to depool?
cr2-codfw       |
db2106          |
db2146          |
msw-a8-codfw    |
parse2004       |
parse2005       |

Event Timeline

Papaul triaged this task as Medium priority. Jan 19 2023, 2:33 PM
Papaul updated the task description.

Postponing the PDU maintenance to 2023-02-02 because of possible bad weather in Dallas tomorrow.

Papaul renamed this task from "(Need By:TBD) rack/setup/install rack A1 and A8 new PDUs 2023-01-31" to "(Need By:TBD) rack/setup/install rack A1 and A8 new PDUs 2023-02-02". Jan 30 2023, 6:13 PM

Change 885825 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] mariadb: Disable notifications host in A1/A8 codfw

https://gerrit.wikimedia.org/r/885825

Mentioned in SAL (#wikimedia-operations) [2023-02-01T14:41:52Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db2136 db2158 db2157 es2026 db2106 db2146 T327404', diff saved to https://phabricator.wikimedia.org/P43530 and previous config saved to /var/cache/conftool/dbconfig/20230201-144152-root.json
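
The depool commit above can be cross-checked against the rack lists to confirm that no database host was missed. The following is a minimal sketch: the function names are hypothetical, the message is parsed as plain text rather than through any dbctl interface, and selecting the `db*`/`es*` entries from the rack tables as the hosts to check is an assumption.

```python
import re

# Commit message quoted from the SAL entry.
SAL_MESSAGE = (
    "dbctl commit (dc=all): "
    "'Depool db2136 db2158 db2157 es2026 db2106 db2146 T327404'"
)

# db*/es* hosts taken from the rack A1 and A8 tables (assumed to be
# the hosts that the depool should cover).
RACK_HOSTS = {
    "A1": {"db2136", "db2157", "db2158", "es2026"},
    "A8": {"db2106", "db2146"},
}


def depooled_hosts(message):
    """Return the hostnames listed after 'Depool' in a commit message."""
    match = re.search(r"'Depool (.+?)'", message)
    if match is None:
        return []
    # Drop the trailing Phabricator task ID (e.g. T327404).
    return [
        token for token in match.group(1).split()
        if not re.fullmatch(r"T\d+", token)
    ]


def missing_by_rack(message, rack_hosts):
    """Map each rack to the hosts that the commit did not depool."""
    depooled = set(depooled_hosts(message))
    return {
        rack: sorted(hosts - depooled)
        for rack, hosts in rack_hosts.items()
    }
```

For the message above, `missing_by_rack(SAL_MESSAGE, RACK_HOSTS)` returns `{'A1': [], 'A8': []}`, meaning every listed database host in both racks was covered by the commit.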

Change 885825 merged by Marostegui:

[operations/puppet@production] mariadb: Disable notifications host in A1/A8 codfw

https://gerrit.wikimedia.org/r/885825

We are postponing the PDU maintenance once again. We will update this task once we have the new date and time.

Thank you

Papaul renamed this task from "(Need By:TBD) rack/setup/install rack A1 and A8 new PDUs 2023-02-02" to "(Need By:TBD) rack/setup/install rack A1 and A8 new PDUs". Feb 2 2023, 12:23 AM

Change 889992 had a related patch set uploaded (by Papaul; author: Papaul):

[operations/puppet@production] Update model for ps1-a8-codfw

https://gerrit.wikimedia.org/r/889992

Change 889992 merged by Papaul:

[operations/puppet@production] Update model for ps1-a8-codfw

https://gerrit.wikimedia.org/r/889992

Papaul updated the task description.

Complete