
b2-eqiad pdu refresh (Tuesday 10/29 @11am UTC)
Closed, Resolved, Public

Description

This task tracks the replacement of ps1 and ps2 with new PDUs in rack B2-eqiad.

Each server & switch will need to have potential downtime scheduled, since this will be a live power change of the PDU towers.

These racks have a single tower for the old PDU (with an A and a B side), while the new PDUs have independent A and B towers.

  • - schedule downtime for the entire list of switches and servers.
  • - Wire up one of the two towers, energize, and relocate power to it from existing/old pdu tower (now de-energized).
  • - confirm entire list of switches, routers, and servers have had their power restored from the new pdu tower
  • - Once new PDU tower is confirmed online, move on to next steps.
  • - Wire up remaining tower, energize, and relocate power to it from existing/old pdu tower (now de-energized).
  • - confirm entire list of switches, routers, and servers have had their power restored from the new pdu tower
  • - connect via serial / confirm serial connection works
  • - set up PDU following directions on https://wikitech.wikimedia.org/wiki/Platform-specific_documentation/ServerTech#Initial_Setup
  • - update PDU model in puppet per T233129.
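
The gating in the steps above (do not start on the second tower until every device is confirmed on the first) can be sketched as a small checklist tracker. This is only an illustration: `TowerSwap` and `next_step` are hypothetical names, not part of any Wikimedia tooling, and the three devices are just a sample from the list below.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TowerSwap:
    """Tracks which devices have confirmed power on one new PDU tower."""

    tower: str
    devices: list[str]
    confirmed: set[str] = field(default_factory=set)

    def confirm(self, device: str) -> None:
        # Reject anything that is not on the rack's device list.
        if device not in self.devices:
            raise ValueError(f"{device} is not in the device list for this rack")
        self.confirmed.add(device)

    def pending(self) -> list[str]:
        # Preserve the original list order so output is predictable.
        return [d for d in self.devices if d not in self.confirmed]

    def done(self) -> bool:
        return not self.pending()


def next_step(first: TowerSwap, second: TowerSwap) -> str:
    """Return the next action; the second tower is blocked on the first."""
    for swap in (first, second):
        if not swap.done():
            return f"confirm power on tower {swap.tower}: {', '.join(swap.pending())}"
    return "connect via serial, run initial setup, update PDU model in puppet"


# Sample subset of the rack's devices (hypothetical selection).
devices = ["asw2-b2-eqiad", "cp1079", "cp1080"]
tower_a = TowerSwap("A", devices)
tower_b = TowerSwap("B", devices)
print(next_step(tower_a, tower_b))
```

Calling `confirm()` for each device as it comes back, then `next_step()` again, reports either the devices still outstanding or the follow-up software steps.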

List of routers, switches, and servers

| device | role | SRE team coordination | notes |
| --- | --- | --- | --- |
| asw2-b2-eqiad | asw | @ayounsi | |
| db1072 | db | DBA | Host powered off, DO NOT POWER BACK ON pending onsite decommissioning steps: T228956 |
| ms-be1020 | ms-be | @fgiunchedi | poweroff / poweron |
| cloudvirt1002 | cloudvirt | cloud-services-team | active VMs, please handle with care |
| cloudvirt1001 | cloudvirt | cloud-services-team | active VMs, please handle with care |
| cloudvirt1024 | cloudvirt | cloud-services-team | active VMs, but low risk |
| cloudvirt1018 | cloudvirt | cloud-services-team | a lot of active VMs, please handle with care |
| cloudvirt1012 | cloudvirt | cloud-services-team | active VMs, please handle with care |
| cloudvirt1009 | cloudvirt | cloud-services-team | active VMs, please handle with care |
| cloudvirt1015 | cloudvirt | cloud-services-team | Offline for maintenance |
| cloudvirt1008 | cloudvirt | cloud-services-team | active VMs, please handle with care |
| db1099 | db | DBA | @jcrespo to depool this host |
| ms-be1047 | ms-be | @fgiunchedi | poweroff / poweron |
| cloudvirtan1002 | cloudvirtan | @Ottomata | no longer in cloud services, not sure of the status |
| cloudvirtan1001 | cloudvirtan | @Ottomata | no longer in cloud services, not sure of the status |
| an-worker1084 | | Analytics | fine to do any time |
| an-worker1083 | | Analytics | fine to do any time |
| cloudelastic1002 | cloudelastic | Discovery-Search | @Gehel good to go |
| cp1080 | cp | Traffic | |
| cp1079 | cp | Traffic | |
| analytics1072 | | Analytics | fine to do any time |
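
When scheduling downtime, it can help to group the devices by the contact who has to sign off, and to flag any host that must not be powered back on. The following is a minimal sketch under stated assumptions: the tuples are hand-copied from a few rows of the list above, and `by_contact` and `must_stay_off` are hypothetical helper names, not existing tooling.

```python
from collections import defaultdict

# (device, role, contact, notes), hand-copied from a few rows of the list.
ROWS = [
    ("asw2-b2-eqiad", "asw", "@ayounsi", ""),
    ("db1072", "db", "DBA", "Host powered off, DO NOT POWER BACK ON"),
    ("ms-be1020", "ms-be", "@fgiunchedi", "poweroff / poweron"),
    ("db1099", "db", "DBA", "@jcrespo to depool this host"),
    ("cp1080", "cp", "Traffic", ""),
    ("cp1079", "cp", "Traffic", ""),
]


def by_contact(rows):
    """Map each contact (person or team) to the devices they coordinate."""
    groups = defaultdict(list)
    for device, _role, contact, _notes in rows:
        groups[contact].append(device)
    return dict(groups)


def must_stay_off(rows):
    """Devices whose notes say they must not be powered back on."""
    return [
        device
        for device, _role, _contact, notes in rows
        if "DO NOT POWER BACK ON" in notes
    ]


for contact, devices in by_contact(ROWS).items():
    print(f"{contact}: {', '.join(devices)}")
print("leave powered off:", ", ".join(must_stay_off(ROWS)))
```

Matching on the literal phrase in the notes column is deliberately crude; it only works because the note on db1072 is written in exactly that form.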

Details

Related Gerrit Patches:
operations/puppet : production : setting ps1-b2-eqiad model info

Event Timeline

RobH created this task. Jul 8 2019, 10:44 PM
RobH updated the task description. Jul 10 2019, 10:49 PM
RobH added subscribers: ayounsi, fgiunchedi.
BBlack added a subscriber: BBlack. Jul 22 2019, 8:13 PM

cp1079 and cp1080 just need normal depooling process here.

From the DB side, this can be done after Thursday 25th, as db1072 will no longer be a master.

fgiunchedi updated the task description. Jul 23 2019, 9:33 AM

As of today, db1072 is no longer a master (T228243#5363931), so this rack is also good to go. db1072 will be decommissioned in a few days.

Marostegui updated the task description. Jul 25 2019, 5:45 AM
RobH moved this task from High Priority Task to Blocked on the ops-eqiad board. Jul 26 2019, 1:37 PM
RobH removed RobH as the assignee of this task. Aug 14 2019, 4:53 PM
wiki_willy renamed this task from "b2-eqiad pdu refresh" to "b2-eqiad pdu refresh (Tuesday 10/29 @11am UTC)". Aug 15 2019, 5:34 PM
CDanis triaged this task as Normal priority. Aug 16 2019, 1:01 PM
Marostegui updated the task description. Aug 19 2019, 10:38 AM
Marostegui updated the task description.
Marostegui added a subscriber: jcrespo.

@jcrespo - I will be on holidays this day, hence I added you on the list as primary contact for db1099 :)

Gehel updated the task description. Aug 19 2019, 4:18 PM
Gehel added a subscriber: Gehel.
elukey updated the task description. Sep 17 2019, 5:58 AM
elukey updated the task description.
Marostegui updated the task description. Sep 25 2019, 1:42 PM
RobH updated the task description. Oct 11 2019, 8:41 PM
Marostegui updated the task description. Thu, Oct 24, 6:10 AM
JHedden removed Cmjohnson as the assignee of this task. Mon, Oct 28, 5:24 PM
JHedden updated the task description.
JHedden updated the task description.
JHedden added a subscriber: Ottomata.

Mentioned in SAL (#wikimedia-operations) [2019-10-29T08:43:35Z] <jynus> shutting down db1099 T227538

db1099 is depooled and down; please ping me on IRC when done so I can bring it back up. The dbs are all ready for maintenance.

Mentioned in SAL (#wikimedia-cloud) [2019-10-29T10:52:51Z] <arturo> icinga downtime cloudvirt1001/1002/1024/1018/1012/1009/1015/1008 for 1h T227538

starting pdu refresh

Finished swapping the PDUs

Jclark-ctr reassigned this task from Cmjohnson to RobH. Tue, Oct 29, 1:18 PM
Jclark-ctr updated the task description.

netbox updated / serial connected

Mentioned in SAL (#wikimedia-operations) [2019-10-29T14:45:56Z] <robh> setting up ps1-b2-eqiad, librenms will output a couple reboots from it T227538

Change 546959 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] setting ps1-b2-eqiad model info

https://gerrit.wikimedia.org/r/546959

Change 546959 merged by RobH:
[operations/puppet@production] setting ps1-b2-eqiad model info

https://gerrit.wikimedia.org/r/546959

RobH updated the task description.

I will repool the db host, as I do not want to leave it off for a long time. Let me know if more operations on this row are planned at a later time.

RobH added a comment. Tue, Oct 29, 3:10 PM

I've completed all setup work from the software side for the PDU at this time.

RobH closed this task as Resolved. Tue, Oct 29, 11:03 PM

all green in icinga, resolving task