
b2-eqiad pdu refresh (Tuesday 10/29 @11am UTC)
Closed, Resolved, Public

Description

This task will track the replacement of ps1 and ps2 with new PDUs in rack B2-eqiad.

Each server & switch will need to have potential downtime scheduled, since this will be a live power change of the PDU towers.

These racks have a single tower for the old PDU (with an A and a B side), while the new PDUs have independent A and B towers.

  • Schedule downtime for the entire list of switches and servers.
  • Wire up one of the two towers, energize it, and relocate power to it from the existing/old PDU tower (now de-energized).
  • Confirm the entire list of switches, routers, and servers have had their power restored from the new PDU tower.
  • Once the new PDU tower is confirmed online, move on to the next steps.
  • Wire up the remaining tower, energize it, and relocate power to it from the existing/old PDU tower (now de-energized).
  • Confirm the entire list of switches, routers, and servers have had their power restored from the new PDU tower.
  • Connect via serial / confirm the serial connection works.
  • Set up the PDU following the directions on https://wikitech.wikimedia.org/wiki/Platform-specific_documentation/ServerTech#Initial_Setup
  • Update the PDU model in puppet per T233129.
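
The side-by-side ordering of the steps above can be sketched as a small simulation. This is only an illustration, not tooling used for this task: it assumes each device has one power supply per side (A and B), and the `Device`, `PowerLoss`, and `run_migration` names are hypothetical. It shows why a dual-fed device should ride through the change, and why downtime is still scheduled in case a device ends up with only one working feed.

```python
from dataclasses import dataclass, field


class PowerLoss(RuntimeError):
    """Raised when a device would have no energized feed at some point."""


@dataclass
class Device:
    name: str
    # Which PDU each power cord is plugged into, keyed by side ("A" / "B").
    # Assumption: a normal device starts with both cords on the old PDU.
    feeds: dict = field(default_factory=lambda: {"A": "old", "B": "old"})


def has_power(device, energized):
    """True if at least one cord is plugged into an energized (pdu, side)."""
    return any((pdu, side) in energized for side, pdu in device.feeds.items())


def migrate_side(devices, energized, side):
    """Energize the new tower for `side`, then move cords one device at a time."""
    energized.add(("new", side))
    for dev in devices:
        if side not in dev.feeds:
            continue  # device has no cord on this side
        dev.feeds[side] = None  # cord is unplugged from the old PDU
        if not has_power(dev, energized):
            raise PowerLoss(f"{dev.name} has no energized feed while moving side {side}")
        dev.feeds[side] = "new"  # cord is plugged into the new tower
    energized.discard(("old", side))  # old side is no longer in use


def run_migration(devices):
    """Move side A first, confirm, then side B. Returns the energized set."""
    energized = {("old", "A"), ("old", "B")}
    for side in ("A", "B"):
        migrate_side(devices, energized, side)
    return energized
```

With two dual-fed placeholder devices, `run_migration([Device("host-a"), Device("host-b")])` finishes with every cord on the new towers. A device constructed with `feeds={"A": "old"}` raises `PowerLoss`, which is the case the scheduled downtime covers.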

List of routers, switches, and servers

device           | role         | SRE team coordination | notes
asw2-b2-eqiad    | asw          | @ayounsi              |
db1072           | db           | DBA                   | Host powered off, DO NOT POWER BACK ON pending onsite decommissioning steps: T228956
ms-be1020        | ms-be        | @fgiunchedi           | poweroff / poweron
cloudvirt1002    | cloudvirt    | cloud-services-team   | active VMs, please handle with care
cloudvirt1001    | cloudvirt    | cloud-services-team   | active VMs, please handle with care
cloudvirt1024    | cloudvirt    | cloud-services-team   | active VMs, but low risk
cloudvirt1018    | cloudvirt    | cloud-services-team   | a lot of active VMs, please handle with care
cloudvirt1012    | cloudvirt    | cloud-services-team   | active VMs, please handle with care
cloudvirt1009    | cloudvirt    | cloud-services-team   | active VMs, please handle with care
cloudvirt1015    | cloudvirt    | cloud-services-team   | Offline for maintenance
cloudvirt1008    | cloudvirt    | cloud-services-team   | active VMs, please handle with care
db1099           | db           | DBA                   | @jcrespo to depool this host
ms-be1047        | ms-be        | @fgiunchedi           | poweroff / poweron
cloudvirtan1002  | cloudvirtan  | @Ottomata             | no longer in cloud services, not sure of the status
cloudvirtan1001  | cloudvirtan  | @Ottomata             | no longer in cloud services, not sure of the status
an-worker1084    |              | Analytics             | fine to do any time
an-worker1083    |              | Analytics             | fine to do any time
cloudelastic1002 | cloudelastic | Discovery-Search      | @Gehel good to go
cp1080           | cp           | Traffic               |
cp1079           | cp           | Traffic               |
analytics1072    |              | Analytics             | fine to do any time
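
For coordinating the maintenance window, it can help to group the devices above by their listed team or contact. The following is a minimal sketch, assuming the device/contact pairs are copied by hand from the list (with `cloduvirt1002` read as `cloudvirt1002`, and the Analytics entries treated as a team name); `DEVICES` and `hosts_by_contact` are hypothetical names, not part of any existing tooling.

```python
from collections import defaultdict

# (device, team or contact) pairs copied from the list in this task.
DEVICES = [
    ("asw2-b2-eqiad", "@ayounsi"),
    ("db1072", "DBA"),
    ("ms-be1020", "@fgiunchedi"),
    ("cloudvirt1002", "cloud-services-team"),
    ("cloudvirt1001", "cloud-services-team"),
    ("cloudvirt1024", "cloud-services-team"),
    ("cloudvirt1018", "cloud-services-team"),
    ("cloudvirt1012", "cloud-services-team"),
    ("cloudvirt1009", "cloud-services-team"),
    ("cloudvirt1015", "cloud-services-team"),
    ("cloudvirt1008", "cloud-services-team"),
    ("db1099", "DBA"),
    ("ms-be1047", "@fgiunchedi"),
    ("cloudvirtan1002", "@Ottomata"),
    ("cloudvirtan1001", "@Ottomata"),
    ("an-worker1084", "Analytics"),
    ("an-worker1083", "Analytics"),
    ("cloudelastic1002", "Discovery-Search"),
    ("cp1080", "Traffic"),
    ("cp1079", "Traffic"),
    ("analytics1072", "Analytics"),
]


def hosts_by_contact(devices):
    """Return {contact: sorted list of hosts} for notification or downtime."""
    grouped = defaultdict(list)
    for host, contact in devices:
        grouped[contact].append(host)
    return {contact: sorted(hosts) for contact, hosts in grouped.items()}


if __name__ == "__main__":
    for contact, hosts in sorted(hosts_by_contact(DEVICES).items()):
        print(f"{contact}: {', '.join(hosts)}")
```

Each group can then be pasted into a single downtime or notification message per team.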

Event Timeline

cp1079 and cp1080 just need normal depooling process here.

From the DB side, this can be done after Thursday 25th as db1072 will no longer be a master

As of today, db1072 is no longer a master (T228243#5363931), so this rack is also good to go. db1072 will be decommissioned in a few days

RobH removed RobH as the assignee of this task. Aug 14 2019, 4:53 PM
wiki_willy renamed this task from b2-eqiad pdu refresh to b2-eqiad pdu refresh (Tuesday 10/29 @11am UTC). Aug 15 2019, 5:34 PM
CDanis triaged this task as Medium priority. Aug 16 2019, 1:01 PM
Marostegui updated the task description.
Marostegui added a subscriber: jcrespo.

@jcrespo - I will be on holidays this day, hence I added you on the list as primary contact for db1099 :)

elukey updated the task description.
JHedden updated the task description.
JHedden updated the task description.
JHedden added a subscriber: Ottomata.

db1099 is depooled and down; please ping me on IRC when done so I can bring it back up. The DBs are all ready for maintenance.

Mentioned in SAL (#wikimedia-cloud) [2019-10-29T10:52:51Z] <arturo> icinga downtime cloudvirt1001/1002/1024/1018/1012/1009/1015/1008 for 1h T227538

Jclark-ctr updated the task description.

Netbox updated / serial connected

Mentioned in SAL (#wikimedia-operations) [2019-10-29T14:45:56Z] <robh> setting up ps1-b2-eqiad, librenms will output a couple reboots from it T227538

Change 546959 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] setting ps1-b2-eqiad model info

https://gerrit.wikimedia.org/r/546959

Change 546959 merged by RobH:
[operations/puppet@production] setting ps1-b2-eqiad model info

https://gerrit.wikimedia.org/r/546959

I will repool the db host, as I do not want to leave it off for a long time. Let me know if more operations on this row are planned at a later time.

I've completed all setup work from the software side for the PDU at this time.

all green in icinga, resolving task