b8-eqiad pdu refresh (Thursday 10/31 @11am UTC)
Closed, Resolved (Public)

Assigned To

None

Authored By

	RobH
	Jul 8 2019, 10:49 PM

Description

This task tracks the replacement of ps1 and ps2 with new PDUs in rack B8-eqiad.

Each server & switch will need to have potential downtime scheduled, since this will be a live power change of the PDU towers.

These racks have a single tower for the old PDU (with an A and a B side), while the new PDUs have independent A and B towers.

- Schedule downtime for the entire list of switches and servers.
- Wire up one of the two towers, energize it, and relocate power to it from the existing/old PDU tower (now de-energized).
- Confirm the entire list of switches, routers, and servers have had their power restored from the new PDU tower.
- Once the new PDU tower is confirmed online, move on to the next steps.
- Wire up the remaining tower, energize it, and relocate power to it from the existing/old PDU tower (now de-energized).
- Confirm the entire list of switches, routers, and servers have had their power restored from the new PDU tower.
- Connect via serial and confirm the serial connection works.
- Set up the PDU following the directions at https://wikitech.wikimedia.org/wiki/Platform-specific_documentation/ServerTech#Initial_Setup
- Update the PDU model in puppet per T233129.
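
The two "confirm" steps above amount to comparing the rack's device list against whatever is reachable after each tower swap. A minimal sketch in Python, assuming a hypothetical `missing_devices` helper and assuming reachability has already been collected by some other means (ping, Icinga, console); it is an illustration only, not part of the actual procedure:

```python
def missing_devices(expected, responding):
    """Return the expected devices that are absent from the responding set, sorted."""
    return sorted(set(expected) - set(responding))


# Hypothetical sample: a few devices from the rack B8 list.
expected = ["asw-b8-eqiad", "db1132", "pc1008", "ganeti1018"]
# Assumed result of a reachability check after moving power to the new tower.
responding = ["asw-b8-eqiad", "db1132", "ganeti1018"]

still_down = missing_devices(expected, responding)
if still_down:
    print("Do not proceed to the next tower; still down:", ", ".join(still_down))
else:
    print("All devices confirmed; safe to move on to the remaining tower.")
```

Running the same check after each tower swap keeps the second swap from starting while anything is still unpowered.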

List of routers, switches, and servers

device	role	SRE team coordination	recommended action during maintenance
asw-b8-eqiad	asw	@ayounsi	ensure this doesn't go offline, as that would take the entire rack's network offline
ganeti1018	ganeti host	serviceops	needs to be emptied of VMs before
gerrit1001	spare		fine to do at any time
cloudvirt1030	hypervisor	cloud-services-team	Lots of VMs, please handle with care.
db1132	m2 master	DBA	This host is the m2 master, which holds some internal services. Ensure it doesn't go offline; if it does, there is an automatic failover via proxies.
pc1008	parsercache host	DBA	DBA to depool it
restbase1024	restbase	serviceops, Services	fine to do at any time
an-master1002		Analytics	fine to do any time
dbproxy1015	db proxy	DBA	Not in use
graphite1004		@fgiunchedi	no action needed, if power is lost and can't be restored quickly we'll switch to codfw
rdb1009	redis master	serviceops	this will need coordination?
notebook1003
db1119	db host	DBA	DBA to depool it
db1113	db host	DBA	DBA to depool it
cloudservices1003	DNS	cloud-services-team	fine to do at any time
mwmaint1002			This is the primary mw maint system in eqiad, perhaps we should halt deployments during this time?
labpuppetmaster1001	spare	cloud-services-team	Good to go. Host is being decommissioned.
ores1004	ORES	serviceops	fine to do at any time
wtp1036	parsoid	serviceops	fine to do at any time
wtp1035	parsoid	serviceops	fine to do at any time
wtp1034	parsoid	serviceops	fine to do at any time
dumpsdata1001	dumps data server	@ArielGlenn	coordinate please
analytics1063		Analytics	fine to do any time
analytics1062		Analytics	fine to do any time
analytics1061		Analytics	fine to do any time
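
For coordination, it can help to see the table grouped by team. The sketch below assumes the table is available as tab-separated text (device, role, team, action); `devices_by_team` and the `(unassigned)` label are hypothetical names used for illustration, and the sample rows are abridged from the table above:

```python
from collections import defaultdict

# A few rows from the table above, abridged; columns are tab-separated.
TABLE = (
    "asw-b8-eqiad\tasw\t@ayounsi\tensure this doesn't go offline\n"
    "db1132\tm2 master\tDBA\tensure it doesn't go offline\n"
    "pc1008\tparsercache host\tDBA\tDBA to depool it\n"
    "notebook1003\n"
)


def devices_by_team(text):
    """Group device names by the team column; rows may have missing columns."""
    groups = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        cols = line.split("\t")
        cols += [""] * (4 - len(cols))  # pad short rows such as notebook1003
        device, team = cols[0], cols[2]
        groups[team or "(unassigned)"].append(device)
    return dict(groups)


for team, devices in devices_by_team(TABLE).items():
    print(f"{team}: {', '.join(devices)}")
```

Rows with no team listed land under `(unassigned)`, which makes it easy to spot hosts that still need an owner before the window.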

Details

	Subject	Repo	Branch	Lines +/-
	ps1-b8-eqiad update for monitoring	operations/puppet	production	+4 -3
	mariadb: depool pc1008 temporarily	operations/mediawiki-config	master	+2 -2


Related Objects

		Status	Subtype	Assigned	Task
		Resolved		• Cmjohnson	T226778 Install new PDUs in rows A/B (Top level tracking task)
		Resolved		None	T227543 b8-eqiad pdu refresh (Thursday 10/31 @11am UTC)

Event Timeline

RobH created this task. Jul 8 2019, 10:49 PM

RobH mentioned this in T226778: Install new PDUs in rows A/B (Top level tracking task).

• Cmjohnson moved this task from Backlog to High Priority Task on the ops-eqiad board. Jul 22 2019, 2:41 PM

RobH moved this task from High Priority Task to Blocked on the ops-eqiad board. Jul 26 2019, 1:37 PM

wiki_willy renamed this task from b8-eqiad pdu refresh to b8-eqiad pdu refresh (Thursday 10/31 @11am UTC). Aug 15 2019, 5:39 PM

RobH updated the task description. Aug 28 2019, 6:27 PM

RobH updated the task description.

RobH added subscribers: ayounsi, • Nuria, ArielGlenn.

RobH added a subscriber: akosiaris.

RobH removed RobH as the assignee of this task. Aug 28 2019, 6:30 PM

RobH triaged this task as High priority.

RobH updated the task description.

fgiunchedi updated the task description. Sep 2 2019, 9:09 AM

fgiunchedi subscribed.

elukey updated the task description. Sep 17 2019, 6:04 AM

akosiaris updated the task description. Sep 17 2019, 6:59 AM

ArielGlenn updated the task description. Sep 17 2019, 8:25 AM

Marostegui updated the task description. Sep 25 2019, 1:48 PM

Marostegui updated the task description.

Marostegui updated the task description. Sep 25 2019, 1:50 PM

RobH updated the task description. Oct 11 2019, 8:42 PM

wiki_willy assigned this task to • Cmjohnson. Oct 28 2019, 5:22 PM

Starting pdu refresh eqiad b8

Mentioned in SAL (#wikimedia-cloud) [2019-10-31T11:01:01Z] <arturo> icinga-downtimed cloudvirt1030 and cloudservices1003 for 1h due to PDU upgrade operations T227543

aborrero updated the task description. Oct 31 2019, 11:03 AM

Change 547508 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: depool pc1008 temporarily

https://gerrit.wikimedia.org/r/547508

gerritbot added a project: Patch-For-Review. Oct 31 2019, 11:33 AM

Mentioned in SAL (#wikimedia-operations) [2019-10-31T11:37:01Z] <jynus@cumin1001> dbctl commit (dc=all): 'Depool db1119, db1113 T227543', diff saved to https://phabricator.wikimedia.org/P9507 and previous config saved to /var/cache/conftool/dbconfig/20191031-113659-jynus.json

Change 547508 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: depool pc1008 temporarily

https://gerrit.wikimedia.org/r/547508

Mentioned in SAL (#wikimedia-operations) [2019-10-31T11:43:33Z] <jynus@deploy1001> Synchronized wmf-config/db-eqiad.php: depooling pc1008 T227543 (duration: 01m 01s)

Maintenance_bot removed a project: Patch-For-Review. Oct 31 2019, 12:11 PM

Finished PDU refresh; Netbox updated.

Jclark-ctr reassigned this task from • Cmjohnson to RobH. Oct 31 2019, 12:27 PM

Jclark-ctr updated the task description.

Mentioned in SAL (#wikimedia-operations) [2019-10-31T13:16:07Z] <jynus@cumin1001> dbctl commit (dc=all): 'Repool db1119, db1113 at 10% T227543', diff saved to https://phabricator.wikimedia.org/P9509 and previous config saved to /var/cache/conftool/dbconfig/20191031-131606-jynus.json

Mentioned in SAL (#wikimedia-operations) [2019-10-31T13:21:19Z] <jynus@deploy1001> Synchronized wmf-config/db-eqiad.php: repool pc1008 T227543 (duration: 01m 02s)

Please note the serial connection for ps1-b8-eqiad is non-functional at this time.

7 $> ssh root@scs-a8-eqiad.mgmt.eqiad.wmnet
Password: 
# pmshell

 1: ps1-a1-eqiad    2: ps1-a2-eqiad    3: ps1-a3-eqiad    4: ps1-a4-eqiad
 5: ps1-a5-eqiad    6: ps1-a6-eqiad    7: ps1-a7-eqiad    8: ps1-a8-eqiad
 9: ps1-b1-eqiad   10: ps1-b2-eqiad   11: ps1-b3-eqiad   12: ps1-b4-eqiad
13: ps1-b5-eqiad   14: ps1-b6-eqiad   15: ps1-b7-eqiad   16: ps1-b8-eqiad
17: asw-a1-eqiad   18: asw-a2-eqiad   19: asw-a3-eqiad   20: asw-a4-eqiad
21: asw-a5-eqiad   22: asw-a6-eqiad   23: asw-a7-eqiad   24: asw-a8-eqiad
25: asw-b1-eqiad   26: asw-b2-eqiad   27: asw-b3-eqiad   28: asw-b4-eqiad
29: asw-b5-eqiad   30: asw-b6-eqiad   31: asw-b7-eqiad   32: asw-b8-eqiad
33: re0.cr1-eqiad  34: re1.cr1-eqiad  35: re0.cr2-eqiad  36: re1.cr2-eqiad
37: mr1-eqiad      40: msw1-eqiad     41: asw2-a5-eqiad  45: asw2-a3-eqiad

Connect to port > 16

When I hit enter, it should prompt for the login, but does not.

This needs to be fixed by on-sites.

RobH reassigned this task from Jclark-ctr to • Cmjohnson. Oct 31 2019, 4:28 PM

Cable reseated (clip was bent) by @Jclark-ctr - reassigning back to @RobH for configuration.

Reseated cable and fixed the bent clip.

# pmshell

 1: ps1-a1-eqiad    2: ps1-a2-eqiad    3: ps1-a3-eqiad    4: ps1-a4-eqiad
 5: ps1-a5-eqiad    6: ps1-a6-eqiad    7: ps1-a7-eqiad    8: ps1-a8-eqiad
 9: ps1-b1-eqiad   10: ps1-b2-eqiad   11: ps1-b3-eqiad   12: ps1-b4-eqiad
13: ps1-b5-eqiad   14: ps1-b6-eqiad   15: ps1-b7-eqiad   16: ps1-b8-eqiad
17: asw-a1-eqiad   18: asw-a2-eqiad   19: asw-a3-eqiad   20: asw-a4-eqiad
21: asw-a5-eqiad   22: asw-a6-eqiad   23: asw-a7-eqiad   24: asw-a8-eqiad
25: asw-b1-eqiad   26: asw-b2-eqiad   27: asw-b3-eqiad   28: asw-b4-eqiad
29: asw-b5-eqiad   30: asw-b6-eqiad   31: asw-b7-eqiad   32: asw-b8-eqiad
33: re0.cr1-eqiad  34: re1.cr1-eqiad  35: re0.cr2-eqiad  36: re1.cr2-eqiad
37: mr1-eqiad      40: msw1-eqiad     41: asw2-a5-eqiad  45: asw2-a3-eqiad

Connect to port > 16

Sentry Smart PDU Version 8.0n

Username:
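
Finding the right port in the pmshell menu by eye is error-prone, so the lookup can be sketched as a small parser. This assumes the menu has been captured as text in the `N: name` layout shown above; `port_for` is a hypothetical helper, not a pmshell feature:

```python
import re

# Two rows copied from the pmshell menu above.
MENU = """\
13: ps1-b5-eqiad   14: ps1-b6-eqiad   15: ps1-b7-eqiad   16: ps1-b8-eqiad
29: asw-b5-eqiad   30: asw-b6-eqiad   31: asw-b7-eqiad   32: asw-b8-eqiad
"""


def port_for(menu_text, device):
    """Return the port number listed for `device`, or None if it is not in the menu."""
    for port, name in re.findall(r"(\d+):\s+(\S+)", menu_text):
        if name == device:
            return int(port)
    return None


print(port_for(MENU, "ps1-b8-eqiad"))
```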

Mentioned in SAL (#wikimedia-operations) [2019-10-31T21:25:13Z] <robh> setting up ps1-b8-eqiad per T227543. it will reboot twice in the next 15 minutes, and then should start to clear up in icinga