Build warm slave for Gerrit in Dallas
Closed, Resolved · Public

Assigned To

Authored By

	• demon
	Oct 14 2016, 2:16 PM

Description

Following the fallout from the lead failure, @Dzahn, the rest of RelEng, and I discussed this and agreed that a warm spare of Gerrit (ideally running in the other DC) is necessary to avoid extended downtime of a crucial service.

It doesn't have to be completely hot, and failover does not have to be instantaneous or automatic, but the warmer it gets and the less we have to do to swap, the better.

Ideally I'm thinking:

Misc server in codfw with public IP like eqiad
Run gerrit role in slave mode (read-only)
rsync git, lucene data hourly from master -> slave (index drift & rebuild would suck!)
Failover would be "swap which node is master" and "swap dns"
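
As a rough illustration of the hourly rsync item above, here is a minimal Python sketch that pulls the git and index directories from the master onto the spare. The hostname, the directory paths, and the function names are all hypothetical placeholders, not values from this task; the only rsync options used are the standard `--archive` and `--delete`. It would be run from the spare, for example from an hourly cron job.

```python
import subprocess

# Hypothetical values for illustration only; none of these come from the task.
MASTER_HOST = "gerrit-master.example.org"
SYNC_PATHS = ["/srv/gerrit/git", "/srv/gerrit/index"]


def build_rsync_command(master_host, path):
    """Return the rsync argv that mirrors one directory from the master."""
    clean = path.rstrip("/")
    # Trailing slashes make rsync copy the directory contents, not the directory itself.
    return ["rsync", "--archive", "--delete", f"{master_host}:{clean}/", f"{clean}/"]


def sync_from_master(master_host=MASTER_HOST, paths=SYNC_PATHS, run=subprocess.run):
    """Pull each path from the master; with subprocess.run, a failed rsync raises."""
    for path in paths:
        run(build_rsync_command(master_host, path), check=True)
```

`--delete` is included so that repositories removed on the master also disappear from the spare; whether that is desirable for a failover copy is a judgment call this sketch does not settle.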

Details

	Subject	Repo	Branch	Lines +/-
	add gerrit2001.mgmt for WMF6408.mgmt	operations/dns	master	+1 -0


Related Objects

Status	Assigned	Task
Resolved	• demon	T148186 Build warm slave for Gerrit in Dallas
Resolved	RobH	T148187 Requesting 1 spare misc box for Gerrit in codfw
		Unknown Object (Task)
Resolved	• demon	T152525 setup/install gerrit2001/WMF6408
Resolved	Papaul	T152527 update the label and racktables entry for gerrit2001/WMF6408 & install SSDs
Resolved	faidon	T156957 asw-d-codfw public1-vlan addition review (blocks gerrit2001)

Event Timeline

• demon created this task. · Oct 14 2016, 2:16 PM

Restricted Application added a subscriber: Aklapper. · Oct 14 2016, 2:16 PM

• demon created subtask T148187: Requesting 1 spare misc box for Gerrit in codfw. · Oct 14 2016, 2:19 PM

Paladox subscribed. · Oct 14 2016, 2:25 PM

Dzahn awarded a token. · Oct 14 2016, 2:55 PM

Luke081515 awarded a token. · Oct 14 2016, 8:23 PM

Luke081515 subscribed.

Change 325596 had a related patch set uploaded (by Dzahn):
add gerrit2001.mgmt for WMF6408.mgmt

https://gerrit.wikimedia.org/r/325596

gerritbot added a project: Patch-For-Review. · Dec 6 2016, 6:25 PM

Change 325596 abandoned by Dzahn:
add gerrit2001.mgmt for WMF6408.mgmt

Reason:
it was already done by Rob now in https://gerrit.wikimedia.org/r/#/c/325860/1

https://gerrit.wikimedia.org/r/325596

RobH closed subtask T148187: Requesting 1 spare misc box for Gerrit in codfw as Resolved. · Feb 9 2017, 7:59 PM

The spare is running in Dallas and data is being replicated in real time, so I think we're warm.

The only remaining improvements would be things like shared cache stores (T152802) and swapping to Elasticsearch for shared indexing. Then we'd be able to run a much hotter spare.

But I think we could fail over pretty quickly at this point, so resolving.

Krinkle mentioned this in T156937: Provide cross-dc redundancy (active-active or active-passive) to all important misc services. · Jul 25 2017, 6:09 PM

• Phabricator_maintenance edited projects, added RelEng-Archive-FY201718-Q1; removed Release-Engineering-Team. · Dec 21 2017, 7:01 PM
