Set up a CirrusSearch cluster in codfw (Dallas, Texas)
Closed, Resolved · Public
Assigned To

Authored By

	Joe
	Jul 13 2015, 4:23 PM

Description

We want a functioning Search cluster in codfw. We assume we want an AP system (available and partition-tolerant, in CAP terms), so we'll keep the two clusters decoupled. What will happen is:

Any Cirrus job is enqueued and writes to both DCs [1]
If a job on one DC fails, re-enqueue just that job

[1] How to do this is debatable. If we do the parsing once and just make the jobrunners in the primary DC talk to the ElasticSearch cluster, we spare quite a few resources, but we incur higher network traffic. If we instead spawn a job on the secondary DC jobqueue, it will be a bit more complex to manage and will use more resources, but we will save network bandwidth. T105705 is related to this.
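As a rough illustration of the "write to both DCs, re-enqueue only the failed job" idea above, here is a minimal Python sketch. It is not CirrusSearch or MediaWiki jobqueue code: `IndexJob`, `fan_out`, `run_queue`, the in-memory queue and the `writers` callables are all hypothetical stand-ins, and the retry limit is an assumption.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Tuple


@dataclass(frozen=True)
class IndexJob:
    """One write of one page to one cluster (hypothetical job shape)."""
    page_id: int
    cluster: str  # name of the DC whose search cluster receives the write


def fan_out(page_id: int, clusters: List[str]) -> List[IndexJob]:
    """Create one independent job per cluster, so failures stay isolated."""
    return [IndexJob(page_id, cluster) for cluster in clusters]


def run_queue(
    jobs: List[IndexJob],
    writers: Dict[str, Callable[[int], None]],
    max_attempts: int = 3,  # assumed limit, not from the task
) -> List[IndexJob]:
    """Run every job; re-enqueue only the job that failed.

    Returns the jobs that still failed after max_attempts.
    """
    queue: Deque[Tuple[IndexJob, int]] = deque((job, 1) for job in jobs)
    failed: List[IndexJob] = []
    while queue:
        job, attempt = queue.popleft()
        try:
            writers[job.cluster](job.page_id)
        except Exception:
            if attempt < max_attempts:
                # Only this (page, cluster) pair is retried; the write to
                # the other DC is not repeated.
                queue.append((job, attempt + 1))
            else:
                failed.append(job)
    return failed
```

Keeping one job per cluster is what makes the two clusters decoupled in this sketch: a failure in one DC never blocks or repeats the write to the other.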

Apart from design decisions, the steps here will be:

Procure the hardware - 24 of the nicest servers we have in eqiad for search (?)
Set up the hardware in multiple rows/racks
Maybe throw in 3 small/old spares as master-only nodes?
Puppet - check the puppet code for ''eqiadisms''
Actually implement the job changes to write to both datacenters.
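On the master-only nodes and multiple rows points: a small sketch of the majority arithmetic, assuming the usual rule that more than half of the master-eligible nodes must be reachable to elect a master. The function names and the per-row counts are hypothetical and are not taken from the puppet code.

```python
from typing import Dict


def quorum(master_eligible: int) -> int:
    """Smallest majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


def survives_row_loss(masters_per_row: Dict[str, int]) -> bool:
    """True if losing any single row still leaves a majority of masters."""
    total = sum(masters_per_row.values())
    needed = quorum(total)
    return all(total - count >= needed for count in masters_per_row.values())


# One master-eligible node in each of three rows tolerates losing a row;
# packing two of the three into one row does not.
spread = survives_row_loss({"a": 1, "b": 1, "c": 1})
packed = survives_row_loss({"a": 2, "b": 1})
```

Under these assumptions, spreading three master-eligible nodes over three rows is the smallest layout that keeps a majority when one row goes down.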

Revisions and Commits

rOPUP Wikimedia Puppet
	rOPUP020721a1c6a5 WIP elasticsearch: apply elasticsearch::server role to codfw
	rOPUP798cf35f67c0 elasticsearch: apply elasticsearch::server role to codfw
	rOPUP39a1fb3ae705 elastic: codfw eligible master in row a/b/c
	rOPUP67c5c9cf4638 elasticsearch: apply elasticsearch::server role to codfw
	rOPUPc3bcbb4b69b1 elastic: codfw eligible master in row a/b/c
	rOPUPbab54042a72a elasticsearch: apply elasticsearch::server role to codfw

Related Objects

Status	Assigned	Task
Resolved	• Deskana	T105703 Set up a CirrusSearch cluster in codfw (Dallas, Texas)
Resolved	RobH	T105707 Request Elasticsearch hardware for secondary CirrusSearch in codfw
Resolved	RobH	T97049 CODFW Search Servers
Duplicate	None	T105709 Implement multi-DC support in CirrusSearch
Resolved	• chasemp	T105708 Decide on and document the implementation for multi data centre CirrusSearch
Resolved	• Gage	T105705 Evaluate traffic flow between the Jobrunners and the Cirrus cluster
Invalid	• chasemp	T105711 Rollout CirrusSearch to codfw as a backup data centre
Resolved	EBernhardson	T86781 Support multiple datacenters in CirrusSearch
Duplicate	EBernhardson	T109734 enable cirrussearch to talk to two clusters
Resolved	Smalyshev	T113018 maintenance script to copy the ES index from one cluster to another

Event Timeline

Joe created this task. Jul 13 2015, 4:23 PM

Joe raised the priority of this task from to Needs Triage.

Joe updated the task description.

Joe added projects: acl*sre-team, Discovery-ARCHIVED.

Joe subscribed.

Restricted Application added subscribers: Matanya, Aklapper. Jul 13 2015, 4:23 PM

dcausse subscribed. Jul 13 2015, 4:24 PM

• Gage subscribed. Jul 13 2015, 4:24 PM

Joe updated the task description. Jul 13 2015, 4:45 PM

Joe set Security to None.

Joe triaged this task as High priority. Jul 14 2015, 7:06 AM

Joe added a project: codfw-rollout-Jul-Sep-2015.

• chasemp added a subtask: T86781: Support multiple datacenters in CirrusSearch. Jul 22 2015, 10:41 PM

• chasemp mentioned this in T109126: logstash insertions to ElasticSearch cross-DC functionality needs figuring out. Aug 14 2015, 7:36 PM

• Deskana added a subtask: T109734: enable cirrussearch to talk to two clusters. Aug 27 2015, 5:37 PM

• Deskana renamed this task from Cirrus search in codfw to Set up a CirrusSearch cluster in codfw (Dallas, Texas). Sep 15 2015, 4:26 PM

• chasemp closed subtask T105707: Request Elasticsearch hardware for secondary CirrusSearch in codfw as Resolved. Sep 15 2015, 10:08 PM

• chasemp added a commit: rOPUPbab54042a72a: elasticsearch: apply elasticsearch::server role to codfw. Sep 16 2015, 3:09 PM

• chasemp added a commit: rOPUPc3bcbb4b69b1: elastic: codfw eligible master in row a/b/c. Sep 16 2015, 4:59 PM

• chasemp added a subtask: T113018: maintenance script to copy the ES index from one cluster to another. Sep 18 2015, 2:35 PM

• chasemp closed subtask T105711: Rollout CirrusSearch to codfw as a backup data centre as Invalid.

• chasemp renamed this task from Set up a CirrusSearch cluster in codfw (Dallas, Texas) to [EPIC] Set up a CirrusSearch cluster in codfw (Dallas, Texas). Sep 18 2015, 2:38 PM

• chasemp lowered the priority of this task from High to Medium.

• chasemp closed subtask T105705: Evaluate traffic flow between the Jobrunners and the Cirrus cluster as Resolved. Sep 25 2015, 6:53 PM

• chasemp closed subtask T105708: Decide on and document the implementation for multi data centre CirrusSearch as Resolved. Sep 25 2015, 7:31 PM

Smalyshev closed subtask T113018: maintenance script to copy the ES index from one cluster to another as Resolved. Oct 14 2015, 5:34 PM

• Deskana closed subtask T86781: Support multiple datacenters in CirrusSearch as Resolved. Oct 27 2015, 8:46 AM

ping'd on irc but, is this epic task now done? anything remaining?

In T105703#1793970, @chasemp wrote:

ping'd on irc but, is this epic task now done? anything remaining?

As far as I know, it's resolved. Yay! :-)

This is now up and running with a full copy of the index and all writes going to it. We should do a load test and ensure this meets our expectations before declaring victory though.

reopening for the remaining subtask :)

We already have two tasks tracking real-world load testing (T117714 & T121741), and this task's name is a bit misleading since the codfw cluster has been set up for some time now. Resolving this one.

Aklapper added a project: codfw-rollout. Jan 20 2016, 6:28 PM

• chasemp added a commit: rOPUP67c5c9cf4638: elasticsearch: apply elasticsearch::server role to codfw. Apr 28 2016, 3:13 AM

• chasemp added a commit: rOPUP39a1fb3ae705: elastic: codfw eligible master in row a/b/c.

• chasemp added a commit: rOPUP020721a1c6a5: WIP elasticsearch: apply elasticsearch::server role to codfw.

• chasemp added a commit: rOPUP798cf35f67c0: elasticsearch: apply elasticsearch::server role to codfw.