Maniphest T193654

[epic] Run multiple elasticsearch clusters on same hardware
Closed, Resolved · Public

Authored By: EBernhardson, May 2 2018, 5:04 PM

Description

We are running into the limits of the elasticsearch architecture: we are effectively "full" on indices and can't really create more. Our systems are already over the baselines; we have had to raise the default master timeout from 5s to 30s to ensure the daily creation of completion suggesters doesn't fail. The evaluation of adding more indices to the cluster in T192972 showed the cluster having problems placing indices across its nodes even when they were empty.

High level solution:

Run two JVMs per node, in separate clusters:
One large JVM for wikis with shards > 100M
One small JVM for the remaining wikis
The small JVMs are to be split into two clusters of ~17 nodes each.
We can almost certainly shrink the large JVMs from their current 30G to some smaller number.
Estimating the small JVMs at 6G, if we can shave a couple of G from the large JVMs there should be very little impact on disk cache availability.
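
To make the heap trade-off concrete, here is a minimal sketch of the per-node arithmetic. It assumes that any memory not claimed by a JVM heap remains available to the disk cache and ignores off-heap JVM overhead; the function name and the sample "shave" values are illustrative, only the 30G and 6G figures come from the description.

```python
CURRENT_LARGE_HEAP_GB = 30  # current large JVM heap, from the description
SMALL_HEAP_GB = 6           # estimated small JVM heap, from the description


def net_heap_change_gb(shaved_from_large_gb: float) -> float:
    """Extra heap claimed per node once the small JVM is added.

    Assumption: whatever the heaps claim is memory the disk cache loses.
    """
    new_total = (CURRENT_LARGE_HEAP_GB - shaved_from_large_gb) + SMALL_HEAP_GB
    return new_total - CURRENT_LARGE_HEAP_GB


if __name__ == "__main__":
    # Hypothetical amounts shaved from the large JVM, for illustration only.
    for shaved in (0, 2, 4, 6):
        delta = net_heap_change_gb(shaved)
        print(f"shave {shaved}G from the large JVM: net heap change {delta:+}G per node")
```

Under these assumptions, shaving a couple of G from the large JVM leaves roughly 4G less for the disk cache on each node.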

Looking at our data sizes, roughly 600 primary shards would go to the large JVMs, and 2100 primary shards would be split between the two small clusters, for roughly 1,050 primary shards each. Those 2100 shards represent only 32G of data, or about 100G with replicas, for a mean of about 3G per server. This is small enough that we shouldn't need any special considerations around data usage between the different elasticsearch instances.
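
The figures above can be sanity-checked with a few lines of arithmetic. This is a rough sketch: the node count of 34 is an assumption derived from "two clusters of ~17 nodes each", and the 100G total is taken from the description as-is rather than recomputed from a replica count.

```python
SMALL_PRIMARY_SHARDS = 2100    # from the description
SMALL_CLUSTER_COUNT = 2        # from the description
DATA_WITH_REPLICAS_GB = 100    # "about 100G with replicas"
NODES_PER_SMALL_CLUSTER = 17   # "~17", treated as exact here (assumption)

shards_per_cluster = SMALL_PRIMARY_SHARDS / SMALL_CLUSTER_COUNT
total_nodes = NODES_PER_SMALL_CLUSTER * SMALL_CLUSTER_COUNT
mean_gb_per_server = DATA_WITH_REPLICAS_GB / total_nodes

if __name__ == "__main__":
    print(f"primary shards per small cluster: {shards_per_cluster:.0f}")
    print(f"mean data per server: {mean_gb_per_server:.1f}G")
```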

This gets our cluster sizes back into manageable ranges and re-opens the ability to add new indices if it is the right solution to a problem.

Considerations:

sister-wikis should be entirely within a single cluster
commonswiki search will need some special considerations
OtherIndex has to write to a different cluster at times
Configuration to assign small wikis and sister wikis to appropriate places without spelling out each and every wiki. Or maybe we do spell it out with a dblist?
This certainly adds operational complexity
Probably more
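
One way the assignment consideration could look in code is sketched below. This is a hypothetical policy, not the configuration that was deployed: the cluster names chi, psi and omega are borrowed from the related task titles, treating chi as the large cluster is an assumption, and `wiki_groups` and `large_wikis` stand in for whatever dblist-style inputs would actually be used. The sketch keeps each sister-wiki group on one cluster and picks a small cluster with a stable hash so that no wiki has to be spelled out individually.

```python
import zlib
from collections import defaultdict

LARGE_CLUSTER = "chi"              # assumed name for the large-JVM cluster
SMALL_CLUSTERS = ("psi", "omega")  # assumed names for the two small clusters


def assign_clusters(wiki_groups: dict, large_wikis: set) -> dict:
    """Map each wiki to a cluster, keeping sister wikis together.

    wiki_groups: wiki db name -> sister group key (hypothetical input)
    large_wikis: db names that must live on the large cluster (hypothetical dblist)
    """
    members = defaultdict(list)
    for wiki, group in wiki_groups.items():
        members[group].append(wiki)

    assignment = {}
    for group, wikis in members.items():
        if any(wiki in large_wikis for wiki in wikis):
            # One large member pulls the whole sister group onto the large cluster.
            cluster = LARGE_CLUSTER
        else:
            # crc32 is stable across runs, unlike Python's built-in hash().
            index = zlib.crc32(group.encode("utf-8")) % len(SMALL_CLUSTERS)
            cluster = SMALL_CLUSTERS[index]
        for wiki in wikis:
            assignment[wiki] = cluster
    return assignment


if __name__ == "__main__":
    groups = {"enwiki": "en", "enwiktionary": "en", "fiwiki": "fi", "fiwikibooks": "fi"}
    for wiki, cluster in sorted(assign_clusters(groups, {"enwiki"}).items()):
        print(wiki, "->", cluster)
```

A hash-based split avoids a per-wiki list but gives no control over balance; an explicit dblist per cluster, as floated above, trades that convenience for predictability.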

Related Objects

Status	Assigned	Task
Resolved	EBernhardson	T183281 [epic] ELK upgrade to 6.x (elasticsearch, kibana, logstash)
Resolved	None	T183282 [epic] Search cluster upgrade to 6.x
Resolved	debt	T193654 [epic] Run multiple elasticsearch clusters on same hardware
Resolved	EBernhardson	T194678 Update OtherIndex to operate on a cluster other than the one holding the wiki
Resolved	Gehel	T198351 Refactor puppet to support multiple elasticsearch instances on same node
Resolved	EBernhardson	T198490 Use kafka for communication from analytics cluster to elasticsearch
Resolved	EBernhardson	T200215 Create kafka topic for mjolnir bulk daemon and decide on cluster
Resolved	EBernhardson	T200740 Deploy mjolnir msearch daemon to the elasticsearch clusters
Resolved	EBernhardson	T201948 Add stats collection for observability of mjolnir daemons
Resolved	Gehel	T198352 Setup two elasticsearch clusters on relforge to test multi-instance
Resolved	Gehel	T207195 Configure LVS endpoints for new elasticsearch clusters
Resolved	dcausse	T210381 Update mw-config to use the psi&omega elastic clusters
Resolved	dcausse	T211752 Adapt or configure mjolnir so that it knows all search clusters within a DC
Resolved	• Mathew.onipe	T212434 Allow elasticsearch machines to communicate with each others on port 9500 and 9700
Resolved	dcausse	T213150 Configure elasticsearch crosscluster on production search servers
Resolved	EBernhardson	T213959 Decide order of operations for elastic 6 upgrade
Resolved	dcausse	T214052 Delete indices moved from chi to psi/omega

Event Timeline

EBernhardson created this task. May 2 2018, 5:04 PM

Restricted Application added a subscriber: Aklapper. May 2 2018, 5:04 PM

EBernhardson updated the task description. May 2 2018, 5:44 PM

EBernhardson updated the task description. May 2 2018, 5:48 PM

Lots of different pieces to get this epic ticket done.

debt renamed this task from Run multiple elasticsearch clusters on same hardware to [epic] Run multiple elasticsearch clusters on same hardware. May 3 2018, 5:21 PM

EBernhardson mentioned this in T192972: Evaluate impact of adding ~2700 new shards to production cluster. May 3 2018, 8:34 PM

EBernhardson added a parent task: T183282: [epic] Search cluster upgrade to 6.x. May 10 2018, 4:57 PM

• Vvjjkkii renamed this task from [epic] Run multiple elasticsearch clusters on same hardware to 1rdaaaaaaa. Jul 1 2018, 1:12 AM

• Vvjjkkii raised the priority of this task from Medium to High.