
[epic] Run multiple elasticsearch clusters on same hardware
Closed, Resolved (Public)

Description

We are running into the limits of our Elasticsearch architecture: we are effectively "full" on indices and can't really create more. Our systems are already past the baselines; we had to raise the master timeout from its 5s default to 30s to ensure the daily creation of completion suggesters doesn't fail. An evaluation of adding more indices to the cluster in T192972 showed the cluster struggling to place indices across its nodes even when those indices were empty.

High level solution:

  • Run two JVMs per node, each belonging to a separate cluster
  • One large JVM for wikis with shards > 100M
  • One small JVM for the remaining wikis
  • Split the small JVMs into two clusters of ~17 nodes each
  • We can almost certainly shrink the large JVMs from their current 30G to some smaller number
  • Estimating the small JVMs at 6G: if we can shave a couple of G from the large JVMs, there should be very little impact on disk cache availability
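
The memory trade-off in the last two bullets can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the 30G and 6G figures are per-node JVM sizes and that whatever the JVMs do not use is left for the disk cache. The function name and the 28G example are hypothetical.

```python
def disk_cache_delta_gb(current_large_gb, new_large_gb, small_gb):
    """Change in memory left for the disk cache on one node, in GB.

    Negative means the node has less memory for the disk cache
    than it has today with a single large JVM.
    """
    return current_large_gb - (new_large_gb + small_gb)


# Hypothetical case: shave 2G off the large JVM (30G -> 28G), add a 6G small JVM.
print(disk_cache_delta_gb(30, 28, 6))  # net change per node, in GB
```

Under these assumptions, every G shaved from the large JVM directly offsets the 6G cost of the small one.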

Looking at our data sizes, roughly 600 primary shards would go to the large JVMs, and 2100 primary shards would be split between the two small clusters, for roughly 1000 primary shards each. Those 2100 shards represent only 32G of data, or about 100G with replicas, for a mean of about 3G per server. This is small enough that we shouldn't need any special considerations around data usage between the different Elasticsearch instances.
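
As a rough check of these numbers, here is a minimal sketch of the arithmetic. It assumes two replicas per primary (inferred from 32G growing to about 100G) and 34 servers in total (two small clusters of ~17 nodes each); both are readings of the figures above, not confirmed settings.

```python
def small_cluster_estimate(primary_shards, primary_data_gb,
                           clusters=2, nodes_per_cluster=17, replicas=2):
    """Return (primary shards per cluster, total data in GB, mean GB per server).

    replicas=2 and nodes_per_cluster=17 are assumptions taken from the
    approximate figures in the task description.
    """
    shards_per_cluster = primary_shards / clusters
    total_data_gb = primary_data_gb * (1 + replicas)
    mean_gb_per_server = total_data_gb / (clusters * nodes_per_cluster)
    return shards_per_cluster, total_data_gb, mean_gb_per_server


shards, total_gb, mean_gb = small_cluster_estimate(2100, 32)
print(shards, total_gb, round(mean_gb, 1))
```

The results land close to the rounded figures quoted above: about 1000 primary shards per cluster, about 100G in total, and about 3G per server.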

This gets our cluster sizes back into manageable ranges and re-opens the ability to add new indices if it is the right solution to a problem.

Considerations:

  • sister-wikis should be entirely within a single cluster
  • commonswiki search will need some special considerations
  • OtherIndex has to write to a different cluster at times
  • Configuration to assign small wikis and sister wikis to appropriate places without spelling out each and every wiki. Or maybe we do spell it out with a dblist?
  • This certainly adds operational complexity
  • Probably more
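
One way to picture the dblist option from the configuration bullet: list only the wikis that need an explicit home and let everything else fall through to a default cluster. This is a hypothetical sketch, not the actual configuration. The cluster names, the sample wiki assignments, and the helper names are all made up for illustration, and the dblist format (one database name per line, `#` starting a comment) is assumed.

```python
def parse_dblist(text):
    """Return wiki names from dblist-style text, skipping blanks and '#' comments."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            names.append(line)
    return names


def build_assignment(dblists, default_cluster):
    """Build a wiki -> cluster lookup from {cluster name: dblist text}.

    Wikis not named in any list go to default_cluster, so small wikis
    do not have to be spelled out one by one.
    """
    wiki_to_cluster = {}
    for cluster, text in dblists.items():
        for wiki in parse_dblist(text):
            previous = wiki_to_cluster.get(wiki, cluster)
            if previous != cluster:
                raise ValueError(f"{wiki} listed for both {previous} and {cluster}")
            wiki_to_cluster[wiki] = cluster

    def lookup(wiki):
        return wiki_to_cluster.get(wiki, default_cluster)

    return lookup


# Hypothetical lists; the real split would come from shard sizes and sister-wiki groups.
lookup = build_assignment(
    {"large": "enwiki\ncommonswiki  # needs special handling\n",
     "small-a": "aawiki\naawiktionary\n"},
    default_cluster="small-b",
)
print(lookup("enwiki"), lookup("aawiktionary"), lookup("zzwiki"))
```

Rejecting a wiki that appears in two lists is a cheap guard. Keeping sister wikis together would still depend on how the lists themselves are generated.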

Related Objects

Status    Assigned
Resolved  EBernhardson
Resolved  None
Resolved  debt
Resolved  EBernhardson
Resolved  Gehel
Resolved  EBernhardson
Resolved  EBernhardson
Resolved  EBernhardson
Resolved  EBernhardson
Resolved  Gehel
Resolved  Gehel
Resolved  dcausse
Resolved  dcausse
Resolved  Mathew.onipe
Resolved  dcausse
Resolved  EBernhardson
Resolved  dcausse

Event Timeline

debt triaged this task as Medium priority. (May 3 2018, 5:21 PM)
debt moved this task from needs triage to Up Next on the Discovery-Search board.
debt subscribed.

Lots of different pieces to get this epic ticket done.

debt renamed this task from Run multiple elasticsearch clusters on same hardware to [epic] Run multiple elasticsearch clusters on same hardware. (May 3 2018, 5:21 PM)
Vvjjkkii renamed this task from [epic] Run multiple elasticsearch clusters on same hardware to 1rdaaaaaaa. (Jul 1 2018, 1:12 AM)
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot renamed this task from 1rdaaaaaaa to [epic] Run multiple elasticsearch clusters on same hardware. (Jul 1 2018, 8:58 PM)
CommunityTechBot lowered the priority of this task from High to Medium.
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: Aklapper.
debt claimed this task.

w00t!