
[epic] Run multiple elasticsearch clusters on same hardware
Closed, Resolved · Public

Description

We are running into the limits of the Elasticsearch architecture: we are basically "full" on indices and can't really create more. Our systems are already beyond the baselines. We had to raise the default master timeout from 5s to 30s to keep the daily creation of completion suggesters from failing. An evaluation of adding more indices to the cluster (T192972) showed that the cluster had trouble placing indices, even empty ones.

High level solution:

  • Run two JVMs per node, each in a separate cluster
  • One large JVM for wikis with shards > 100M
  • One small JVM for the remaining wikis
  • The small JVMs are to be split into two clusters of ~17 nodes each.
  • We can almost certainly shrink the large JVMs from their current 30G to something smaller.
  • Estimating the small JVMs at 6G: if we can shave a couple of G off the large JVMs, there should be very little impact on the memory available for disk cache.
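The heap trade-off in the last two bullets can be worked through with a few lines of arithmetic. This is a minimal sketch: the 30G and 6G figures come from the bullets above, and `heap_change_g` is a hypothetical helper name. It deliberately does not assume a total RAM size per node, so it only shows how much extra heap (and therefore how much less OS disk cache) each node would carry.

```python
# Per-node heap budget for running two JVMs instead of one.
# 30G (current large heap) and 6G (small heap estimate) are from the task.

CURRENT_LARGE_HEAP_G = 30  # today's single JVM per node
SMALL_HEAP_G = 6           # estimated heap for the new small-cluster JVM


def heap_change_g(shave_g: int) -> int:
    """Net change in total heap per node after shrinking the large JVM by
    `shave_g` and adding the small JVM. Positive means more heap, i.e. that
    much less memory left for the OS disk cache."""
    new_large = CURRENT_LARGE_HEAP_G - shave_g
    return (new_large + SMALL_HEAP_G) - CURRENT_LARGE_HEAP_G


if __name__ == "__main__":
    for shave in (0, 2, 4, 6):
        print(f"shave {shave}G off large JVM -> total heap changes by {heap_change_g(shave):+d}G")
```

Shaving "a couple" (2G) off the large JVM leaves a net 4G more heap per node; shaving 6G would fully offset the small JVM.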

Looking at our data sizes, roughly 600 primary shards would go to the large JVMs, and the remaining 2100 primary shards would be split between the two small clusters, roughly 1000 primary shards each. Those 2100 shards represent only 32G of data, about 100G with replicas, or a mean of about 3G per server. This is small enough that we shouldn't need any special considerations around data usage between the different elasticsearch instances.
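As a sanity check on the figures in the paragraph above, the sketch below recomputes them. The shard count, data size and node counts are taken from the task; the number of copies per shard (three, i.e. one primary plus two replicas) is an assumption inferred from "32G of data, or about 100G with replicas" and is not stated explicitly.

```python
# Back-of-the-envelope check of the small-cluster shard and data figures.

SMALL_PRIMARY_SHARDS = 2100
SMALL_CLUSTERS = 2
NODES_PER_SMALL_CLUSTER = 17  # "~17 nodes each"
PRIMARY_DATA_G = 32
COPIES = 3  # ASSUMPTION: 1 primary + 2 replicas, inferred from 32G -> ~100G

shards_per_cluster = SMALL_PRIMARY_SHARDS / SMALL_CLUSTERS
total_data_g = PRIMARY_DATA_G * COPIES
servers = SMALL_CLUSTERS * NODES_PER_SMALL_CLUSTER
data_per_server_g = total_data_g / servers

if __name__ == "__main__":
    print(f"primary shards per small cluster: {shards_per_cluster:.0f}")
    print(f"data including replicas: {total_data_g}G")
    print(f"mean data per server: {data_per_server_g:.1f}G")
```

Under that assumption this gives 1050 primary shards per small cluster and 96G spread over 34 servers, about 2.8G each, consistent with the "about 3G per server" estimate.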

This gets our cluster sizes back into manageable ranges and restores our ability to add new indices when that is the right solution to a problem.

Considerations:

  • sister-wikis should be entirely within a single cluster
  • commonswiki search will need some special considerations
  • OtherIndex has to write to a different cluster at times
  • Configuration to assign small wikis and sister wikis to the appropriate cluster without spelling out every single wiki. Or maybe we do spell it out, with a dblist?
  • This certainly adds operational complexity
  • Probably more


Event Timeline

Restricted Application added a subscriber: Aklapper. · May 2 2018, 5:04 PM
EBernhardson updated the task description. · May 2 2018, 5:44 PM
EBernhardson updated the task description. · May 2 2018, 5:48 PM
debt triaged this task as Normal priority. · May 3 2018, 5:21 PM
debt moved this task from needs triage to Up Next on the Discovery-Search board.
debt added a subscriber: debt.

Lots of different pieces to get this epic ticket done.

debt renamed this task from Run multiple elasticsearch clusters on same hardware to [epic] Run multiple elasticsearch clusters on same hardware. · May 3 2018, 5:21 PM
Vvjjkkii renamed this task from [epic] Run multiple elasticsearch clusters on same hardware to 1rdaaaaaaa. · Jul 1 2018, 1:12 AM
Vvjjkkii raised the priority of this task from Normal to High.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot renamed this task from 1rdaaaaaaa to [epic] Run multiple elasticsearch clusters on same hardware. · Jul 1 2018, 8:58 PM
CommunityTechBot lowered the priority of this task from High to Normal.
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: Aklapper.
debt moved this task from Up Next to [epic] on the Discovery-Search board. · Jan 29 2019, 6:49 PM
debt closed this task as Resolved. · Mar 14 2019, 9:16 PM
debt claimed this task.

w00t!