
Evaluate reducing shard counts for smaller wikis
Closed, Declined · Public

Description

We think that some of the slowness of the Elasticsearch master server is due to the number of indices and the number of shards in the cluster. We should write a script that parses the output of Elasticsearch's /_cat/indices API and determines which indices could get by with fewer total shards. In some of our documentation Nik suggested that ~2GB is a good size for a shard. We should also re-evaluate this number; we don't know that it's wrong, but we also don't know that it's right. There are tradeoffs in both directions.
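For concreteness, the ~2GB guideline maps a primary store size to a target shard count like this (a hypothetical sketch, not anything that exists in CirrusSearch):

```php
<?php
// Hypothetical helper: map a primary store size to a suggested primary
// shard count under a configurable per-shard cap (default ~2GB).
function suggestedShardCount( $priStoreBytes, $capBytes = 2147483648 ) {
	return max( 1, (int)ceil( $priStoreBytes / $capBytes ) );
}

echo suggestedShardCount( 7 * 1024 * 1024 * 1024 ), "\n"; // 7GB index -> 4 shards
echo suggestedShardCount( 500 * 1024 * 1024 ), "\n";      // 500MB index -> 1 shard
```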

This fits in with our Q3 goal of evaluating the current Elasticsearch configuration and optimizing it as appropriate.

Event Timeline

EBernhardson raised the priority of this task from to Needs Triage.
EBernhardson updated the task description.
EBernhardson added a project: CirrusSearch.
EBernhardson added subscribers: EBernhardson, dcausse.
Restricted Application added subscribers: StudiesWorld, Aklapper.

I took a first stab at this with P2509. To run it: `curl -s elastic1001:9200/_cat/indices?bytes=b | php parse_es_indices.php`

Using max shard sizes of 500MB for titlesuggest and 2GB for everything else, most of the gain would come from the titlesuggest indices. We can remove 174 shards, which is only 2% of the total shard count. It's something, but not amazing.

It turns out all of the normal indices except fiwiki_general are fine. fiwiki_general could be reduced from 10 shards to 3 (roughly 4.2GB of primary data, so about 1.4GB per shard). Numerous titlesuggest indices can still be shrunk.
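For reference, a minimal sketch of what a P2509-style parser might look like (the paste itself isn't reproduced here; column positions assume the default _cat/indices output of health, status, index, pri, rep, docs.count, docs.deleted, store.size, pri.store.size):

```php
<?php
// Sketch: read `_cat/indices?bytes=b` from stdin and report indices that
// could get by with fewer primary shards under the caps discussed above.
$capDefault = 2 * 1024 * 1024 * 1024; // 2GB for normal indices
$capSuggest = 500 * 1024 * 1024;      // 500MB for titlesuggest indices

$removable = 0;
while ( ( $line = fgets( STDIN ) ) !== false ) {
	$cols = preg_split( '/\s+/', trim( $line ) );
	if ( count( $cols ) < 9 ) {
		continue; // skip blank or malformed lines
	}
	$index = $cols[2];
	$pri = (int)$cols[3];
	$rep = (int)$cols[4];
	$priBytes = (int)$cols[8]; // pri.store.size, in bytes thanks to ?bytes=b
	$cap = strpos( $index, 'titlesuggest' ) !== false ? $capSuggest : $capDefault;
	$needed = max( 1, (int)ceil( $priBytes / $cap ) );
	if ( $needed < $pri ) {
		// Total shards freed includes replica copies of each dropped primary.
		$freed = ( $pri - $needed ) * ( 1 + $rep );
		$removable += $freed;
		printf( "%s: %d -> %d primaries (%d total shards freed)\n",
			$index, $pri, $needed, $freed );
	}
}
printf( "Total shards removable: %d\n", $removable );
```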

Change 261287 had a related patch set uploaded (by EBernhardson):
Adjust cirrus titlesuggest index shard counts

https://gerrit.wikimedia.org/r/261287
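I haven't reproduced the change here, but the knob it adjusts is CirrusSearch's per-index-type primary shard count, which (as I understand it) is exposed as $wgCirrusSearchShardCount; something of this shape, with the values here being illustrative only:

```php
<?php
// Illustrative only; the actual values live in the change linked above.
$wgCirrusSearchShardCount = [
	'content' => 4,       // made-up value
	'general' => 4,       // made-up value
	'titlesuggest' => 1,  // small wikis rarely need more than one suggest shard
];
```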

Something else to consider: shared indices for projects of the same language. This wouldn't make sense for everything; large indices like enwiki and dewiki should almost certainly keep their own. But what about the smaller wikis? The added complexity might not be worth the maintenance burden, though.

Additionally, the completion suggester would probably need to stay at one index per wiki.
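To make the idea concrete: a shared per-language index would need every query scoped to the originating project, e.g. with a term filter on a per-document wiki field. Both the index layout and the field below are assumptions, not current Cirrus behaviour (ES 1.x filtered-query syntax):

```php
<?php
// Hypothetical: scoping a search against a shared 'fr_general' index to a
// single project via a term filter on an assumed 'wiki' field.
$query = [
	'query' => [
		'filtered' => [ // ES 1.x syntax
			'query' => [ 'match' => [ 'text' => 'recherche' ] ],
			'filter' => [ 'term' => [ 'wiki' => 'frwikibooks' ] ],
		],
	],
];
// Would be POSTed to something like elastic1001:9200/fr_general/_search
echo json_encode( $query, JSON_PRETTY_PRINT ), "\n";
```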

Attempted to estimate combining projects of the same language with P2510.

Capping the combined project size at 2GB (one shard), we could reduce the total shard count across the cluster (including replicas) by 1761 shards (19.4%). This still keeps content and general indices separate.
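A rough sketch of how a P2510-style estimate could be computed (again, not the actual paste; the language extraction below is a stand-in for real project-name parsing):

```php
<?php
// Sketch: group general indices by language, sum their primary bytes, and
// count how many 2GB shards each combined index would need.
$capBytes = 2 * 1024 * 1024 * 1024;
$byLang = [];
while ( ( $line = fgets( STDIN ) ) !== false ) {
	$cols = preg_split( '/\s+/', trim( $line ) );
	if ( count( $cols ) < 9 || strpos( $cols[2], '_general' ) === false ) {
		continue; // only general indices; content stays separate
	}
	// Stand-in language extraction: 'frwikibooks_general_...' -> 'fr'.
	if ( preg_match( '/^([a-z]+?)wik/', $cols[2], $m ) ) {
		$lang = $m[1];
	} else {
		$lang = 'other';
	}
	if ( !isset( $byLang[$lang] ) ) {
		$byLang[$lang] = 0;
	}
	$byLang[$lang] += (int)$cols[8]; // pri.store.size in bytes
}
foreach ( $byLang as $lang => $bytes ) {
	$shards = max( 1, (int)ceil( $bytes / $capBytes ) );
	printf( "%s: %.2fGB combined -> %d shard(s)\n",
		$lang, $bytes / ( 1024 * 1024 * 1024 ), $shards );
}
```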

Deskana moved this task from Inbox to Technical on the CirrusSearch board.
Deskana subscribed.

I pulled this into the sprint and put it in "Needs review", since it has a patch assigned to it. Judging by the title of this task, that may or may not have been the correct action. Feel free to undo that if it's wrong.

Change 261287 merged by jenkins-bot:
Adjust cirrus titlesuggest index shard counts

https://gerrit.wikimedia.org/r/261287

posted to wrong ticket...

Deskana claimed this task.

There were some ideas about improving things here by putting a bunch of different projects into the same index, e.g. having French Wikibooks, Wiktionary, Wikisource, and so on share one index. This would have no user-facing changes at all, but would help with the technical issue. However, it would probably require enough added complexity in CirrusSearch that it wouldn't be worth the effort.

Given that, @EBernhardson and I decided to decline this task. We can always reopen in the future if we do decide to work on this.