Maniphest T203546

Alert when elasticsearch has shards larger than a maximum size
Closed, Resolved, Public


	Gehel
	Sep 5 2018, 8:18 AM

Description

We have a documented rule that shards on the cirrus cluster should be at most 30GB. When shards grow beyond this limit, relocating them becomes complicated, and the number of shards for the affected index should be increased. We have been surprised a few times by shards growing up to 70GB.

An Icinga check running at low frequency (once per day is enough) would help identify those shards in a timely fashion. The check should have warning and critical thresholds, and it should report the indices that have shards over the limit.
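A minimal sketch of the threshold logic described above. It assumes shard sizes have already been collected as (index, shard, size_bytes) tuples; the function name, the GiB conversion, and rounding sizes down in the message are illustrative assumptions, not the implementation merged in change 458891. It follows the Nagios plugin exit-code convention (0 = OK, 1 = WARNING, 2 = CRITICAL) and lists every shard over the warning threshold, largest first, in an `index:shard (size=NNgb)` form similar to the relforge test output further down this task.

```python
from typing import Iterable, List, Tuple

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}

# Assumption: thresholds are in GiB, like Elasticsearch's "gb" unit.
GB = 1024 ** 3


def check_shard_sizes(
    shards: Iterable[Tuple[str, int, int]],
    warning_gb: float,
    critical_gb: float,
) -> Tuple[int, str]:
    """Return (exit_code, message) for (index, shard, size_bytes) tuples."""
    over_limit: List[Tuple[str, int, int]] = []
    status = OK
    for index, shard, size_bytes in shards:
        size_gb = size_bytes / GB
        if size_gb > critical_gb:
            status = CRITICAL
        elif size_gb > warning_gb:
            status = max(status, WARNING)  # never downgrade a CRITICAL
        else:
            continue  # shard is within limits, not reported
        over_limit.append((index, shard, size_bytes))

    if status == OK:
        return OK, "OK - all shards are within limits"

    # Largest shards first, so the worst offenders lead the message.
    over_limit.sort(key=lambda s: s[2], reverse=True)
    details = ", ".join(
        f"{index}:{shard} (size={size_bytes // GB}gb)"
        for index, shard, size_bytes in over_limit
    )
    return status, f"{STATUS_NAMES[status]} - {details}"
```

A wrapper script would print the message and exit with the returned code. With `warning_gb=25` and `critical_gb=40`, one 45GB shard makes the overall status CRITICAL, and any 26 to 40GB shards are still listed after it, which matches the shape of the relforge output.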

Details

	Subject	Repo	Branch	Lines +/-
	Elasticsearch shard size check	operations/puppet	production	+138 -3

Event Timeline

Gehel created this task. Sep 5 2018, 8:18 AM

Restricted Application edited projects, added Discovery-Search; removed Discovery-Search (Current work). Sep 5 2018, 8:18 AM

Restricted Application added a subscriber: Aklapper.

Gehel triaged this task as Medium priority. Sep 5 2018, 8:18 AM

For reference, https://github.com/wikimedia/puppet/blob/production/modules/elasticsearch/files/nagios/check_elasticsearch.py is a similar check, which could be used as a base for this new check.
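The check also needs per-shard sizes. A minimal sketch, assuming the Elasticsearch `_cat/shards` API is reachable over plain HTTP: requesting `format=json` with `bytes=b` returns one row per shard copy with its store size in bytes. `ES_URL` and both function names are hypothetical, and counting only primary shards (replicas hold the same data, so they would duplicate entries) is a design choice made here, not something stated in this task.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple

ES_URL = "http://localhost:9200"  # hypothetical endpoint, adjust per cluster


def parse_cat_shards(rows: List[Dict[str, Any]]) -> List[Tuple[str, int, int]]:
    """Turn _cat/shards JSON rows into (index, shard, size_bytes) tuples."""
    shards = []
    for row in rows:
        # Skip replicas and shard copies without a store size (e.g. unassigned).
        if row.get("prirep") != "p" or row.get("store") is None:
            continue
        shards.append((row["index"], int(row["shard"]), int(row["store"])))
    return shards


def fetch_shard_sizes(
    base_url: str = ES_URL, timeout: float = 10.0
) -> List[Tuple[str, int, int]]:
    """Query _cat/shards with sizes in bytes and return parsed tuples."""
    url = f"{base_url}/_cat/shards?format=json&bytes=b&h=index,shard,prirep,store"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        rows = json.load(response)
    return parse_cat_shards(rows)
```

Keeping `parse_cat_shards` separate from the HTTP call lets the parsing be tested without a live cluster; the tuples it returns can be passed straight to a threshold check.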

• Mathew.onipe claimed this task. Sep 6 2018, 9:59 AM

• Mathew.onipe edited projects, added Discovery-Search (Current work); removed Discovery-Search. Sep 6 2018, 10:10 AM

• Mathew.onipe moved this task from Incoming to not in use - please delete on the Discovery-Search (Current work) board.

Change 458891 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[operations/puppet@production] elasticsearch shard size check
* Checks shard size and sends alert if more than 30gb.

https://gerrit.wikimedia.org/r/458891

gerritbot added a project: Patch-For-Review. Sep 10 2018, 3:56 AM

• Mathew.onipe added a subscriber: fgiunchedi. Sep 10 2018, 8:22 AM

Output of testing the shard size check script on relforge:

onimisionipe@relforge1001:~/tests$ python3 shard_el.py --shard-size-warning 25 --shard-size-critical 40
CRITICAL - stas_wikidata_test:6 (size=49gb), stas_wikidata_test:5 (size=49gb), stas_wikidata_test:4 (size=49gb), stas_wikidata_test:3 (size=49gb), stas_wikidata_test:2 (size=49gb), stas_wikidata_test:1 (size=49gb), stas_wikidata_test:0 (size=49gb), commons_image_quality:14 (size=30gb), commons_image_quality:11 (size=30gb), commons_image_quality:10 (size=30gb), commons_image_quality:8 (size=29gb), commons_image_quality:2 (size=29gb), commons_image_quality:7 (size=27gb), commons_image_quality:6 (size=27gb), commons_image_quality:3 (size=27gb), commons_image_quality:0 (size=27gb), commons_image_quality:5 (size=26gb)

Change 458891 merged by Gehel:
[operations/puppet@production] Elasticsearch shard size check

https://gerrit.wikimedia.org/r/458891

• Mathew.onipe moved this task from not in use - please delete to Needs Reporting on the Discovery-Search (Current work) board. Sep 12 2018, 7:00 PM

debt closed this task as Resolved. Sep 13 2018, 9:15 PM
