Following the recent reindexing, we observed that Icinga has been raising false positives caused by segment merges and similar events. We should reconfigure Icinga to limit these false alerts.
Event Timeline
Checking Icinga again, I can see something is wrong with the check time. It keeps checking every two minutes or so.
Change 464570 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[operations/puppet@production] icinga::monitor::elasticsearch: throttle alerts notification for check_elasticsearch_shard_size
Change 464570 merged by Gehel:
[operations/puppet@production] icinga::monitor::elasticsearch: throttle alerts notifications
After watching the trend of this check for about a week, I found that shard sizes for wikis like enwiki, wikidatawiki and cebwiki usually grow beyond the warning threshold but never reach the critical threshold, and some of them later drop back below the warning threshold.
The throttling was clearly a good idea, but I suggest we also raise the warning and critical thresholds. Currently, warning is 35 GB and critical is 50 GB. I suggest we make warning 50 GB and critical 60 GB, so that if any index hits the warning threshold and stays there for a while (a week), an in-place reindexing should follow immediately.
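To make the "stays there for a while (a week)" rule concrete, here is a minimal Python sketch. It is not part of the actual Icinga check: the function name `needs_reindex`, the sample format, and the idea of keeping a history of shard-size samples are all assumptions for illustration.

```python
from datetime import datetime, timedelta

WARNING_GB = 50.0             # proposed warning threshold from this task
SUSTAIN = timedelta(days=7)   # "a while (a week)"


def needs_reindex(samples, warning_gb=WARNING_GB, sustain=SUSTAIN):
    """Return True if the largest shard stayed at or above the warning
    threshold for the whole trailing `sustain` window.

    `samples` is a hypothetical list of (timestamp, largest_shard_gb)
    tuples sorted by ascending timestamp.
    """
    if not samples:
        return False
    window_start = samples[-1][0] - sustain
    # Without history reaching back to the start of the window we cannot
    # claim the size was sustained, so do not recommend a reindex yet.
    if samples[0][0] > window_start:
        return False
    recent = [size for ts, size in samples if ts >= window_start]
    return all(size >= warning_gb for size in recent)
```

A shard that briefly crosses 50 GB during a segment merge and then shrinks again would not trigger a reindex under this rule, which matches the behaviour observed for enwiki, wikidatawiki and cebwiki.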
I think the proposal makes sense. This check is here so that we don't forget to reshard when needed, but there isn't a hard limit on the max shard size (well, there is the overall disk space, but we're going to be in trouble well before that). The main goal is to get a low-priority alert when things are climbing too high, and "too high" isn't well defined, so we have some latitude as to what limits we want to set.
The main point is that we should ensure that this check does not flap too much, and does not alert us too early.
In short: I think W=50GB and C=60GB is fine.
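For reference, the agreed thresholds map onto the standard Nagios/Icinga plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL) roughly as in the sketch below. This is an illustration only, not the code of the real `check_elasticsearch_shard_size` plugin; the function name `classify_shard_sizes` and the input dictionary are hypothetical.

```python
# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify_shard_sizes(shard_sizes_gb, warning_gb=50.0, critical_gb=60.0):
    """Classify the largest shard against the warning/critical thresholds.

    `shard_sizes_gb` is a hypothetical mapping of index name to the size
    (in GB) of its largest shard. Returns (exit_code, message).
    """
    if not shard_sizes_gb:
        return OK, "OK - no shards reported"
    # Only the largest shard matters for deciding the overall status.
    index, size = max(shard_sizes_gb.items(), key=lambda kv: kv[1])
    if size >= critical_gb:
        return CRITICAL, f"CRITICAL - {index} shard is {size:.1f} GB"
    if size >= warning_gb:
        return WARNING, f"WARNING - {index} shard is {size:.1f} GB"
    return OK, f"OK - largest shard is {index} at {size:.1f} GB"
```

With W=50GB and C=60GB, a 40 GB shard (a warning under the old 35 GB threshold) reports OK, which is what reduces the early alerts discussed above.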
Change 467322 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[operations/puppet@production] elasticsearch: modify thresholds for icinga check shard size plugin
Change 467322 merged by Gehel:
[operations/puppet@production] elasticsearch: modify thresholds for icinga check shard size plugin