Make icinga monitoring more relevant
Closed, Duplicate · Public

Assigned To

None

Authored By

	• chasemp
	Aug 14 2015, 6:37 PM

Description

From https://wikitech.wikimedia.org/wiki/Incident_documentation/20150615-Elasticsearch in part:

Icinga should detect GC death spirals

(P782)

Icinga should monitor the state of a node within the cluster itself and not just overall cluster health

(https://github.com/elastic/elasticsearch/issues/6801)

Icinga should probably alert on yellow cluster mode as well
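
As a rough illustration of the yellow-state point above, here is a minimal sketch of an Icinga-style check that treats yellow as WARNING instead of ignoring it. It assumes the check reads the `status` field from the Elasticsearch `_cluster/health` response and uses the usual Nagios/Icinga plugin exit codes; the function names and the default URL are hypothetical, not taken from the existing checks.

```python
import json
import urllib.request

# Conventional Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_cluster_health(health):
    """Map a parsed _cluster/health payload to (exit_code, message)."""
    status = health.get("status")
    detail = "status={} unassigned_shards={}".format(
        status, health.get("unassigned_shards", "?")
    )
    if status == "green":
        return OK, "OK - " + detail
    if status == "yellow":
        # Yellow means some replica shards are unassigned: warn, do not stay silent.
        return WARNING, "WARNING - " + detail
    if status == "red":
        return CRITICAL, "CRITICAL - " + detail
    return UNKNOWN, "UNKNOWN - unexpected health payload"


def fetch_health(url="http://localhost:9200/_cluster/health", timeout=5):
    """Fetch cluster health as a dict (the URL is an assumed default)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A wrapper script would call `fetch_health()`, pass the result to `evaluate_cluster_health()`, print the message, and exit with the returned code.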

Conclusions

    ES monitoring does not properly reflect the state of the cluster: it does not warn in yellow state, and general health monitoring was not enough to detect this particular case
    ES topology could be improved, as several people suggested: for example, master nodes that are not also data nodes, and perhaps decoupling more wiki searches
    ES Java configurations, such as GC settings, are difficult to test
    The Ganglia tie-in for ES stats is error-prone and gets in the way during an outage
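
One possible way to approach the "GC death spirals" item from the description is to compare two samples of per-node JVM stats and alert when a node spends too large a share of wall-clock time in old-generation collections. The sketch below assumes each sample is one node's entry from the Elasticsearch nodes stats API, with the field path `jvm.gc.collectors.old.collection_time_in_millis`. The function names and the 20% / 50% thresholds are illustrative assumptions, not values from the incident report.

```python
# Conventional Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def old_gc_millis(node_stats):
    """Cumulative old-generation GC time for one node, in milliseconds."""
    return node_stats["jvm"]["gc"]["collectors"]["old"]["collection_time_in_millis"]


def gc_time_fraction(prev, curr, interval_seconds):
    """Fraction of the sampling interval spent in old-generation GC."""
    delta_ms = old_gc_millis(curr) - old_gc_millis(prev)
    # A negative delta means the counter reset (node restart); treat it as zero.
    return max(delta_ms, 0) / (interval_seconds * 1000.0)


def evaluate_gc(prev, curr, interval_seconds, warn=0.2, crit=0.5):
    """Return (exit_code, message); warn and crit are assumed thresholds."""
    fraction = gc_time_fraction(prev, curr, interval_seconds)
    message = "old-gen GC used {:.0%} of the last {}s".format(fraction, interval_seconds)
    if fraction >= crit:
        return CRITICAL, "CRITICAL - " + message
    if fraction >= warn:
        return WARNING, "WARNING - " + message
    return OK, "OK - " + message
```

Because this runs per node, it also speaks to the point about monitoring individual nodes, not only overall cluster health. The check would need to persist the previous sample between runs.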

Related Objects

Status	Assigned	Task
Resolved	Gehel	T109089 EPIC: Cultivating the Elasticsearch garden (operational lessons from 1.7.1 upgrade)
Duplicate	None	T109117 Make icinga monitoring more relevant
Declined	None	T133844 Improve Elasticsearch icinga alerting

Event Timeline

chasemp created this task. Aug 14 2015, 6:37 PM

chasemp raised the priority of this task to Medium.

chasemp updated the task description.

chasemp added projects: Elasticsearch, Discovery-ARCHIVED.

chasemp added subscribers: Krenair, dcausse, Aklapper, chasemp.

Krenair added projects: Icinga, observability. Aug 14 2015, 6:46 PM

chasemp updated the task description. Aug 14 2015, 7:03 PM

chasemp set Security to None.

Deskana moved this task from Needs triage to Ops on the Discovery-ARCHIVED board. Feb 26 2016, 7:01 PM

Luke081515 updated the task description. Feb 26 2016, 7:10 PM

Gehel subscribed. Feb 29 2016, 3:35 PM

Gehel added a subtask: T133844: Improve Elasticsearch icinga alerting. Apr 28 2016, 1:28 PM

Restricted Application added a project: Discovery-Search. Apr 28 2016, 1:28 PM

Gehel mentioned this in T124542: Setup icinga alerts for discovery services. Sep 22 2016, 2:06 PM

This seems mostly (but not entirely) a duplicate of T133844: Improve Elasticsearch icinga alerting; I'm going to merge these two tasks together. If someone disagrees, feel free to unmerge and specify how they're different. :-)

Deskana closed this task as a duplicate of T133844: Improve Elasticsearch icinga alerting. Nov 3 2016, 10:23 PM

Gehel closed subtask T133844: Improve Elasticsearch icinga alerting as Declined. Sep 8 2020, 7:09 PM
