Elasticsearch health check for shards icinga check shows OK status when cluster health is yellow
Closed, Resolved · Public

Description

While rebooting the Logstash Elasticsearch hosts, I noticed that Icinga shows OK while the cluster state is yellow. For example:

OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 6, unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, timed_out: False, active_primary_shards: 86, task_max_waiting_in_queue_millis: 0, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards_percent_as_number: 99.5726495726, active_shards: 233, initializing_shards: 1, number_of_data_nodes: 3, delayed_unassigned_shards: 0

Opening a task to review this and think about how best to alert on yellow cluster status without introducing excessive alert spam.

Event Timeline

herron triaged this task as Normal priority. · Nov 28 2018, 9:10 PM
herron created this task.
Restricted Application added a project: Discovery-Search. · Nov 28 2018, 9:10 PM
Restricted Application added a subscriber: Aklapper.

The problem is that yellow cluster status is part of normal operations. For example, when a new index is created, the cluster first goes yellow because only the primary shards exist; once the replicas are allocated, the cluster returns to green.

I suppose you could alert if it's been in yellow for X minutes?
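
A rough sketch of what "alert if it's been in yellow for X minutes" could look like, in case it helps the discussion. This is not the existing check; the class name `YellowTracker`, the `max_yellow_seconds` parameter, and the choice to map red to CRITICAL are all assumptions made for illustration. The `status` argument is meant to be the `status` field seen in the check output above.

```python
from dataclasses import dataclass
from typing import Optional

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


@dataclass
class YellowTracker:
    """Hypothetical helper: tolerate yellow until it has lasted too long."""

    max_yellow_seconds: float
    yellow_since: Optional[float] = None  # timestamp of first consecutive yellow

    def evaluate(self, status: str, now: float) -> int:
        """Return a plugin exit code for a cluster status observed at time `now`."""
        if status == "green":
            # Cluster recovered: forget any earlier yellow period.
            self.yellow_since = None
            return OK
        if status == "red":
            # Assumption: red is always worth alerting on immediately.
            return CRITICAL
        if status == "yellow":
            if self.yellow_since is None:
                self.yellow_since = now
            # Only warn once yellow has persisted for the whole grace period.
            if now - self.yellow_since >= self.max_yellow_seconds:
                return WARNING
            return OK
        # Anything else (missing or unexpected value) is reported as UNKNOWN.
        return UNKNOWN


if __name__ == "__main__":
    tracker = YellowTracker(max_yellow_seconds=600)  # 10 minute grace period
    for t, status in [(0, "yellow"), (300, "yellow"), (600, "yellow"), (660, "green")]:
        print(t, status, tracker.evaluate(status, now=t))
```

Each check run is normally a separate process, so a real plugin would have to keep `yellow_since` somewhere between runs (a small state file, for instance); that part is left out here.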

Gehel closed this task as Resolved. · Jan 29 2019, 7:26 PM
Gehel claimed this task.
Gehel added a subscriber: Gehel.

We have specific checks for things we actually care about. Having the cluster in yellow state is part of normal operations and should not alert.