While looking at https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles-prometheus?panelId=64&fullscreen&orgId=1&from=now-30d&to=now&var-cluster=eqiad&var-smoothing=1&var-exported_cluster=search&edit, we discovered that a shard had been unassigned since 02/12/2018.
We should have Icinga alert us whenever a shard is left unassigned.
Querying /_cluster/allocation/explain should give us what we need, since it reports which shard is unassigned and why. The check can run at a low frequency. I'm proposing:
Check interval: 24h (daily)
Retry interval: 1h
Max check attempts: 3
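Here is a minimal sketch of such a check, written as a Nagios/Icinga-style plugin in Python. It is a starting point only. It assumes the cluster is reachable at a hypothetical http://localhost:9200. It also assumes Elasticsearch 5.x+ behaviour: calling /_cluster/allocation/explain without a request body explains the first unassigned shard it finds, and it answers HTTP 400 ("unable to find any unassigned shards to explain") when there is none. The exact error text may differ between versions, so verify it against the cluster first.

```python
"""Sketch of an Icinga/Nagios-style check for unassigned Elasticsearch shards.

Assumptions: the endpoint URL below is hypothetical, and the "no unassigned
shards" error text matches Elasticsearch 5.x+; verify both before use.
"""
import json
import urllib.error
import urllib.request

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical default endpoint; adjust host/port for the cluster being checked.
DEFAULT_URL = "http://localhost:9200/_cluster/allocation/explain"


def evaluate(status_code, body):
    """Map an allocation/explain HTTP response to (exit_code, message)."""
    try:
        data = json.loads(body)
    except ValueError:
        return UNKNOWN, "UNKNOWN: response is not valid JSON"
    if not isinstance(data, dict):
        return UNKNOWN, "UNKNOWN: unexpected response shape"

    if status_code == 400:
        # No request body: Elasticsearch picks any unassigned shard and
        # answers 400 when there is nothing to explain (the healthy case).
        error = data.get("error", "")
        reason = error.get("reason", "") if isinstance(error, dict) else str(error)
        if "unable to find any unassigned shards" in reason:
            return OK, "OK: no unassigned shards"
        return UNKNOWN, "UNKNOWN: unexpected 400 error: " + reason

    if status_code == 200:
        # A 200 means at least one shard is unassigned; report the one explained.
        info = data.get("unassigned_info", {})
        return CRITICAL, "CRITICAL: shard {}[{}] ({}) unassigned since {}: {}".format(
            data.get("index", "?"),
            data.get("shard", "?"),
            "primary" if data.get("primary") else "replica",
            info.get("at", "?"),
            info.get("reason", "?"),
        )

    return UNKNOWN, "UNKNOWN: unexpected HTTP status {}".format(status_code)


def main(url=DEFAULT_URL):
    """Query the cluster, print a one-line status and return the exit code."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            code, body = resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx, but the 400 body is what we need to inspect.
        code, body = err.code, err.read().decode("utf-8")
    except OSError as err:
        # URLError and socket timeouts are subclasses of OSError.
        print("UNKNOWN: request failed: {}".format(err))
        return UNKNOWN
    exit_code, message = evaluate(code, body)
    print(message)
    return exit_code
```

To use it as a plugin, the script's entry point would call `sys.exit(main())`. The schedule above would then map onto Icinga's `check_interval`, `retry_interval` and `max_check_attempts`. Keeping the HTTP call separate from `evaluate()` means the decision logic can be tested without a live cluster.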