Alert for Kafka MirrorMaker lag
Closed, Resolved | Public | 8 Estimated Story Points

Description

We should know when MirrorMaker instances are lagging. We already track lag in Graphite, so we could alert on it with Icinga + Graphite, or we could use Burrow.
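
Whichever backend supplies the lag numbers, the check itself reduces to comparing recent lag against thresholds and returning a Nagios/Icinga exit code. A minimal sketch of that logic, assuming lag arrives as (timestamp, lag) samples; the function name `check_lag`, the 30-minute default window, the use of the maximum as the aggregate, and both thresholds are hypothetical and not taken from the actual puppet changes:

```python
from typing import Iterable, Tuple

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_lag(
    samples: Iterable[Tuple[float, float]],
    now: float,
    window_s: float = 1800.0,  # hypothetical: look at the last 30 minutes
    warn: float = 1000.0,      # hypothetical warning threshold (messages)
    crit: float = 10000.0,     # hypothetical critical threshold (messages)
) -> Tuple[int, str]:
    """Return (exit_code, message) for the worst lag seen inside the window."""
    # Keep only samples whose timestamp falls inside [now - window_s, now].
    recent = [lag for ts, lag in samples if now - window_s <= ts <= now]
    if not recent:
        return UNKNOWN, "UNKNOWN: no lag samples in window"
    worst = max(recent)
    if worst >= crit:
        return CRITICAL, f"CRITICAL: max lag {worst:g} >= {crit:g}"
    if worst >= warn:
        return WARNING, f"WARNING: max lag {worst:g} >= {warn:g}"
    return OK, f"OK: max lag {worst:g}"
```

An NRPE wrapper would print the message and exit with the returned code. Using the maximum keeps the alert raised for the whole window after a spike; an average or minimum would instead alert only on sustained lag.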

Event Timeline

Change 422163 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Install nrpe check for Kafka consumer lag by checking burrow

https://gerrit.wikimedia.org/r/422163

Change 422163 merged by Ottomata:
[operations/puppet@production] Install nrpe check for Kafka consumer lag by checking burrow

https://gerrit.wikimedia.org/r/422163

Change 422192 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Add mirror_name and host as labels for mirror maker prometheus

https://gerrit.wikimedia.org/r/422192

Change 422192 merged by Ottomata:
[operations/puppet@production] Add mirror_name and host as labels for mirror maker prometheus

https://gerrit.wikimedia.org/r/422192

Change 422201 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Can't set labels on metric without name set

https://gerrit.wikimedia.org/r/422201

Change 422201 merged by Ottomata:
[operations/puppet@production] Can't set labels on metric without name set

https://gerrit.wikimedia.org/r/422201

Change 422230 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Add check_prometheus alerts for Kafka MirrorMaker instances.

https://gerrit.wikimedia.org/r/422230

Change 422230 merged by Ottomata:
[operations/puppet@production] Add check_prometheus alerts for Kafka MirrorMaker instances.

https://gerrit.wikimedia.org/r/422230

Change 422251 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Fix path to check_kafka_consumer_lag nrpe check

https://gerrit.wikimedia.org/r/422251

Change 422251 merged by Ottomata:
[operations/puppet@production] Fix path to check_kafka_consumer_lag nrpe check

https://gerrit.wikimedia.org/r/422251

Change 422258 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Fix prometheus_url for mirror maker alert

https://gerrit.wikimedia.org/r/422258

Change 422258 merged by Ottomata:
[operations/puppet@production] Fix prometheus_url for mirror maker alert

https://gerrit.wikimedia.org/r/422258

Change 422335 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Use scalar in dropped messages prometheus check for mirror maker

https://gerrit.wikimedia.org/r/422335

Change 422335 merged by Ottomata:
[operations/puppet@production] Use scalar in dropped messages prometheus check for mirror maker

https://gerrit.wikimedia.org/r/422335

Ottomata changed the point value for this task from 5 to 8.

Change 422424 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Alert on lag in last 30 minutes, alert mirror maker lag for analytics

https://gerrit.wikimedia.org/r/422424

Change 422424 merged by Ottomata:
[operations/puppet@production] Alert on lag in last 30 minutes, alert mirror maker lag for analytics

https://gerrit.wikimedia.org/r/422424

Change 422430 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Be more lenient about MirrorMaker numDroppedMessages alert

https://gerrit.wikimedia.org/r/422430

Change 422430 merged by Ottomata:
[operations/puppet@production] Be more lenient about MirrorMaker numDroppedMessages alert

https://gerrit.wikimedia.org/r/422430

Change 422467 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] check_kafka_consumer_log - STOP != alert, just bursty topics

https://gerrit.wikimedia.org/r/422467

Change 422467 merged by Ottomata:
[operations/puppet@production] check_kafka_consumer_log - STOP != alert, just bursty topics

https://gerrit.wikimedia.org/r/422467
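
The change above stops treating Burrow's STOP status as an alert. A minimal sketch of how such a mapping from Burrow consumer group status to Nagios exit codes might look; the status names are Burrow's, but the mapping table and the function name `burrow_to_nagios` are assumptions for illustration, not the contents of the merged check:

```python
# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical mapping from Burrow status strings to exit codes.
BURROW_STATUS_TO_EXIT = {
    "OK": OK,
    "STOP": OK,           # assumption: bursty topics look stopped, so do not alert
    "WARN": WARNING,
    "REWIND": WARNING,    # assumption: offsets moved backwards, worth a look
    "ERR": CRITICAL,
    "STALL": CRITICAL,    # assumption: commits continue but offsets do not advance
    "NOTFOUND": UNKNOWN,
}


def burrow_to_nagios(status: str) -> int:
    """Translate a Burrow status string into an exit code.

    Any status not listed in the table is reported as UNKNOWN rather than
    silently passing.
    """
    return BURROW_STATUS_TO_EXIT.get(status.upper(), UNKNOWN)
```

For example, `burrow_to_nagios("STOP")` returns 0 under this mapping, so a consumer group on a quiet topic would not page anyone.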

Change 422939 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Use promethues based alert rather than burrow lag check alert

https://gerrit.wikimedia.org/r/422939

Change 422939 merged by Ottomata:
[operations/puppet@production] Use promethues based alert rather than burrow lag check alert

https://gerrit.wikimedia.org/r/422939

Change 422945 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Fix consuer max lag check query

https://gerrit.wikimedia.org/r/422945

Change 422945 abandoned by Ottomata:
Fix consuer max lag check query

https://gerrit.wikimedia.org/r/422945