We should be alerted when MirrorMaker instances are lagging. We already track lag in Graphite, so we could use Icinga + Graphite for this, or we could use Burrow.
Description
Details
Status | Assigned | Task
---|---|---
Declined | elukey | T166833 Produce webrequests from varnishkafka to Kafka with Kafka message timestamp set to configurable content field
Resolved | Ottomata | T152015 Provision new Kafka cluster(s) with security features
Resolved | Lucas_Werkmeister_WMDE | T145712 Use RDF statement counts from entity data, not page props (wikibase:identifiers, wikibase:statements and wikibase:sitelinks)
Resolved | Ottomata | T161731 Create reliable change stream for specific wiki
Resolved | Ottomata | T183303 Decommission old analytics kafka cluster
Resolved | Ottomata | T175461 Port Kafka clients to new jumbo cluster
Resolved | Gehel | T189458 re-enable wdqs kafka poller
Resolved | Ottomata | T189464 Fix Mirror Maker erratic behavior when replicating from main-eqiad to jumbo
Resolved | Ottomata | T189611 Alert for Kafka MirrorMaker lag
Event Timeline
Change 422163 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Install nrpe check for Kafka consumer lag by checking burrow
Change 422163 merged by Ottomata:
[operations/puppet@production] Install nrpe check for Kafka consumer lag by checking burrow
Change 422192 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Add mirror_name and host as labels for mirror maker prometheus
Change 422192 merged by Ottomata:
[operations/puppet@production] Add mirror_name and host as labels for mirror maker prometheus
Change 422201 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Can't set labels on metric without name set
Change 422201 merged by Ottomata:
[operations/puppet@production] Can't set labels on metric without name set
Change 422230 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Add check_prometheus alerts for Kafka MirrorMaker instances.
Change 422230 merged by Ottomata:
[operations/puppet@production] Add check_prometheus alerts for Kafka MirrorMaker instances.
Change 422251 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Fix path to check_kafka_consumer_lag nrpe check
Change 422251 merged by Ottomata:
[operations/puppet@production] Fix path to check_kafka_consumer_lag nrpe check
Change 422258 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Fix prometheus_url for mirror maker alert
Change 422258 merged by Ottomata:
[operations/puppet@production] Fix prometheus_url for mirror maker alert
Change 422335 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Use scalar in dropped messages prometheus check for mirror maker
Change 422335 merged by Ottomata:
[operations/puppet@production] Use scalar in dropped messages prometheus check for mirror maker
lag alert:
MirrorMaker throughput and dropped messages alerts:
- https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=einsteinium&service=Kafka+MirrorMaker+main-eqiad_to_jumbo-eqiad+average+message+consume+rate+in+last+30m
- https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=einsteinium&service=Kafka+MirrorMaker+main-eqiad_to_jumbo-eqiad+average+message+produce+rate+in+last+30m
- https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=einsteinium&service=Kafka+MirrorMaker+main-eqiad_to_jumbo-eqiad+dropped+message+count+in+last+30m
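For readers unfamiliar with this kind of check, the alerts listed above (average consume/produce rate and dropped message count over the last 30 minutes) can be thought of roughly as follows. This is only an illustrative Python sketch, not the actual check_prometheus code from the puppet patches: the function name `check_window_average`, the sample format, and all threshold values are assumptions made up for the example.

```python
from statistics import mean

# Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_window_average(samples, now, warning, critical,
                         window_seconds=30 * 60, alert_when="below"):
    """Average a metric over a trailing window and compare it to thresholds.

    samples: iterable of (unix_timestamp, value) pairs (hypothetical format).
    alert_when: "below" for rates that must not drop (consume/produce rate),
                "above" for counters that must stay small (dropped messages).
    Returns (exit_code, average); average is None if the window is empty.
    """
    recent = [value for ts, value in samples
              if now - window_seconds <= ts <= now]
    if not recent:
        # No data in the window: report UNKNOWN rather than guessing.
        return UNKNOWN, None

    avg = mean(recent)
    if alert_when == "below":
        breached = lambda limit: avg < limit
    elif alert_when == "above":
        breached = lambda limit: avg > limit
    else:
        raise ValueError("alert_when must be 'below' or 'above'")

    # Check the more severe threshold first.
    if breached(critical):
        return CRITICAL, avg
    if breached(warning):
        return WARNING, avg
    return OK, avg
```

With `alert_when="below"` a low average message rate raises the alert, which suits the consume and produce rate checks; with `alert_when="above"` a high average does, which suits a dropped message count. The thresholds themselves would have to be tuned per mirror.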
Change 422424 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Alert on lag in last 30 minutes, alert mirror maker lag for analytics
Change 422424 merged by Ottomata:
[operations/puppet@production] Alert on lag in last 30 minutes, alert mirror maker lag for analytics
Change 422430 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Be more lenient about MirrorMaker numDroppedMessages alert
Change 422430 merged by Ottomata:
[operations/puppet@production] Be more lenient about MirrorMaker numDroppedMessages alert
Change 422467 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] check_kafka_consumer_log - STOP != alert, just bursty topics
Change 422467 merged by Ottomata:
[operations/puppet@production] check_kafka_consumer_log - STOP != alert, just bursty topics
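The idea behind the change above (a STOP status from Burrow should not alert, since it can just mean a bursty topic) can be illustrated with a small mapping from Burrow consumer group status to an Icinga exit code. This is a hedged Python sketch, not the contents of the real nrpe check: the function name `exit_code_for` and the exact severity assigned to each status are assumptions for illustration.

```python
# Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed mapping from Burrow consumer group status to exit code.
# STOP is deliberately treated as OK: a consumer on a bursty topic may
# simply have had nothing to commit for a while.
BURROW_STATUS_TO_EXIT = {
    "OK": OK,
    "STOP": OK,
    "WARN": WARNING,
    "ERR": CRITICAL,
    "STALL": CRITICAL,
}


def exit_code_for(status):
    """Return the exit code for a Burrow status string.

    Unrecognised statuses map to UNKNOWN so that they are surfaced
    instead of being silently treated as healthy.
    """
    return BURROW_STATUS_TO_EXIT.get(status.strip().upper(), UNKNOWN)
```

Keeping the mapping in a single table makes it cheap to change how a status like STOP is treated without touching the rest of the check.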
Change 422939 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Use prometheus based alert rather than burrow lag check alert
Change 422939 merged by Ottomata:
[operations/puppet@production] Use prometheus based alert rather than burrow lag check alert
Change 422945 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Fix consumer max lag check query