
Icinga alert for EventGate validation errors
Closed, Resolved · Public · 3 Estimated Story Points

Description

In T225110, we recently noticed mediawiki.cirrussearch-request validation errors. It would have been better to be alerted to these earlier rather than noticing them by chance.

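For eventgate-main events this will be easier: when validation fails, the HTTP response carries an error status, so the failures already show up in the HTTP errors we report. We use 'hasty' mode for eventgate-analytics, where the response is sent before events are validated and produced, so the HTTP status does not reflect validation failures and we'll need some other measure of errors.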
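To illustrate why hasty mode hides validation failures from HTTP-based monitoring, here is a deliberately simplified model of an event intake endpoint. It is a sketch, not EventGate's code: the status codes (201 all valid, 207 partial, 400 all invalid, 202 for hasty requests) are assumed to follow EventGate's convention, and `handle_post`, `reports_failure` and the toy `requires_schema` validator are hypothetical names.

```python
from typing import Callable, List


def handle_post(events: List[dict], is_valid: Callable[[dict], bool], hasty: bool) -> int:
    """Hypothetical, simplified intake endpoint returning an HTTP status code.

    Assumed codes: 202 for hasty requests, 201 if every event is valid,
    400 if every event is invalid, 207 for a mix of valid and invalid.
    """
    if hasty:
        # Hasty mode: reply before validating; any failures happen later
        # and never show up in the HTTP status code.
        return 202
    invalid = sum(1 for e in events if not is_valid(e))
    if invalid == 0:
        return 201
    if invalid == len(events):
        return 400
    return 207


def reports_failure(status: int) -> bool:
    # Hypothetical classification: statuses that tell the client some
    # events were rejected.
    return status in (207, 400)


def requires_schema(event: dict) -> bool:
    # Toy validator standing in for real JSONSchema validation.
    return "$schema" in event


if __name__ == "__main__":
    batch = [{"$schema": "/test/event/1.0.0"}, {"no_schema": True}]
    print(handle_post(batch, requires_schema, hasty=False))  # 207
    print(handle_post(batch, requires_schema, hasty=True))   # 202
```

The same invalid batch is reported as a partial failure in synchronous mode but looks like a success (202) in hasty mode, which is why a different signal is needed for eventgate-analytics.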

Event Timeline

fdans moved this task from Incoming to Event Platform on the Analytics board.

Change 514871 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Add monitoring::alerts::kafka_topic_throughput and use it for eventgate validation alerts

https://gerrit.wikimedia.org/r/514871

Ottomata set the point value for this task to 3.

Change 514871 merged by Ottomata:
[operations/puppet@production] Add monitoring::alerts::kafka_topic_throughput and use it for eventgate

https://gerrit.wikimedia.org/r/514871

Ottomata renamed this task from "Icinga alert for EventGate produce errors, validation or otherwise" to "Icinga alert for EventGate validation errors". (Jun 10 2019, 6:21 PM)
Ottomata updated the task description.

It wasn't obvious how to modify EventGate to emit metrics for produce errors in general, so I narrowed this task to validation errors only. Each EventGate deployment is configured with a topic to which it produces validation errors, so we can alert on the throughput of those error topics.
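Alerting on the throughput of an error topic amounts to turning a monotonically increasing message count into a per-second rate and comparing that rate with a threshold. The sketch below assumes the count comes from periodic samples, for example a topic's summed partition end offsets or a Prometheus-style counter; `messages_per_second` and the sample format are hypothetical, and the reset handling only loosely mimics how Prometheus `rate()` treats counter resets.

```python
from typing import List, Tuple

# (unix timestamp in seconds, cumulative message count) -- assumed format
Sample = Tuple[float, int]


def messages_per_second(samples: List[Sample]) -> float:
    """Average per-second rate over the window covered by `samples`.

    Hypothetical helper: if the count ever drops (e.g. an exporter
    restarted), the new value is treated as counted from zero, so the
    increase after the reset is still included.
    """
    if len(samples) < 2:
        return 0.0
    ordered = sorted(samples)  # order by timestamp
    increase = 0
    for (_, prev), (_, cur) in zip(ordered, ordered[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = ordered[-1][0] - ordered[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


if __name__ == "__main__":
    # 30 new validation-error messages over two minutes.
    print(messages_per_second([(0, 100), (60, 100), (120, 130)]))  # 0.25
```

A healthy error topic stays at a rate of 0, so any sustained positive rate indicates that some producer is sending invalid events.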

Change 516324 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Use method gt instead of ge for eventgate validation error throughput alerts

https://gerrit.wikimedia.org/r/516324

Change 516324 merged by Ottomata:
[operations/puppet@production] Use method gt instead of ge for eventgate validation error throughput alerts

https://gerrit.wikimedia.org/r/516324
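The switch from `ge` to `gt` matters because a healthy validation error topic has a throughput of exactly 0. Assuming a warning threshold of 0, `rate >= 0` holds for every sample, so a `ge` check would be in a warning state all the time, while `rate > 0` fires only once invalid events actually arrive. The sketch below models an Icinga-style threshold check; the `gt`/`ge` names mirror the change title, but `check_threshold`, the thresholds used and the method table are assumptions, not the actual Puppet define or check plugin. Only the 0/1/2 exit codes follow the standard Nagios/Icinga plugin convention.

```python
import operator
from typing import Callable, Dict

# Comparison methods named after the `gt`/`ge` values in the change title;
# including `lt`/`le` here is an assumption.
METHODS: Dict[str, Callable[[float, float], bool]] = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
}

OK, WARNING, CRITICAL = 0, 1, 2  # Nagios/Icinga plugin exit-code convention


def check_threshold(value: float, warning: float, critical: float, method: str) -> int:
    """Hypothetical Icinga-style check: return the plugin status for `value`."""
    compare = METHODS[method]
    if compare(value, critical):
        return CRITICAL
    if compare(value, warning):
        return WARNING
    return OK


if __name__ == "__main__":
    # Assumed thresholds: warn above 0 msg/s, critical above 1 msg/s.
    print(check_threshold(0.0, 0.0, 1.0, "ge"))  # 1: warns on an idle topic
    print(check_threshold(0.0, 0.0, 1.0, "gt"))  # 0: idle topic is OK
```

With `gt`, an idle error topic reports OK and the alert only changes state when its throughput rises above the threshold.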