
Alerts for common/important EventStreams topic volume
Closed, Resolved | Public | 5 Estimated Story Points

Description

We just had an incident in which a config change caused EventStreams to receive no events for about 24 hours. We should monitor and alert on the message volume of at least some of the topics we expose in EventStreams; revision-create and recentchange would perhaps be enough.

See: https://wikitech.wikimedia.org/w/index.php?title=Incident_documentation/20170829-EventStreams

Event Timeline

Ottomata renamed this task from Alerts for common/import EventStreams topic volume to Alerts for common/important EventStreams topic volume. Aug 29 2017, 8:32 PM
Nuria triaged this task as High priority. Aug 31 2017, 4:06 PM
Nuria edited projects, added Analytics-Kanban; removed Analytics.
Nuria set the point value for this task to 5.
Nuria added a subscriber: Nuria.

It is probably easiest to start with volume alerts for RCStream.

fdans lowered the priority of this task from High to Low. Mar 26 2018, 4:50 PM
fdans moved this task from Incoming to Operational Excellence on the Analytics board.

Change 433161 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Alert if EventStreams recentchange endpoint has no messages

https://gerrit.wikimedia.org/r/433161

Change 433161 merged by Ottomata:
[operations/puppet@production] Alert if EventStreams recentchange endpoint has no messages

https://gerrit.wikimedia.org/r/433161

Change 433166 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Fix type in check_eventstreams script

https://gerrit.wikimedia.org/r/433166

Change 433166 merged by Ottomata:
[operations/puppet@production] Fix type in check_eventstreams script

https://gerrit.wikimedia.org/r/433166

Change 433194 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Use fqdn instead of localhost for curl eventstreams check

https://gerrit.wikimedia.org/r/433194

Change 433194 merged by Ottomata:
[operations/puppet@production] Use proper path and fqdn for eventstreams check

https://gerrit.wikimedia.org/r/433194
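
For illustration only, a "no messages" check of the kind the patch titles above describe could be sketched as follows. This is not the content of the merged check_eventstreams script, which is not reproduced in this task. The function names `count_sse_events` and `check_result` are hypothetical, and the sketch assumes the endpoint speaks the standard server-sent events format, where each event carries a `data:` field and ends with a blank line.

```python
from typing import Iterable, Tuple

# Nagios-style plugin exit codes (0 = OK, 2 = CRITICAL).
OK, CRITICAL = 0, 2


def count_sse_events(lines: Iterable[str], max_events: int = 1) -> int:
    """Count server-sent events in an iterable of decoded text lines.

    An event is counted when a blank line terminates a block that
    contained at least one "data:" field. Comment lines (":...") and
    other fields are ignored. Stops early once max_events is reached.
    """
    events = 0
    has_data = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            has_data = True
        elif line == "" and has_data:
            events += 1
            has_data = False
            if events >= max_events:
                break
    return events


def check_result(events: int, stream: str) -> Tuple[int, str]:
    """Map an event count to a (exit_code, message) pair."""
    if events > 0:
        return OK, f"OK: {stream} delivered {events} event(s)"
    return CRITICAL, f"CRITICAL: {stream} delivered no events"
```

In a real check, the lines would come from a streaming HTTP response read with a timeout, so that a silent stream ends the read instead of blocking forever, and the exit code would be passed to `sys.exit`. Stopping after the first event keeps the check cheap on a busy stream such as recentchange.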