
Create Icinga alert when OSM replication lags on maps
Closed, Resolved · Public

Description

We have graphs of OSM replication lag (see T160011). We should also act when the lag shown on those graphs exceeds a certain threshold. This can probably be done easily with a graphite_threshold alert.
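As a rough illustration of the kind of check a threshold alert on a Graphite metric performs, here is a minimal Python sketch. It is not the actual graphite_threshold implementation. The function name `classify_lag`, the thresholds, and the rule "alert when at least a given percentage of datapoints exceed the limit" are all assumptions made for illustration. The only thing taken from Graphite itself is the shape of the input: `[value, timestamp]` pairs, with `None` for missing values, as returned by the render API in JSON format.

```python
from typing import Optional, Sequence, Tuple

# (value, unix timestamp) pairs, mirroring Graphite's JSON "datapoints" shape.
Datapoint = Tuple[Optional[float], int]


def classify_lag(
    datapoints: Sequence[Datapoint],
    warning: float,
    critical: float,
    percentage: float = 50.0,
) -> str:
    """Hypothetical helper: classify replication lag against two thresholds.

    Returns "CRITICAL" or "WARNING" when at least `percentage` percent of the
    non-null datapoints exceed the corresponding threshold, "OK" otherwise,
    and "UNKNOWN" when there is no usable data.
    """
    # Graphite reports gaps in a series as null values; ignore them.
    values = [v for v, _ts in datapoints if v is not None]
    if not values:
        return "UNKNOWN"

    def percent_over(limit: float) -> float:
        over = sum(1 for v in values if v > limit)
        return 100.0 * over / len(values)

    # Check the more severe threshold first.
    if percent_over(critical) >= percentage:
        return "CRITICAL"
    if percent_over(warning) >= percentage:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Made-up lag samples in seconds; the thresholds are illustrative only.
    sample = [(1500.0, 1), (1600.0, 2), (100.0, 3), (None, 4)]
    print(classify_lag(sample, warning=1000.0, critical=2000.0))
```

Requiring a percentage of datapoints over the limit, rather than a single spike, is one way to reduce false positives on a metric that fluctuates between replication runs.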

Event Timeline

Restricted Application added a subscriber: Aklapper.
debt triaged this task as High priority. Jun 9 2017, 9:44 PM
debt moved this task from Backlog to To-do on the Maps-Sprint board.

The current issues need to be fixed before we can activate any alert.

debt moved this task from Stalled/Waiting to To-do on the Maps-Sprint board.
Gehel moved this task from In progress to Needs review on the Maps-Sprint board.

I need to check, but I think the metrics needed for OSM replication lag are still good. I'll just need to cherry-pick this patch and deploy it.

Change 410172 merged by Gehel:
[operations/puppet@production] maps: Icinga alert when OSM replication lags

https://gerrit.wikimedia.org/r/410172

Those alerts are now available on Icinga and passing. I'll keep an eye on them for the next few days to make sure we don't have false positives, but that should be all good.