
Create Icinga alert when OSM replication lags on maps
Closed, Resolved · Public

Description

We have graphs of OSM replication lag (see T160011). We should also alert when the lag shown in those graphs exceeds a defined limit. This can probably be done easily with a graphite_threshold alert.
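As a rough illustration of what a threshold alert on a Graphite series evaluates, here is a minimal Python sketch. It is not the actual graphite_threshold check: the function name `check_threshold`, the `percentage` parameter, and the example limits are all assumptions made for illustration, and the real thresholds for OSM replication lag are not stated in this task.

```python
from typing import Optional, Sequence


def check_threshold(
    datapoints: Sequence[Optional[float]],
    warning: float,
    critical: float,
    percentage: float = 1.0,
) -> str:
    """Return an Icinga-style status for a series of lag measurements.

    Hypothetical helper: a status is raised when at least `percentage`
    percent of the non-null datapoints exceed the corresponding limit.
    """
    # Graphite returns null for intervals with no data; ignore those.
    values = [v for v in datapoints if v is not None]
    if not values:
        return "UNKNOWN"

    def share_over(limit: float) -> float:
        # Percentage of datapoints strictly above the limit.
        return 100.0 * sum(1 for v in values if v > limit) / len(values)

    # Check the critical limit first so it takes precedence over warning.
    if share_over(critical) >= percentage:
        return "CRITICAL"
    if share_over(warning) >= percentage:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Placeholder lag values in seconds; the limits are illustrative only.
    lag = [120.0, 90.0, None, 4000.0, 4200.0]
    print(check_threshold(lag, warning=1800, critical=3600, percentage=50))
```

Requiring a share of datapoints over the limit, rather than a single spike, is one way to reduce false positives on a metric that fluctuates between replication runs.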

Event Timeline

Gehel created this task. Jun 9 2017, 9:13 PM
Restricted Application added a project: Discovery. Jun 9 2017, 9:13 PM
Restricted Application added a subscriber: Aklapper.
Gehel updated the task description. Jun 9 2017, 9:15 PM
debt triaged this task as High priority. Jun 9 2017, 9:44 PM
debt moved this task from Backlog to To-do on the Maps-Sprint board.

The current issues need to be fixed before we can activate any alert.

Gehel moved this task from To-do to Stalled/Waiting on the Maps-Sprint board. Jun 28 2017, 8:20 AM
debt assigned this task to Gehel. Aug 3 2017, 7:11 PM
debt moved this task from Stalled/Waiting to To-do on the Maps-Sprint board.
Gehel moved this task from To-do to In progress on the Maps-Sprint board. Feb 13 2018, 4:09 PM
Gehel moved this task from In progress to Needs review on the Maps-Sprint board.
Gehel added a comment. Feb 22 2018, 6:24 PM

I need to check, but I think the metrics needed for OSM replication lag are still good. I'll just need to cherry-pick this patch and deploy it.

Change 410172 merged by Gehel:
[operations/puppet@production] maps: Icinga alert when OSM replication lags

https://gerrit.wikimedia.org/r/410172

Gehel moved this task from Needs review to Done on the Maps-Sprint board. Feb 26 2018, 9:45 AM

Those alerts are now available on Icinga and passing. I'll keep an eye on them for the next few days to make sure we don't have false positives, but that should be all good.

Gehel closed this task as Resolved. Jun 12 2018, 4:05 PM