
Tune HTTP availability alerts
Closed, Resolved · Public

Description

The current HTTP availability alerts work and are timely, but they are sometimes too sensitive and produce false or spammy alerts, e.g. for spikes.

Since global availability has been fixed in T234567 ("global HTTP (un)availability number, as reported in Frontend Traffic dashboard, is bogus"), I'm suggesting we ditch the per-site availability alerts and alert on global availability instead. If the signal-to-noise ratio proves good, we should then turn the alert into a paging one.
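
To illustrate why a global number might be less noisy than per-site ones, here is a minimal sketch. Everything in it is an assumption made for illustration: the site names, the request counts, the 0.999 threshold and the helper functions are hypothetical and are not taken from the actual alert definition in operations/puppet.

```python
from typing import Dict, Tuple

# Hypothetical per-site counters: site -> (total requests, failed requests).
SiteCounts = Dict[str, Tuple[int, int]]


def availability(total: int, errors: int) -> float:
    """Return the fraction of requests that did not fail (1.0 when idle)."""
    if total == 0:
        return 1.0
    return (total - errors) / total


def per_site_availability(counts: SiteCounts) -> Dict[str, float]:
    """Availability computed separately for each site."""
    return {site: availability(t, e) for site, (t, e) in counts.items()}


def global_availability(counts: SiteCounts) -> float:
    """Availability computed over the summed counters of all sites."""
    total = sum(t for t, _ in counts.values())
    errors = sum(e for _, e in counts.values())
    return availability(total, errors)


THRESHOLD = 0.999  # assumed value, not the threshold used in production

# Made-up sample: a brief error spike on a low-traffic site.
sample: SiteCounts = {
    "site-a": (900_000, 90),
    "site-b": (90_000, 9),
    "site-c": (10_000, 200),
}

# Sites whose own availability falls below the threshold.
site_alerts = sorted(
    site for site, a in per_site_availability(sample).items() if a < THRESHOLD
)
# Whether the aggregated availability falls below the same threshold.
global_alert = global_availability(sample) < THRESHOLD
```

With these made-up numbers the per-site check flags site-c, while the global check stays quiet because the spike is a small share of total traffic. The flip side is that a real outage confined to a small site would also be diluted, which is presumably why the signal-to-noise ratio should be evaluated before the alert is made paging.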

Event Timeline

Restricted Application added a subscriber: Aklapper. · Oct 24 2019, 10:29 AM

Change 545802 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] monitoring: alert on reduced global http availability

https://gerrit.wikimedia.org/r/545802

Change 545802 merged by Filippo Giunchedi:
[operations/puppet@production] monitoring: alert on reduced global http availability

https://gerrit.wikimedia.org/r/545802

colewhite triaged this task as Medium priority. · Oct 24 2019, 11:13 PM
fgiunchedi moved this task from Inbox to In progress on the observability board. · Oct 28 2019, 2:17 PM
fgiunchedi closed this task as Resolved. · Nov 25 2019, 1:41 PM
fgiunchedi claimed this task.

Thresholds have been adjusted for global availability, and I've updated the "Frontend Traffic" dashboard.