
Varnish traffic drop alert @ codfw is noisy / codfw incoming traffic is spiky
Closed, Resolved · Public


For the past week or so, the Varnish traffic drop alert has been noisy, specifically for codfw.

This seems to correlate with some odd minute-to-minute spikiness in codfw's incoming traffic, which should perhaps be investigated as well.

One of the things I think we should do is to add an absolute minimum traffic level required to alert, since a simple ratio will always be subject to this kind of noise. Here's a plot of one way we could express that in PromQL:

(We might also want to base the traffic drop alerts on ATS (Apache Traffic Server) metrics rather than Varnish frontend ones?)
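The minimum-traffic idea above can be illustrated outside of Prometheus. This is a hypothetical Python model, not the PromQL expression from the plot or from the eventual patch: the function names, the 0.6 ratio threshold, the 1000 rps floor, and the choice to apply the floor to the baseline (pre-drop) rate are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class TrafficSample:
    # Hypothetical inputs: the current request rate and a reference ("baseline")
    # rate taken from an earlier window, both in requests per second.
    current_rps: float
    baseline_rps: float


def ratio_only_alert(sample: TrafficSample, max_ratio: float = 0.6) -> bool:
    """Ratio-only check: fires whenever current/baseline falls below max_ratio."""
    if sample.baseline_rps <= 0:
        return False  # no meaningful ratio without baseline traffic
    return sample.current_rps / sample.baseline_rps < max_ratio


def traffic_drop_alert(
    sample: TrafficSample,
    max_ratio: float = 0.6,
    min_baseline_rps: float = 1000.0,
) -> bool:
    """Ratio check gated on an absolute floor (both thresholds are assumptions).

    The alert only fires if the baseline traffic was at least min_baseline_rps,
    so small absolute swings on a low-traffic site cannot trip it.
    """
    if sample.baseline_rps < min_baseline_rps:
        return False
    return ratio_only_alert(sample, max_ratio)


if __name__ == "__main__":
    samples = {
        "quiet_spike": TrafficSample(current_rps=20, baseline_rps=100),
        "real_drop": TrafficSample(current_rps=3000, baseline_rps=10000),
        "normal": TrafficSample(current_rps=9000, baseline_rps=10000),
    }
    for name, s in samples.items():
        # Print ratio-only vs gated result for each scenario.
        print(name, ratio_only_alert(s), traffic_drop_alert(s))
```

In the "quiet_spike" case the ratio-only check fires (20/100 = 0.2) while the gated check stays silent, which is the kind of noise the floor is meant to suppress; a genuine drop on a busy site still alerts. In PromQL the same gating would typically combine the ratio comparison with a second absolute-rate comparison using the `and` operator; the actual metric names and thresholds are not reproduced here.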

Event Timeline

CDanis created this task. · Nov 25 2019, 1:07 AM
Restricted Application added a subscriber: Aklapper. · Nov 25 2019, 1:07 AM
CDanis updated the task description. · Nov 25 2019, 1:09 AM

Took a quick look at the expression, and the idea LGTM; thanks @CDanis. Also cc @ayounsi, the original implementer of the alert.

CDanis moved this task from Inbox to In progress on the observability board. · Nov 25 2019, 4:07 PM
jbond triaged this task as Medium priority. · Nov 26 2019, 11:51 AM

Mentioned in SAL (#wikimedia-operations) [2019-11-27T08:24:07Z] <godog> silence codfw varnish traffic drop until dec 9th - T239039

Change 555550 had a related patch set uploaded (by CDanis; owner: CDanis):
[operations/puppet@production] traffic drop: require minimum absolute rps

Change 555550 merged by CDanis:
[operations/puppet@production] traffic drop: require minimum absolute rps

CDanis closed this task as Resolved. · Dec 9 2019, 7:16 PM
CDanis claimed this task.

Looking at some data in Grafana Explore, this change would have eliminated most of the noisy alerts from the past few months, so I'm calling this resolved for now.