Add alert for overall influx of Logstash
Closed, Resolved, Public

Description

We already have alerts for some specific channels, some of which are integrated into the deployment process (such as for MediaWiki).

However, there are cases where the added load is caused by a single change, but its impact is only seen in another service, or in multiple other services.

An alert for major changes to the overall insert rate would help catch these early on.

Follows-up:

Event Timeline

I've updated https://grafana.wikimedia.org/dashboard/db/logstash to include a yesterday vs. today input rate comparison; perhaps there's signal in there.

Change 455576 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] prometheus: alert on unusual day-over-day logstash ingestion rate change

https://gerrit.wikimedia.org/r/455576

Change 455576 merged by Filippo Giunchedi:
[operations/puppet@production] prometheus: alert on unusual day-over-day logstash ingestion rate change

https://gerrit.wikimedia.org/r/455576
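
The patch itself is not reproduced in this task. As a rough illustration of what a day-over-day ingestion rate check involves, here is a minimal sketch in Python. The function names, the sample rates, and the 100% threshold are all hypothetical and are not taken from the merged change.

```python
def percent_change(today_rate: float, yesterday_rate: float) -> float:
    """Return the percent change of today's rate relative to yesterday's.

    Both rates are assumed to be in the same unit (e.g. events per second)
    and sampled at the same time of day.
    """
    if yesterday_rate <= 0:
        # Without a positive baseline a percent change is undefined.
        raise ValueError("yesterday_rate must be positive")
    return (today_rate - yesterday_rate) / yesterday_rate * 100.0


def is_unusual(today_rate: float, yesterday_rate: float,
               threshold_pct: float = 100.0) -> bool:
    """Flag a change whose magnitude meets the (hypothetical) threshold.

    The absolute value is used so that a sharp drop is flagged as well
    as a sharp rise.
    """
    return abs(percent_change(today_rate, yesterday_rate)) >= threshold_pct


if __name__ == "__main__":
    # Hypothetical sample: ingestion tripled compared with yesterday.
    print(is_unusual(3000.0, 1000.0))
```

A percent-based comparison like this is sensitive to a low baseline, so a real alert would likely also need a minimum duration or a floor on the baseline rate to avoid noise.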

fgiunchedi claimed this task.

Resolving for now as the alarm is in place; will reopen if needed.

Change 884349 had a related patch set uploaded (by Herron; author: Herron):

[operations/alerts@master] logstash: remove rate of ingestion percent change compared to yesterday alert

https://gerrit.wikimedia.org/r/884349

Change 884349 merged by jenkins-bot:

[operations/alerts@master] logstash: remove rate of ingestion percent change compared to yesterday alert

https://gerrit.wikimedia.org/r/884349