
improve reqstats error alerts
Closed, Invalid, Public

Description

Once reqstats is in a better, more reliable state than sqstats.pl (i.e. T83580), we should get more signal out of the reqstats metrics, for example by looking at the ratio between errors and requests, which gives us a percentage to alarm on. For example:

http://graphite.wikimedia.org/render/?width=736&height=379&_salt=1430989607.95&from=-2hours&target=reqstats.5xx&target=reqstats.500&target=alias(secondYAxis(divideSeries(sumSeries(reqstats.5xx%2Creqstats.500)%2C%20reqstats.requests))%2C%22(5xx%20%2B%20500)%20%2F%20total%20requests%22)
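The Graphite target in that URL computes (reqstats.5xx + reqstats.500) / reqstats.requests using sumSeries() and divideSeries(). A minimal Python sketch of the same calculation, applied to data fetched from the Graphite render API with format=json, plus a percentage alarm, might look like this. The 1% threshold and all helper names are hypothetical, and the handling of missing points only approximates Graphite's own functions.

```python
import json
from typing import List, Optional, Sequence

Series = List[Optional[float]]

# Hypothetical alarm threshold: alert when more than 1% of requests are errors.
ERROR_RATIO_THRESHOLD = 0.01


def target_values(render_json: str, target: str) -> Series:
    """Return the values of one target from a Graphite render API (format=json) response."""
    for entry in json.loads(render_json):
        if entry["target"] == target:
            # Each datapoint is a [value, timestamp] pair; value may be null.
            return [value for value, _timestamp in entry["datapoints"]]
    raise KeyError(target)


def sum_series(*series: Sequence[Optional[float]]) -> Series:
    """Point-wise sum that skips missing points, roughly like sumSeries()."""
    result: Series = []
    for points in zip(*series):
        present = [p for p in points if p is not None]
        result.append(sum(present) if present else None)
    return result


def divide_series(numerator: Sequence[Optional[float]],
                  denominator: Sequence[Optional[float]]) -> Series:
    """Point-wise division; None where a value is missing or the divisor is 0."""
    return [
        n / d if n is not None and d is not None and d != 0 else None
        for n, d in zip(numerator, denominator)
    ]


def should_alarm(ratios: Sequence[Optional[float]],
                 threshold: float = ERROR_RATIO_THRESHOLD) -> bool:
    """Alarm when the most recent non-missing error ratio exceeds the threshold."""
    latest = next((r for r in reversed(ratios) if r is not None), None)
    return latest is not None and latest > threshold


if __name__ == "__main__":
    # Made-up sample in the shape returned by /render?...&format=json
    sample = json.dumps([
        {"target": "reqstats.5xx", "datapoints": [[10.0, 1430989600], [30.0, 1430989660]]},
        {"target": "reqstats.500", "datapoints": [[5.0, 1430989600], [None, 1430989660]]},
        {"target": "reqstats.requests", "datapoints": [[2000.0, 1430989600], [1500.0, 1430989660]]},
    ])
    errors = sum_series(target_values(sample, "reqstats.5xx"),
                        target_values(sample, "reqstats.500"))
    ratio = divide_series(errors, target_values(sample, "reqstats.requests"))
    print([f"{r:.2%}" for r in ratio if r is not None])  # ['0.75%', '2.00%']
    print(should_alarm(ratio))  # True
```

Alarming on the ratio rather than on raw 5xx counts keeps the alert meaningful as overall traffic rises and falls.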

Related Objects

Status | Subtype | Assigned | Task
Invalid | | fgiunchedi |
Resolved | | Ottomata |

Event Timeline

fgiunchedi claimed this task.
fgiunchedi raised the priority of this task to Medium.
fgiunchedi updated the task description.
fgiunchedi added a subscriber: fgiunchedi.
Restricted Application added a subscriber: Aklapper. May 7 2015, 9:08 AM
fgiunchedi closed this task as Invalid. Sep 21 2018, 2:18 PM

No longer valid: nowadays we monitor nginx/varnish availability through Prometheus, have set up alerts on low availability, and have dashboards such as https://grafana.wikimedia.org/dashboard/db/frontend-traffic
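Prometheus-based availability alerts work on cumulative counters instead of per-interval values, so availability over a window is derived from counter increases, with a drop in a counter treated as a reset. The Python sketch below illustrates that idea under stated assumptions: the sample shape, the 99.9% threshold and the three-window requirement are hypothetical and do not reflect the actual Wikimedia alerting rules. Prometheus itself would express this as a PromQL expression using rate() or increase().

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CounterSample:
    """Cumulative counters scraped at one instant (hypothetical shape)."""
    timestamp: float
    requests_total: float
    errors_5xx_total: float


def _increase(before: float, after: float) -> float:
    """Counter increase between two samples; a drop is treated as a counter reset."""
    return after if after < before else after - before


def availability(earlier: CounterSample, later: CounterSample) -> float:
    """Fraction of requests that did not return 5xx between two samples."""
    requests = _increase(earlier.requests_total, later.requests_total)
    errors = _increase(earlier.errors_5xx_total, later.errors_5xx_total)
    if requests == 0:
        return 1.0  # assumption: a window with no traffic counts as available
    return 1.0 - errors / requests


def sustained_low_availability(samples: Sequence[CounterSample],
                               threshold: float = 0.999,
                               required: int = 3) -> bool:
    """True if each of the last `required` windows is below `threshold`,
    similar in spirit to an alerting rule with a `for:` duration."""
    windows = [availability(a, b) for a, b in zip(samples, samples[1:])]
    if len(windows) < required:
        return False
    return all(w < threshold for w in windows[-required:])


if __name__ == "__main__":
    # Made-up samples: 1000 requests and 5 errors per minute (99.5% available).
    samples = [CounterSample(60.0 * i, 1000.0 * i, 5.0 * i) for i in range(4)]
    print(f"{availability(samples[0], samples[1]):.1%}")  # 99.5%
    print(sustained_low_availability(samples))  # True
```

Requiring several consecutive low windows trades a slightly slower alert for fewer pages caused by a single brief error spike.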