
improve reqstats error alerts
Closed, Invalid · Public

Description

Once reqstats is in a better, more reliable state than sqstats.pl (i.e. T83580), we should get more signal out of reqstats metrics, e.g. by looking at the ratio between errors and requests so that we have a percentage to alarm on. For example:

http://graphite.wikimedia.org/render/?width=736&height=379&_salt=1430989607.95&from=-2hours&target=reqstats.5xx&target=reqstats.500&target=alias(secondYAxis(divideSeries(sumSeries(reqstats.5xx%2Creqstats.500)%2C%20reqstats.requests))%2C%22(5xx%20%2B%20500)%20%2F%20total%20requests%22)

Screen Shot 2015-05-07 at 11.07.07.png (375×733 px, 56 KB)
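A minimal sketch of the ratio idea, assuming the three series (reqstats.5xx, reqstats.500, reqstats.requests) have already been fetched as equal-length lists of datapoints. The function names, the 1% threshold, and the three-datapoint window are hypothetical and chosen only for illustration; the handling of missing datapoints is also an assumption and only loosely mirrors the divideSeries(sumSeries(...)) expression in the URL above.

```python
from typing import List, Optional, Sequence


def error_ratio(
    series_5xx: Sequence[Optional[float]],
    series_500: Sequence[Optional[float]],
    requests: Sequence[Optional[float]],
) -> List[Optional[float]]:
    """Per-datapoint (5xx + 500) / total requests.

    A datapoint is None when any input is missing or when the request
    count is zero, so gaps never look like a 0% error rate.
    """
    ratios: List[Optional[float]] = []
    for e5xx, e500, total in zip(series_5xx, series_500, requests):
        if e5xx is None or e500 is None or not total:
            ratios.append(None)
        else:
            ratios.append((e5xx + e500) / total)
    return ratios


def should_alert(
    ratios: Sequence[Optional[float]],
    threshold: float = 0.01,  # hypothetical: alarm above 1% errors
    min_points: int = 3,      # hypothetical: require 3 datapoints in a row
) -> bool:
    """True when the last `min_points` known ratios all exceed `threshold`."""
    recent = [r for r in ratios if r is not None][-min_points:]
    return len(recent) == min_points and all(r > threshold for r in recent)
```

Alarming on a percentage rather than on raw error counts keeps the alert meaningful as traffic grows or shrinks; requiring several consecutive datapoints is one possible way to avoid paging on a single spike.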

Related Objects

Status: Invalid, Assigned: fgiunchedi
Status: Resolved, Assigned: Ottomata

Event Timeline

fgiunchedi claimed this task.
fgiunchedi raised the priority of this task from to Medium.
fgiunchedi updated the task description.
fgiunchedi subscribed.

No longer valid: nowadays we monitor nginx/varnish availability through Prometheus, have set up alerts on low availability, and have dashboards such as https://grafana.wikimedia.org/dashboard/db/frontend-traffic