Once reqstats is in a better, more reliable state than sqstats.pl (i.e. T83580), we should get more signal out of reqstats metrics, for instance by looking at the ratio between errors and requests and thus having a percentage to alarm on, e.g.
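A minimal sketch of the ratio idea, not taken from reqstats itself: it assumes error and request counts for the same time window are already available, and the function names, the 1% threshold, and the minimum-traffic guard are all hypothetical placeholders.

```python
def error_percentage(errors: float, requests: float) -> float:
    """Return errors as a percentage of requests (0.0 when there is no traffic)."""
    if requests <= 0:
        return 0.0
    return 100.0 * errors / requests


def should_alarm(errors: float, requests: float,
                 threshold_pct: float = 1.0,    # hypothetical threshold
                 min_requests: float = 100.0    # hypothetical low-traffic guard
                 ) -> bool:
    """Alarm only when the error percentage exceeds the threshold
    and there is enough traffic for the ratio to be meaningful."""
    if requests < min_requests:
        return False
    return error_percentage(errors, requests) > threshold_pct
```

Alarming on a percentage rather than on an absolute error count keeps the alert meaningful as traffic rises and falls; the low-traffic guard is one possible way to avoid noisy alerts when only a handful of requests are seen.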
Description
Status | Subtype | Assigned | Task
---|---|---|---
Invalid | | fgiunchedi | T98450 improve reqstats error alerts
Resolved | | Ottomata | T83580 Overhaul reqstats
Event Timeline
No longer valid: nowadays we monitor nginx/varnish availability through Prometheus, have set up alerts on low availability, and have dashboards such as https://grafana.wikimedia.org/dashboard/db/frontend-traffic