The graph here: https://grafana-rw.wikimedia.org/d/000000566/overview?orgId=1&viewPanel=16&from=now-90d&to=now&forceLogin&editPanel=16
is meant to represent user-facing client-side errors, and to warn the web team when problematic code has been introduced to our servers.
We have an established baseline of fewer than 5k errors an hour, which has been stable for over a year. The low rate has made it extremely obvious when errors are introduced to group 0 and group 1 wikis.
However, it has become common for teams to log errors manually in order to debug issues, which is understandable. The Growth team in particular has been making use of this.
A recent manually logged error has pushed the error rate up to 25k an hour. This makes it nearly impossible for the web team to notice errors as they roll out to group 1 wikis and before they roll out to group 2 wikis.
Needs
- We'd like to be able to filter out any errors that are not from the main channel.
mw.errorLogger.logError takes a topic as a parameter, so the topic should be visible and filterable in Logstash.
- With the new field in place, we'd need to update the Logstash graph to only show errors in the main channel.
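
As a rough illustration of the filtering described above, here is a minimal TypeScript sketch. It assumes that each logged event would carry a `topic` field (the new field requested here, which does not exist yet) and that the "main channel" corresponds to the topic `error.uncaught`. The `ClientErrorEvent` shape, the helper names, and the `error.growthexperiments` topic are hypothetical and only serve the example; the real filter would live in the Logstash/Grafana query rather than in client code.

```typescript
// Hypothetical shape of a logged client-side error. The `topic` field is the
// new field this task asks for; it is an assumption, not an existing schema.
interface ClientErrorEvent {
  message: string;
  topic?: string; // may be absent on events logged before the field exists
}

// Assumption: the "main channel" maps to this topic name.
const MAIN_TOPIC = "error.uncaught";

// True when the event belongs to the main channel. A missing topic is treated
// as main channel so that older events are not silently dropped.
function isMainChannel(event: ClientErrorEvent): boolean {
  return (event.topic ?? MAIN_TOPIC) === MAIN_TOPIC;
}

// Keep only the errors the graph should count.
function mainChannelErrors(events: ClientErrorEvent[]): ClientErrorEvent[] {
  return events.filter(isMainChannel);
}

// Example data: one uncaught error, one manually logged debug error under a
// hypothetical team-specific topic, and one legacy event without a topic.
const sample: ClientErrorEvent[] = [
  { message: "TypeError: x is undefined", topic: "error.uncaught" },
  { message: "Manually logged debug error", topic: "error.growthexperiments" },
  { message: "ReferenceError: y is not defined" },
];

console.log(mainChannelErrors(sample).length); // prints 2
```

With a filter like this, manually logged errors stay available for debugging under their own topic, while the baseline graph only reflects main-channel errors.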