Scap monitors Logstash error rates on the canary servers during a deployment. However, problems are not always triggered by a deployment: they may be caused by an external factor or a cron job, or may only surface after a particular cache is purged or expires.
As such, we should have an Icinga alert (based on Graphite, Prometheus, or Grafana?) that triggers when the rate of WARNING or ERROR messages in the MediaWiki logs stays above a certain threshold for a prolonged period of time.
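The "above a threshold for a prolonged period" condition could look roughly like the sketch below. It is a hypothetical illustration, not an existing check: it assumes per-minute WARNING/ERROR counts have already been fetched from whichever backend is chosen, and the function name, thresholds, and window length are placeholders. It returns states following the Nagios/Icinga plugin exit-status convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
"""Hypothetical sustained-threshold check for MediaWiki log error rates.

Assumptions: `samples` holds per-minute WARNING/ERROR counts, oldest
first, already retrieved from the metrics backend. Thresholds and the
window size are placeholder values, not agreed-upon numbers.
"""
from typing import Sequence

# Nagios/Icinga plugin exit-status convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def sustained_state(
    samples: Sequence[float],
    warn_threshold: float,
    crit_threshold: float,
    window: int,
) -> int:
    """Return an Icinga-style state for the most recent `window` samples.

    The alert only fires if *every* sample in the window exceeds the
    threshold, so a single short spike does not trigger it.
    """
    if window <= 0 or len(samples) < window:
        return UNKNOWN  # not enough data to judge a "prolonged" period
    lowest = min(samples[-window:])
    if lowest > crit_threshold:
        return CRITICAL
    if lowest > warn_threshold:
        return WARNING
    return OK


if __name__ == "__main__":
    # Example: errors per minute over the last 10 minutes (made-up data).
    errors_per_minute = [12, 15, 400, 14, 13, 520, 560, 610, 580, 590]
    print(sustained_state(errors_per_minute, warn_threshold=100,
                          crit_threshold=500, window=5))
```

Using the minimum over the window, rather than the average, means one very large burst cannot raise an alert on its own; the rate has to remain elevated for the whole window. An equivalent effect may be achievable with the chosen backend's own alerting features instead of a custom script.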
This would be similar to the alerts we already have for MediaWiki exceptions.
This is an actionable item from https://wikitech.wikimedia.org/wiki/Incident_documentation/20180710-MediaWiki.