From alertmanager (https://prometheus-alerts.wmcloud.org):
alertname: ToolsGridQueueProblem
summary: Grid queue webgrid-lighttpd@tools-sgeweblight-10-20.tools.eqiad1.wikimedia.cloud is in state E
9 hours ago
host: tools-sgeweblight-10-20.tools.eqiad1.wikimedia.cloud
instance: tools-sgegrid-master
job: node
queue: webgrid-lighttpd
severity: warn
state: E
@receiver: cloud-admin-feed
runbook