Background
We had a partial "outage" of […] that caused at least the Main Page not to be displayed, with anon users getting PoolCounter error messages instead.
[There] were no signs that the poolcounter daemons weren't working correctly.
The poolcounter.log on fluorine was being spammed with lots of "Queue full" messages.
Context
In T83729#8695241, @Krinkle wrote:
Translation from 2014 into 2023:
There were no alerts about PoolCounter being degraded. "fluorine" is a former host for the role now served by mwlog1002. A proposed source of information for an alert is the "poolcounter" Logstash channel, where MediaWiki logs warnings and errors when it encounters a problem talking to PoolCounter.
The objective, however you choose to solve it, is to have an alert fire when MediaWiki perceives degraded service from PoolCounter.
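One possible shape for such an alert is a simple threshold over a time window: count the warnings and errors on the "poolcounter" channel in the last few minutes and fire if the count is too high. The sketch below is a minimal illustration under stated assumptions, not the actual implementation. The `LogRecord` type, the level names, the 5-minute window and the threshold of 50 are all hypothetical, and in a real deployment the records (or just the count) would come from a query against the Logstash backend, which is not shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass
class LogRecord:
    # Hypothetical, simplified view of one Logstash entry.
    timestamp: datetime
    channel: str
    level: str
    message: str


def should_alert(
    records: Iterable[LogRecord],
    now: datetime,
    window: timedelta = timedelta(minutes=5),  # assumed window size
    threshold: int = 50,                       # assumed alert threshold
) -> bool:
    """Return True if the "poolcounter" channel logged at least `threshold`
    warnings/errors in the interval (now - window, now]."""
    start = now - window
    count = sum(
        1
        for r in records
        if r.channel == "poolcounter"
        and r.level in ("WARNING", "ERROR")  # assumed level names
        and start < r.timestamp <= now
    )
    return count >= threshold
```

For example, a burst of 60 "Queue full" errors within the last minute would trigger the alert, while the same errors logged an hour ago, or errors on an unrelated channel, would not. A count-over-window rule is deliberately crude; it only needs MediaWiki's own view of PoolCounter, which matches the objective of alerting on what MediaWiki perceives rather than on the daemon's health.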
Related:
- {T84143}
- {T83656}