During a recent incident, it was noticed that memcached timeout errors do not seem to trigger an alarm. Can we investigate and add metrics and alerts?
Description
Event Timeline
Comment Actions
This task is so sparse, and so much time has passed, that I'm not sure what the point is here.
Do we want to alert on the number of memcached errors from MediaWiki? I will assume that's the case.
Comment Actions
We already added such an alert (ported from check_prometheus), which is also broken down by cluster. It was introduced in https://gerrit.wikimedia.org/r/c/operations/alerts/+/883950
I'll close this task for now.