The Toolforge HTTP ingress setup had some problems today.
Alerts flapping for haproxy backends going down, ex:
The trouble started today at ~1:00 am UTC:
I was able to reproduce the issue by just refreshing any tool page, or doing a manual curl like:
root@tools-k8s-haproxy-5:~# time curl -v http://tools-k8s-ingress-9.tools.eqiad1.wikimedia.cloud:30002
And it timing out sometimes (but responding quick others).
There did not seem to be any issues with memory or cpu limits.
I restarted the pods from the ingress-nginx-gen2 deployment and things improved, I can't manually reproduce anymore and haproxy retries and errors went down:
from https://grafana.wmcloud.org/d/toolforge-k8s-haproxy/toolforge-k8s-haproxy?orgId=1&from=now-24h&to=now&var-host=tools-k8s-haproxy-6&var-backend=All&var-frontend=All&var-server=All&var-code=All&var-interval=30s&refresh=5m
https://usercontent.irccloud-cdn.com/file/5pNj6dqz/image.png
https://usercontent.irccloud-cdn.com/file/5QJjU7QI/image.png
But alerts are still flapping.


