This was first suspected in T245121; T245176 then added more visibility.
I created a quick dashboard: https://grafana.wikimedia.org/d/i5YA-BXWz/squid?orgId=1
However, the Prometheus exporter in eqiad often takes a long time to reply (e.g. around 90s):
  ayounsi@prometheus1003:~$ time curl install1003.wikimedia.org:9301/metrics -s | grep _up
  # HELP squid_up Was the last query of squid successful?
  # TYPE squid_up gauge
  squid_up{host="localhost"} 1
  real 1m34.694s
  user 0m0.012s
  sys 0m0.008s
The same issue doesn't happen in codfw.
The number of requests is quite small (~8 rps), so I suspect there is a misconfiguration somewhere.
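To compare the two sites with more than a one-off curl, something like the following could be run a few times from the Prometheus host. It is only a rough sketch: the script name, the function names `scrape` and `metric_value`, the 120s timeout and the three attempts per URL are all illustrative choices, and the exporter URLs are passed as arguments rather than hard-coded, since only the eqiad endpoint appears above.

```python
import sys
import time
import urllib.request


def scrape(url, timeout=120.0):
    """Fetch a metrics endpoint and return (elapsed_seconds, body_text)."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    return time.monotonic() - start, body


def metric_value(body, name):
    """Return the first sample value for `name` in Prometheus text output, or None.

    Minimal parser: skips comment lines and ignores label contents.
    """
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line and "}" in line:
            # Sample with labels: name{labels} value
            metric = line[: line.index("{")]
            rest = line[line.rindex("}") + 1 :]
        else:
            # Sample without labels: name value
            metric, _, rest = line.partition(" ")
        fields = rest.split()
        if metric == name and fields:
            return float(fields[0])
    return None


if __name__ == "__main__":
    # Exporter URLs come from the command line (assumption: plain HTTP /metrics).
    for url in sys.argv[1:]:
        for attempt in range(1, 4):
            try:
                elapsed, body = scrape(url)
                up = metric_value(body, "squid_up")
                print(f"{url} attempt={attempt} elapsed={elapsed:.2f}s squid_up={up}")
            except (OSError, ValueError) as exc:
                print(f"{url} attempt={attempt} failed: {exc}")
```

For example, `python3 scrape_timing.py http://install1003.wikimedia.org:9301/metrics` followed by the equivalent codfw exporter URL would show whether the slow replies are consistent or intermittent, and whether `squid_up` still reports 1 when the reply is slow.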