I added a filter term (disabled by default) to the Squid proxy dashboard that surfaces traffic to any *.wikimedia.org or *.wikipedia.org domain:
https://logstash.wikimedia.org/app/dashboards#/view/58c908a0-a394-11ec-bf8e-43f1807d5bc2
Queries to those two domains (and several others) don't need to, and shouldn't, go through the Squid proxies, as they're internal hosts. See the documentation at https://wikitech.wikimedia.org/wiki/HTTP_proxy#How-to?
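A minimal Python sketch of a client that follows this advice, using only the standard library: requests to internal hosts get an opener with proxies disabled, while everything else keeps the environment's proxy settings. The suffix list and the user agent string below are hypothetical placeholders, not values taken from the wikitech page; setting a descriptive user agent also makes this kind of traffic easier to attribute in the proxy logs.

```python
import urllib.request
from urllib.parse import urlsplit

# Hypothetical suffix list for illustration; the wikitech HTTP_proxy page
# is the authoritative source for which domains should skip the proxy.
INTERNAL_SUFFIXES = (".wmnet", ".wikimedia.org", ".wikipedia.org")

# Hypothetical user agent: a tool name plus a way to contact its owner.
USER_AGENT = "example-metrics-job/0.1 (analytics-owner@example.org)"


def is_internal(url: str) -> bool:
    """True if the URL's host equals, or is a subdomain of, an internal suffix."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == s.lstrip(".") or host.endswith(s) for s in INTERNAL_SUFFIXES)


def build_request(url: str) -> tuple[urllib.request.OpenerDirector, urllib.request.Request]:
    """Return an opener (direct for internal hosts, proxied otherwise) and a request."""
    if is_internal(url):
        # An empty mapping disables proxies, even when http_proxy/https_proxy
        # are exported in the environment.
        handler = urllib.request.ProxyHandler({})
    else:
        # With no argument, ProxyHandler reads the *_proxy environment variables.
        handler = urllib.request.ProxyHandler()
    opener = urllib.request.build_opener(handler)
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    return opener, request
```

Usage would look like `opener, req = build_request("https://noc.wikimedia.org/conf/")` followed by `opener.open(req, timeout=30)`. Choosing the proxy per request, rather than unsetting `https_proxy` globally, keeps any external fetches going through the proxy as before.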
Longer term, we might block such traffic flows so that configuration mistakes are caught when they are introduced.
Here are the largest offending hosts relevant to data engineering in the last 24h:
stat1008.eqiad.wmnet (121,747 hits in 24h)
Top 5 destinations (no UA):
wikimedia.org 82,442
fr.wikipedia.org 4,253
en.wikipedia.org 4,011
nl.wikipedia.org 3,136
es.wikipedia.org 1,639
Slight digression, but it would be useful to set a user agent on these queries.
Then, with UA "git/2.20.1": gerrit/gitlab.wikimedia.org (142 hits)
stat1007.eqiad.wmnet (1,763 hits in 24h)
Top 5 destinations:
noc.wikimedia.org 1,478
gerrit.wikimedia.org 280
lists.wikimedia.org 2
gitlab.wikimedia.org 1
meta.wikimedia.org 1
With UAs "WMDE Wikidata metrics gathering " and "git/2.20.1"
an-worker1146.eqiad.wmnet (366 hits in 24h)
Top 5 destinations:
ab.wikipedia.org 1
ace.wikipedia.org 1
ady.wikipedia.org 1
af.wikipedia.org 1
ak.wikipedia.org 1
And many more similar *.wikipedia.org subdomains.
stat1005.eqiad.wmnet
gerrit.wikimedia.org 130 (UA "git/2.20.1")
dumps.wikimedia.org 42
an-launcher1002.eqiad.wmnet
gerrit.wikimedia.org 168 (UA "git/2.20.1")
stat1004.eqiad.wmnet
gerrit.wikimedia.org 138 (UA "git/2.20.1")
The longer tail (stat1006.eqiad.wmnet, an-web1001.eqiad.wmnet, an-coord1001.eqiad.wmnet) shows similar git traffic patterns.
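A per-host breakdown like the one above could be reproduced from exported proxy log records with a small aggregation. Below is a Python sketch, assuming the records have already been parsed into (client host, destination domain, user agent) tuples; `ProxyHit` and that record shape are hypothetical and do not reflect Squid's actual log format or the Logstash field names.

```python
from collections import Counter, defaultdict
from typing import Iterable, NamedTuple


class ProxyHit(NamedTuple):
    """One proxied request (hypothetical, already-parsed record)."""
    client: str       # source host, e.g. "stat1008.eqiad.wmnet"
    destination: str  # requested domain, e.g. "fr.wikipedia.org"
    user_agent: str   # empty string when the client sent no user agent


# The two domains from the dashboard filter; the bare domains count too.
INTERNAL_SUFFIXES = (".wikimedia.org", ".wikipedia.org")


def is_internal(domain: str) -> bool:
    d = domain.lower()
    return any(d == s.lstrip(".") or d.endswith(s) for s in INTERNAL_SUFFIXES)


def offenders(hits: Iterable[ProxyHit], top: int = 5) -> list:
    """Rows of (client, internal hits, top destinations, user agents),
    sorted by internal hits, largest first."""
    destinations: dict[str, Counter] = defaultdict(Counter)
    agents: dict[str, set] = defaultdict(set)
    for hit in hits:
        if not is_internal(hit.destination):
            continue  # external traffic legitimately uses the proxy
        destinations[hit.client][hit.destination] += 1
        agents[hit.client].add(hit.user_agent or "(no UA)")
    rows = [
        (client, sum(counts.values()), counts.most_common(top), sorted(agents[client]))
        for client, counts in destinations.items()
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

Each row pairs a client with its internal-domain hit count, its busiest destinations (as `Counter.most_common` returns them) and the user agents seen, with an empty user agent reported as "(no UA)".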