
Misconfigured proxies on data-engineering hosts
Open, Low, Public

Description

I added a filter term (disabled by default) on the Squid proxy dashboard to surface traffic to any *.wikimedia.org or *.wikipedia.org domain:
https://logstash.wikimedia.org/app/dashboards#/view/58c908a0-a394-11ec-bf8e-43f1807d5bc2

Queries to those two domains (and more) don't need to, and shouldn't, go through the Squid proxies, as they are internal hosts. See the documentation at https://wikitech.wikimedia.org/wiki/HTTP_proxy#How-to?
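As a rough illustration of the rule above, here is a minimal sketch of a check a client could run before deciding whether to use the proxy. The function name and the domain tuple are hypothetical; only the two domains named in this task are listed, and the real exclusion list is longer.

```python
# Domains whose hosts are internal and should be reached directly.
# Assumption: only the two domains named in this task; the real list is longer.
INTERNAL_DOMAINS = ("wikimedia.org", "wikipedia.org")


def should_bypass_proxy(host: str) -> bool:
    """Return True if `host` is an internal domain or a subdomain of one."""
    # Normalize case and drop a trailing dot from fully qualified names.
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in INTERNAL_DOMAINS)
```

Matching on "." + domain rather than on a bare suffix avoids treating an unrelated name such as notwikipedia.org as internal.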

A longer-term plan might be to block such traffic flows, so that configuration mistakes are caught as soon as they are introduced.

Here are the largest offending hosts relevant to data engineering in the last 24h:

stat1008.eqiad.wmnet (121,747 hits in 24h)
Top 5 destinations (no UA):
wikimedia.org 82,442
fr.wikipedia.org 4,253
en.wikipedia.org 4,011
nl.wikipedia.org 3,136
es.wikipedia.org 1,639

Slight digression but it would be useful to add a user agent to such queries.
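For jobs written in Python, setting a user agent could look like the sketch below. It uses only the standard library; the user agent string and the helper name are made up for illustration, and the request is only built, not sent.

```python
import urllib.request

# Hypothetical identifier: name the job and give a contact so traffic can be traced.
USER_AGENT = "example-data-job/0.1 (owner@example.org)"


def build_request(url: str) -> urllib.request.Request:
    """Build a request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


# Constructing the request does not perform any network I/O.
req = build_request("https://en.wikipedia.org/w/api.php")
```

With a header like this, the requests would show up under a recognizable UA in the dashboard instead of in the "no UA" bucket.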

Then, with UA "git/2.20.1": gerrit.wikimedia.org and gitlab.wikimedia.org (142)

stat1007.eqiad.wmnet (1,763)
Top 5 destinations:
noc.wikimedia.org 1,478
gerrit.wikimedia.org 280
lists.wikimedia.org 2
gitlab.wikimedia.org 1
meta.wikimedia.org 1
With UA "WMDE Wikidata metrics gathering " and "git/2.20.1"

an-worker1146.eqiad.wmnet (366)
Top 5 destinations:
ab.wikipedia.org 1
ace.wikipedia.org 1
ady.wikipedia.org 1
af.wikipedia.org 1
ak.wikipedia.org 1
And many more similar subdomains

stat1005.eqiad.wmnet
gerrit.wikimedia.org 130 (UA "git/2.20.1")
dumps.wikimedia.org 42

an-launcher1002.eqiad.wmnet
gerrit.wikimedia.org 168 (UA "git/2.20.1")

stat1004.eqiad.wmnet
gerrit.wikimedia.org 138 (UA "git/2.20.1")

The long tail shows similar git traffic patterns: stat1006.eqiad.wmnet, an-web1001.eqiad.wmnet, an-coord1001.eqiad.wmnet.
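The per-host breakdowns above came from the dashboard. For anyone who wants to reproduce a similar "top destinations" summary from exported log records, a minimal sketch follows. The record keys `client` and `destination` are assumptions, not the actual Logstash field names.

```python
from collections import Counter, defaultdict


def top_destinations(records, n=5):
    """Return the n most requested destinations for each client host.

    `records` is an iterable of dicts with the hypothetical keys
    "client" (requesting host) and "destination" (requested domain).
    """
    per_host = defaultdict(Counter)
    for rec in records:
        per_host[rec["client"]][rec["destination"]] += 1
    # most_common(n) yields (destination, hits) pairs, highest count first.
    return {host: counts.most_common(n) for host, counts in per_host.items()}
```

The result maps each client host to a list of (destination, hits) pairs, mirroring the layout of the lists in this description.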

Event Timeline

RKemper renamed this task from Missconfigured proxies on data-engineering hosts to Misconfigured proxies on data-engineering hosts. Oct 2 2023, 6:47 PM
RKemper updated the task description.
Gehel triaged this task as Low priority. Oct 11 2023, 8:55 AM
Gehel moved this task from Incoming to Misc on the Data-Platform-SRE board.