Misconfigured proxies on I/F hosts
Closed, Resolved; Public

Description

I added a filter term (disabled by default) on the Squid proxy dashboard to surface traffic to any *.wikimedia.org or *.wikipedia.org domain:
https://logstash.wikimedia.org/app/dashboards#/view/58c908a0-a394-11ec-bf8e-43f1807d5bc2

Queries to those two domains (and more) don't need to (and shouldn't) go through the Squid proxies, as they're internal hosts. See the doc at https://wikitech.wikimedia.org/wiki/HTTP_proxy#How-to?
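
To make the rule concrete, here is a minimal Python sketch of the same check the dashboard filter performs: flag proxied requests whose destination falls under *.wikimedia.org or *.wikipedia.org. The record format (client, destination) and the sample hosts are assumptions for illustration, not the actual Squid log schema.

```python
from collections import Counter

# Suffixes taken from the task description; the real filter may cover more domains.
INTERNAL_SUFFIXES = (".wikimedia.org", ".wikipedia.org")


def is_internal(host: str) -> bool:
    """Return True if host is a subdomain of one of the internal suffixes."""
    host = host.lower().rstrip(".")  # tolerate trailing-dot FQDNs
    return host.endswith(INTERNAL_SUFFIXES)


def offenders(records):
    """Count proxied requests per (client, destination) for internal destinations.

    records is an iterable of (client, destination) pairs; this shape is
    hypothetical and only stands in for parsed proxy log entries.
    """
    counts = Counter()
    for client, destination in records:
        if is_internal(destination):
            counts[(client, destination)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        ("build2001.codfw.wmnet", "mirrors.wikimedia.org"),
        ("build2001.codfw.wmnet", "apt.wikimedia.org"),
        ("build2001.codfw.wmnet", "deb.debian.org"),  # external, legitimately proxied
    ]
    for (client, destination), n in offenders(sample).most_common():
        print(client, destination, n)
```

Anything this reports is traffic that should have gone direct rather than through the proxy.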

A longer-term plan might be to block such traffic flows, so that configuration mistakes are caught as soon as they are introduced.

Here are the largest offending hosts relevant to Infrastructure Foundations in the last 24h:

build2001.codfw.wmnet.

Various UA:
Debian APT-HTTP/1.3 (1.8.2.3) 1,397
Debian APT-HTTP/1.3 (2.2.4) 526
Debian APT-HTTP/1.3 (1.4.11) 291
Debian APT-HTTP/1.3 (1.8.2.1) 113
Debian APT-HTTP/1.3 (1.8.2.2) 28

Two destinations:
mirrors.wikimedia.org 2,070
apt.wikimedia.org 285

After a chat with @MoritzMuehlenhoff, it appears they come from the pbuilder environments.
It's not clear how they learn the proxy config, but we should be able to configure the DIRECT keyword for those two domains. See more information about DIRECT in https://www.claudiokuenzler.com/blog/619/apt-behind-proxy-no-proxy-for-some-repositories
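
As a rough sketch of what the DIRECT approach could look like, the Python helper below renders one apt.conf line per host using apt's `Acquire::http::Proxy::<host> "DIRECT";` form. The helper name is hypothetical, and where the output would be installed (for example a file under /etc/apt/apt.conf.d/ inside the build environment) is an assumption, not something stated in this task.

```python
def apt_direct_conf(hosts):
    """Render apt.conf lines telling apt to bypass the proxy for each host."""
    lines = [f'Acquire::http::Proxy::{host} "DIRECT";' for host in hosts]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The two destinations listed above for build2001.
    print(apt_direct_conf(["mirrors.wikimedia.org", "apt.wikimedia.org"]), end="")
    # Prints:
    # Acquire::http::Proxy::mirrors.wikimedia.org "DIRECT";
    # Acquire::http::Proxy::apt.wikimedia.org "DIRECT";
```

This only covers the http scheme; repositories fetched over https would need matching `Acquire::https::Proxy::<host>` entries.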

Similarly, deploy1002.eqiad.wmnet has calls to the same two destinations.


Last pattern: the hosts are not relevant as they're provisioning hosts, but the UA is set to "debian-installer".
Fetching data from:
mirrors.wikimedia.org 162
apt.wikimedia.org 2

Related Objects

Status    Subtype  Assigned  Task
Open               None
Resolved           ayounsi

Event Timeline

Aklapper renamed this task from Missconfigured proxies on I/F hosts to Misconfigured proxies on I/F hosts. Jan 8 2023, 7:00 PM

Change 878063 had a related patch set uploaded (by Jbond; author: John Bond):

[operations/puppet@production] docker::baseimages: inject no_proxy config to rebuild job

https://gerrit.wikimedia.org/r/878063

After a chat with @MoritzMuehlenhoff, it appears they come from the pbuilder environments.

I took a look at this and the pbuilder environments already go direct; however, there is a systemd timer, debian-weekly-rebuild.service, that seems to be responsible for this traffic. CR sent.

Similarly, deploy1002.eqiad.wmnet has calls to the same two destinations.

I didn't see the apt domains for this one, but I did see:

  • helm-charts.wikimedia.org
  • gerrit.wikimedia.org

Both of these look like manual user actions, and would possibly be fixed by https://gerrit.wikimedia.org/r/c/operations/puppet/+/771568/ being rolled out globally.

Both of these look like manual user actions, and would possibly be fixed by https://gerrit.wikimedia.org/r/c/operations/puppet/+/771568/ being rolled out globally.

Only seeing that set of patches now, I left a comment on:
https://gerrit.wikimedia.org/r/c/operations/puppet/+/771411/8#message-232fee2f2106f145e77d72a41219ced4f1b8d120

I don't see any red flags with rolling it out globally. If there are no blockers, shall we draft an email to sre-at-large@ to plan a global rollout?

Change 878063 merged by Jbond:

[operations/puppet@production] docker::baseimages: inject no_proxy config to rebuild job

https://gerrit.wikimedia.org/r/878063

Change 878884 had a related patch set uploaded (by Jbond; author: John Bond):

[operations/puppet@production] environment: add no_proxy config directly to environment

https://gerrit.wikimedia.org/r/878884

Both of these look like manual user actions, and would possibly be fixed by https://gerrit.wikimedia.org/r/c/operations/puppet/+/771568/ being rolled out globally.

Only seeing that set of patches now, I left a comment on:
https://gerrit.wikimedia.org/r/c/operations/puppet/+/771411/8#message-232fee2f2106f145e77d72a41219ced4f1b8d120

I don't see any red flags with rolling it out globally. If there are no blockers, shall we draft an email to sre-at-large@ to plan a global rollout?

I have created a new CR. Once that is reviewed by you and Moritz, I'll send an email and plan to roll it out globally.

Change 878884 merged by Jbond:

[operations/puppet@production] environment: add no_proxy config directly to environment

https://gerrit.wikimedia.org/r/878884
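
For readers unfamiliar with no_proxy, the sketch below shows how one client (Python's urllib) interprets it, using the standard-library helper `urllib.request.proxy_bypass_environment`. The proxy URL and the no_proxy value are hypothetical; the values actually set by the two Puppet changes are not shown in this thread, and other clients (curl, apt, Go programs) each parse no_proxy slightly differently.

```python
from urllib.request import proxy_bypass_environment

# Hypothetical settings, shaped like the dict urllib.request.getproxies()
# builds from the environment; the "no" key carries the no_proxy value.
PROXIES = {
    "http": "http://webproxy.example:8080",
    "https": "http://webproxy.example:8080",
    "no": "wikimedia.org,wikipedia.org",
}


def goes_direct(host: str) -> bool:
    """Return True if urllib would skip the proxy for host under PROXIES."""
    return bool(proxy_bypass_environment(host, PROXIES))


if __name__ == "__main__":
    for host in (
        "apt.wikimedia.org",
        "helm-charts.wikimedia.org",
        "gerrit.wikimedia.org",
        "deb.debian.org",
    ):
        print(host, "direct" if goes_direct(host) else "via proxy")
```

With these assumed settings the three wikimedia.org hosts are fetched directly and only deb.debian.org still goes through the proxy.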

ayounsi claimed this task.

Seems fixed now.