
global http_proxy setting
Closed, ResolvedPublic

Description

In the admin module we currently have a number of users who set some form of http_proxy=http://webproxy:8080 in their login profile. The most complete version comes from alex (updated slightly to make use of the DNS search path):

function set_proxy {
        export HTTP_PROXY=http://webproxy:8080
        export HTTPS_PROXY=http://webproxy:8080
        export http_proxy=http://webproxy:8080
        export https_proxy=http://webproxy:8080
        export NO_PROXY=127.0.0.1,::1,localhost,.wmnet,.wikimedia.org,.wikipedia.org
        export no_proxy=127.0.0.1,::1,localhost,.wmnet,.wikimedia.org,.wikipedia.org
}

function clear_proxy {
        unset HTTP_PROXY
        unset HTTPS_PROXY
        unset http_proxy
        unset https_proxy
        unset NO_PROXY
        unset no_proxy
}

This ticket is to discuss whether we should have this capability globally, or at least in all locations required to use the proxy, and if we do, how that should look.

The first question to ask is: what are the use cases for using the proxy manually on a production host? It is possible we can fix these use cases elsewhere.

However, if we decide these are useful for production:

Do we need to add these settings as functions so users explicitly set/unset the parameters (as above), or can we just set these values explicitly in /etc/profile.d/ and /etc/zsh/zshenv?

If we think we should have these as functions, would it make sense to automatically call set_proxy on an interactive login?
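For reference, a minimal sketch of what calling set_proxy automatically on interactive logins could look like. The fragment is hypothetical (for example a file under /etc/profile.d/), it is not what was eventually deployed, and it only sets the proxy variables; the NO_PROXY values from the description are left out for brevity.

```shell
# Hypothetical profile fragment; the proxy URL is the one from the description.
set_proxy() {
    # Export both spellings, since tools differ in which one they read.
    export HTTP_PROXY="http://webproxy:8080" HTTPS_PROXY="http://webproxy:8080"
    export http_proxy="http://webproxy:8080" https_proxy="http://webproxy:8080"
}

# "$-" holds the current shell flags; it contains "i" only in interactive shells.
case "$-" in
    *i*) set_proxy ;;
    *)   : ;;  # non-interactive (scripts, cron jobs): leave the environment alone
esac
```

With this shape, scripts and cron jobs would never pick up the proxy implicitly, but every interactive login would, which is the trade-off this question is about.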

Looking at this from a different side, perhaps we should just add the config to the tools that most often need these settings, e.g.

/etc/wgetrc
http_proxy = http://webproxy:8080
https_proxy = http://webproxy:8080

I don't think curl has a global option for this, so we would probably have to set up some global alias (which doesn't sound good).
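One possible alternative to an alias, sketched under assumptions: curl reads a per-user ~/.curlrc and any file passed with -K/--config, and both accept proxy and noproxy entries. The file location below is purely illustrative, and the noproxy list is copied from the description.

```shell
# Write a small curl config file to a temporary, illustrative location.
curlrc="$(mktemp)"
cat > "$curlrc" <<'EOF'
proxy = "http://webproxy:8080"
noproxy = "127.0.0.1,::1,localhost,.wmnet,.wikimedia.org,.wikipedia.org"
EOF

# Opt in per invocation, for example:
#   curl --config "$curlrc" https://example.org/
```

Because the file is only used when named explicitly (or placed in a user's home directory as .curlrc), this keeps proxy use opt-in rather than global.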

Finally, I have noticed that some users add wikimedia.org to the list of no_proxy domains and some don't. Both methods work; however, depending on expectations, one may be preferred over the other. There are likely also some edge cases: for instance, including wikimedia.org in the no_proxy list will mean that some of the externally hosted sites, e.g. https://wikitech-static.wikimedia.org, wouldn't be available. However, I believe the number of these sites is quite small, and the benefit of having a direct connection may be preferred in this case?
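To make the edge case concrete, here is a rough sketch of suffix matching against a no_proxy list. The function name host_bypasses_proxy is made up for illustration, and the matching rule (an entry matches the host itself and any subdomain, with a leading dot ignored) is an assumption that approximates common tools; exact no_proxy semantics vary between clients.

```shell
# Return 0 if host $1 would bypass the proxy given the comma-separated list $2.
host_bypasses_proxy() {
    local host="$1" list="$2" entry
    # Walk the list one comma-separated entry at a time.
    while [ -n "$list" ]; do
        entry="${list%%,*}"
        if [ "$entry" = "$list" ]; then list=""; else list="${list#*,}"; fi
        entry="${entry#.}"   # treat ".wikimedia.org" and "wikimedia.org" alike
        [ -z "$entry" ] && continue
        case "$host" in
            "$entry"|*".$entry") return 0 ;;  # exact match or subdomain
        esac
    done
    return 1
}

# Example check with the list from the description:
if host_bypasses_proxy wikitech-static.wikimedia.org \
    "127.0.0.1,::1,localhost,.wmnet,.wikimedia.org,.wikipedia.org"; then
    echo "direct connection attempted (proxy bypassed)"
fi
```

Under this rule the externally hosted wikitech-static.wikimedia.org falls under .wikimedia.org and would be contacted directly, which is the failure case described above.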

Event Timeline

jbond triaged this task as Medium priority. Mar 24 2021, 1:18 PM
jbond created this task.

Let me play the devil's advocate here, I hope to not have misunderstood your intentions, correct me if I'm wrong.

What are the use cases for using the proxy manually on a production host?
Isn't the whole purpose of having internet access blocked by default exactly to prevent the habit of downloading untrusted files from the internet on production hosts, manually and outside of our configuration management system?

I totally see that this might be needed in some cases, but IMHO those cases should come up rarely and be treated as exceptions and we should try to fix any other common use case that might exist.
Having the proxy settings already set by default all the time seems to go in the wrong direction.

AFAIK we currently don't even have any proper auditing at the proxy layer, so setting the proxies as a default for all users upon login seems even dangerous to me.
I like the fact that if I try to download something from the internet on a host it actually fails. That should make me think that I might be trying to do something the wrong way.

I'm in favour of not having the env have internet access by default, for safety reasons.

I'm one of the users that has a similar bash function defined for this purpose. The use case is that sometimes I need to install something like wmfmariadbpy + its dependencies on a prod host (usually cumin1001) to test something. This requires using pip, which then requires net access.

I have it as a shortcut function in my home dir, and not set somewhere globally and easily accessible by everyone, for more or less the same reason: to not encourage its (ab)use. If everyone resorts to something similar, we can go around and perhaps make life a tad easier for newcomers by putting it in /etc/profile.d/, but it should always remain a conscious choice to use it IMHO.

FWIW, users of analytics would love this, but I understand the reasons not to enable it automatically.

I also have a shell alias (proxy-on / proxy-off) for convenience and use it in a few cases when building packages requires internet. +1 to have a shared alias available (since in practice we already have that, just sprinkled in a few places), and -1 to have proxy enabled by default, for reasons already mentioned.

thanks all

What are the use cases for using the proxy manually on a production host?

I have added this to the description

Isn't the whole purpose of having internet access blocked by default

I'm unsure what the motivations were here, however, being picky...

I'm in favour of not having the env have internet access by default

I would argue it's not blocked by default, and I'm not even sure we could argue obfuscation considering https://wikitech.wikimedia.org/wiki/HTTP_proxy and grep -iE 'https?_proxy' /home/*/.bashrc. Normally using a proxy is more about audit and finer-grained filtering capabilities as opposed to completely blocking access. However

AFAIK we currently don't even have any proper auditing at the proxy layer,

We should probably fix this

not set somewhere globally and easily accessible by everyone

I feel that's already available with https://wikitech.wikimedia.org/wiki/HTTP_proxy

and -1 to have proxy enabled by default, for reasons already mentioned)
I like the fact that if I try to download something from the internet on a host it actually fails.
but it should always remain a conscious choice to use it IMHO.

Ack, this makes sense to me, and if agreed I would probably just put something similar to the functions in the description in /etc/profile.d/proxy.sh
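As a variation on keeping proxy use a conscious choice, a per-command wrapper is one option. The sketch below is only an illustration: the name with_proxy is hypothetical, and the proxy URL and bypass list are the values from the task description.

```shell
# Run a single command with the proxy variables set, without touching the calling shell.
with_proxy() {
    local proxy_url="http://webproxy:8080"
    local bypass="127.0.0.1,::1,localhost,.wmnet,.wikimedia.org,.wikipedia.org"
    # env sets the variables for the child process only.
    env HTTP_PROXY="$proxy_url" HTTPS_PROXY="$proxy_url" \
        http_proxy="$proxy_url" https_proxy="$proxy_url" \
        NO_PROXY="$bypass" no_proxy="$bypass" \
        "$@"
}

# Usage (the command shown is just an example):
#   with_proxy pip install wmfmariadbpy
```

Compared with set_proxy/clear_proxy, nothing has to be unset afterwards, so a forgotten clear_proxy cannot leave a session proxied. Because it relies on env, it works for external commands but not for shell functions or builtins.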

jbond updated the task description.

Thanks for looking into that @jbond!
As we now have better auditability, and an ongoing conversation about Squid ACLs (T300977) this is something we should look into.

Right now we don't seem to have much coherence between hosts, and I agree the proxies (and internal hosts) are supposed to be an extra layer of defense.

(and let's not discuss a NAT firewall as this would bring its own can of worms, complicating the existing setup).

Public hosts
Those don't need to use the proxies as they have direct internet access (FERM rules permitting).
The case could be made that it would be beneficial to bring global consistency (HTTP across the infra) and auditability.
However, the downsides are too high: significant change, additional service dependency, local inconsistency (HTTP vs. other protocols)

Private hosts
Those don't have the choice, they need the proxies to reach external resources (over HTTP).
Here I tend to agree that proxies should not be configured by default for the reasons mentioned in previous comments. This could potentially change in the future, especially if they are tied to a whitelist.
That said, if a user makes the conscious choice to use the proxies, we should make it easy (and consistent) for them to use it.
That's where I like @jbond's idea to have a script available for users to run (especially if less familiar with our infra, e.g. analytics people). This should go hand in hand with updated/cleaned-up documentation.

In addition, and if possible, we should push an authoritative no_proxy option across all the hosts, for consistency, that would only be used when http_proxy is set.
Looking at the dashboard, a lot of proxy traffic is for destinations that don't need proxy access. If that's possible, it should be carefully rolled out to analytics hosts only when the analytics ACLs are opened up (T298087), as the proxies are involuntarily a way to work around those ACLs.

Change 771411 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] P:environment: Add no_proxy values to the default environment

https://gerrit.wikimedia.org/r/771411

Change 771568 had a related patch set uploaded (by Jbond; author: jbond):

[operations/puppet@production] P:environment: Add support for environment.d to zsh and bash

https://gerrit.wikimedia.org/r/771568

Change 771568 merged by Jbond:

[operations/puppet@production] P:environment: Add support for environment.d to zsh and bash

https://gerrit.wikimedia.org/r/771568

Change 771411 merged by Jbond:

[operations/puppet@production] P:environment: Add ablilty to inject environment variables

https://gerrit.wikimedia.org/r/771411

Change 878884 had a related patch set uploaded (by Jbond; author: John Bond):

[operations/puppet@production] environment: add no_proxy config directly to environment

https://gerrit.wikimedia.org/r/878884

Change 878884 merged by Jbond:

[operations/puppet@production] environment: add no_proxy config directly to environment

https://gerrit.wikimedia.org/r/878884

Change 968716 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/puppet@production] Add helper bash function to setup proxy env var

https://gerrit.wikimedia.org/r/968716

Going through this task after discussing proxies in another group.

The side of this task about forcing sane no_proxy settings is done thanks to the few patches from @jbond above.

The remaining side, probably just a "nice to have", is to add a helper script users can use to quickly and correctly set the proxies in their session, a bit like what's described in the task description. Hopefully https://gerrit.wikimedia.org/r/968716 does it well.

Change 968716 merged by Ayounsi:

[operations/puppet@production] Add helper functions to setup proxy env var

https://gerrit.wikimedia.org/r/968716

ayounsi claimed this task.

Updated the doc https://wikitech.wikimedia.org/wiki/HTTP_proxy
I think everything here is done. Please re-open if not.