Allow the production cluster to access *.wmflabs.org IPs
Closed, ResolvedPublic

Description

Currently, we have a use case in T78167 that requires the production cluster to talk to the labs cluster the same way it can talk to the rest of the Internet: to be able to upload files by URL.

Could we investigate the feasibility of allowing this network exchange?

Event Timeline

Dereckson raised the priority of this task from to Medium.
Dereckson updated the task description.
Dereckson added a project: acl*sre-team.
Dereckson updated the task description.
Dereckson set Security to None.
Dereckson added a project: Security-General.
Dereckson added subscribers: Liuxinyu970226, dschwen, Legoktm and 9 others.

@Dereckson: Why was this added to Security-General (because that does not mean much)? Because you want input from the Security team?

@Dereckson: Why was this added to Security-General (because that does not mean much)? Because you want input from the Security team?

Indeed.

Wanting input from the security team seems the reason. If that tag has no meaning why do we have it?

We have two security goals for the whitelist,

  • prevent users from exploiting client-side issues on the cluster
  • keep the cluster from attacking other websites

In this case, the second issue really doesn't apply. We can overwhelm labs from production, but we're just attacking ourselves... kinda pointless. For the first, I'm not comfortable allowing all of wmflabs, since an attacker can host scripts there that would be able to exploit things like heartbleed, or vulnerabilities in curl. If we have a specific subdomain on wmflabs with access restricted to trusted users, we could whitelist that one domain.
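To illustrate the difference between whitelisting all of wmflabs and whitelisting one restricted subdomain, here is a minimal Python sketch of a host allowlist check. It is not how MediaWiki or the production proxy implements its whitelist; the function name `host_allowed` and the host `trusted-uploads.wmflabs.org` are hypothetical and used only for illustration.

```python
from urllib.parse import urlsplit


def host_allowed(url, allowed_hosts):
    """Return True if the URL's host matches an allowlist entry.

    Entries are either exact host names or "*.example.org" wildcards.
    A wildcard matches any subdomain of example.org, but not
    example.org itself.
    """
    host = (urlsplit(url).hostname or "").lower()
    for entry in allowed_hosts:
        entry = entry.lower()
        if entry.startswith("*."):
            # "*.wmflabs.org" -> compare against the suffix ".wmflabs.org"
            if host.endswith(entry[1:]):
                return True
        elif host == entry:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical restricted subdomain, as suggested in the comment above.
    restricted = ["trusted-uploads.wmflabs.org"]
    # Blanket entry covering every labs-hosted site.
    blanket = ["*.wmflabs.org"]

    url = "https://tools.wmflabs.org/legobot/hi.txt"
    print(host_allowed(url, restricted))
    print(host_allowed(url, blanket))
```

With the exact-host entry, only the restricted service is reachable; with the wildcard entry, any script hosted anywhere under wmflabs.org would pass the check, which is the exposure described above.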

Such whitelisting should also be signed off by labs ops, I'd guess - it's fairly trivial to accidentally take down all of labs from prod :)

Hi, I understand that there may be security issues here, but then this is tagged "Security: None". So are there security issues or not? From the above comments, it seems that review by a security admin is planned. Could you please indicate a deadline? Could you please speed this up a bit? It is blocking other tasks. Thanks in advance.

The Security tag in phab is about whether the ticket contains security-sensitive content that shouldn't be exposed to the general public.

The "Security" field is just our custom Phabricator way of allowing unprivileged users to set up task visibility policies without letting them do too much (used to create private tickets), it's not relevant to the discussion here. I see no reason to create a deadline here, and I don't see why it should be particularly highly prioritised.

Thanks for the explanation. It should be prioritized because it blocks quite a number of tools. This seems a good enough reason for me. And I think a deadline is needed to be sure it is not lost, and it would be at the very least mere politeness towards people who ask something. So either give a deadline, or simply say you won't do it. Thanks, Yann

Steinsplitter raised the priority of this task from Medium to High. May 20 2015, 11:35 AM

Changing priority to "high". This is blocking quite a lot...

This comment was removed by Dereckson.

@Yann @Steinsplitter would you have any constructive comment about the task, in addition to repeating that this blocks other things? We've already got security feedback:

[One of the two goals is to] prevent users from exploiting client-side issues on the cluster
[...] I'm not comfortable allowing all of wmflabs, since an attacker can host scripts there that would be able to exploit things like heartbleed, or vulnerabilities in curl. If we have a specific subdomain on wmflabs with access restricted to trusted users, we could whitelist that one domain.

Let's try to figure something out.

We could create a project similar to the tool server, but with a restricted scope and tailored to provide facilities for tools with a mission to upload content to Commons.

Access to database replication is possible.

This server would be a labs project with accounts restricted to trusted users, in good standing with Wikimedia Commons community, and in good standing with the developers community.

@Dereckson, that sounds like a good idea (assuming ops is ok with it).

The only addition I would ask for is that we make sure no one sets up a proxy there that forwards requests to arbitrary places.

@yuvipanda can we work together to make this happen ?

A clarification is needed here I think:

  1. Labs is set up so that its level of access equals that of the rest of the Internet. So Labs hosts can only access production hosts that the rest of the Internet can access as well. Therefore, you should in most cases be able to use production services from Labs.
  2. The majority of the production cluster can *not* talk to the Internet: these hosts are on private address space / VLANs, with no NAT. Likewise, they cannot talk to Labs either. Where needed, we work around this with proxies and similar solutions.

This is deliberately done to maintain security of the (production) cluster and maintain isolation of Labs, and we can't just change that.

Can specific instances be whitelisted ?

Can specific instances be whitelisted ?

I'm afraid not, no... That's too much of a security loophole, given the more dynamic and less secure nature of Labs.

The use case from T78167 is for wgCopyUploadsDomains:

legoktm@terbium:~$ HTTPS_PROXY=url-downloader.wikimedia.org:8080 curl https://tools.wmflabs.org/legobot/hi.txt
curl: (56) Received HTTP code 403 from proxy after CONNECT

If I try again now, it seems to pass with:

HTTPS_PROXY=url-downloader.wikimedia.org:8080 curl https://tools.wmflabs.org/

Maybe the url-downloader did not have access to the labs reverse proxy / tools.wmflabs.org at the time.
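For anyone who wants to repeat this check without curl, the following is a rough Python equivalent using only the standard library. It assumes the same proxy address as in the transcript above; the helper names `proxy_map` and `fetch_status` are made up for this sketch, and the result depends on being run from a host that can actually reach the proxy.

```python
import urllib.error
import urllib.request


def proxy_map(proxy):
    """Route both http and https requests through the same HTTP proxy."""
    return {"http": f"http://{proxy}", "https": f"http://{proxy}"}


def fetch_status(url, proxy, timeout=10):
    """Fetch url through proxy and describe the outcome as a string."""
    handler = urllib.request.ProxyHandler(proxy_map(proxy))
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(url, timeout=timeout) as response:
            return f"OK {response.status}"
    except urllib.error.HTTPError as err:
        # The origin server (or the proxy, for plain http) answered with an error code.
        return f"HTTP error {err.code}"
    except (urllib.error.URLError, OSError) as err:
        # Includes the case where the proxy refuses the CONNECT tunnel for https.
        return f"failed: {err}"


if __name__ == "__main__":
    # Proxy address taken from the transcript above.
    print(fetch_status("https://tools.wmflabs.org/",
                       "url-downloader.wikimedia.org:8080"))
```

A refused CONNECT, like the 403 in the first curl run, would show up here as a "failed: ..." line rather than an HTTP status from the target site.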

Matanya claimed this task.

Per @mark comments above.

Andrew subscribed.

This ticket has a terrible, unclear title, and even after reading the ticket I'm not 100% sure what it's about.

I'm pretty sure that this bug is about having production wikis access public services hosted on labs that are already exposed to the rest of the internet. As far as I know, this is already the case for quite a few things in tools and elsewhere in labs.

I am pretty sure that @mark is discussing the prospect of routing internal services (e.g. 10.x.x.x IP traffic) between labs and production, which is forbidden for good reason and will remain forbidden.

Similarly, I would oppose a blanket whitelisting of *.wmflabs.org anywhere in production, because 1) any time a production wiki hits labs it has the potential to overwhelm labs and break it for everyone and 2) in general we want to discourage production services from becoming unintentionally reliant on labs support.

However: I don't think there's a reason to not consider whitelisting specific public services on a case-by-case basis, which I suspect is what Matanya was asking.

@mark, can you clarify?

This ticket has a terrible, unclear title, and even after reading the ticket I'm not 100% sure what it's about.

I think we can break it down like this. We have two environments:

  • Production
  • Labs

Each environment has two zones:

  • The internal part (probably using rfc1918 space)
  • The external internet facing part

These two environments and zones are probably a bit mixed up making it harder to do clear separation of traffic. I would focus on enabling connectivity between the external zones of labs and production.

Flow from labs to production:
some labs host -> firewall with nat to a pub ip -> some external routing -> production load balancer -> rest of production chain

And the other way around:
some production host -> firewall with nat or whatever you use right now for things like uploadbyurl -> some external routing -> labs load balancer -> labs resource

That way you have a clear infrastructure design which makes it much easier to manage any security risks.

Do we have any network/security designs of labs and production and how these interact?

This ticket has a terrible, unclear title, and even after reading the ticket I'm not 100% sure what it's about.

Agreed. :)

I'm pretty sure that this bug is about having production wikis access public services hosted on labs that are already exposed to the rest of the internet. As far as I know, this is already the case for quite a few things in tools and elsewhere in labs.

Production wikis, running on servers in production internal VLANs and internal IPs, can't access the rest of the Internet either (other than through a special proxy). Labs is no different in that respect. There is no NAT in production.

I am pretty sure that @mark is discussing the prospect of routing internal services (e.g. 10.x.x.x IP traffic) between labs and production, which is forbidden for good reason and will remain forbidden.

Similarly, I would oppose a blanket whitelisting of *.wmflabs.org anywhere in production, because 1) any time a production wiki hits labs it has the potential to overwhelm labs and break it for everyone and 2) in general we want to discourage production services from becoming unintentionally reliant on labs support.

However: I don't think there's a reason to not consider whitelisting specific public services on a case-by-case basis, which I suspect is what Matanya was asking.

@mark, can you clarify?

I think this is not relevant given the above?

Dereckson renamed this task from Production cluster can't access labs cluster to Allow the production cluster to access *.wmflabs.org IPs. Edited Dec 2 2016, 6:49 PM

This task was opened because tools.wmflabs.org and other similar URLs had been found to be unreachable through the proxy used to download files from external URLs. The goal was to use the Wikimedia Commons upload-by-URL feature with URLs from a server hosted in Labs.

I guess we could mark this resolved, as it describes an old situation that now works, per the test in T95714#1470497.

akosiaris claimed this task.
akosiaris subscribed.

Resolving per the comment above.