
Horizon should use a proxy to access cloud vps hosted apis
Open, Needs Triage, Public

Description

Split from T305414. This is so that the new cloudweb* hosts will not need public IPs.

The proxy and puppet Horizon dashboards will need to be updated to use an HTTP proxy for API requests.
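As a rough illustration of what "use an HTTP proxy for API requests" means on the client side, here is a minimal sketch using only Python's standard library. It assumes the dashboard code makes its HTTP calls through `urllib`; the real proxy and puppet dashboards may use a different HTTP client. `build_api_opener` is a hypothetical helper, and the proxy URL is the codfw one from the curl test on cloudweb2001-dev in the comment below; other sites would presumably use their own proxy host.

```python
import urllib.request

# Proxy used in the curl test on cloudweb2001-dev; other sites would
# presumably use their own webproxy host (assumption).
WEBPROXY = "http://webproxy.codfw.wmnet:8080"


def build_api_opener(proxy_url: str = WEBPROXY) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS API requests via proxy_url.

    Hypothetical helper: HTTPS requests are tunnelled through the proxy with
    CONNECT, plain HTTP requests are forwarded to the proxy as-is.
    """
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    # Passing our own ProxyHandler replaces the default one, which would
    # otherwise read proxy settings from environment variables.
    return urllib.request.build_opener(handler)
```

Usage would look like `build_api_opener().open("https://novaproxy.codfw1dev.wmcloud.org", timeout=10)`, which only succeeds from a host that can reach the proxy. Configuring the proxy explicitly in code, rather than relying on `http_proxy`/`https_proxy` environment variables, keeps the behaviour independent of how the web server process is started.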

Event Timeline

Looks like we will need to serve the API endpoints from port 443 if we want them to be accessible via the HTTP proxy:

taavi@cloudweb2001-dev ~ $ curl --proxy http://webproxy.codfw.wmnet:8080 https://novaproxy.codfw1dev.wmcloud.org:5668
curl: (56) Received HTTP code 403 from proxy after CONNECT
taavi@cloudweb2001-dev ~ $ curl --proxy http://webproxy.codfw.wmnet:8080 https://novaproxy.codfw1dev.wmcloud.org
<!DOCTYPE html>
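The 403 after CONNECT is consistent with a forward proxy that only allows CONNECT tunnels to port 443, similar to the `SSL_ports` rule in Squid's default configuration. Assuming webproxy enforces a rule like that (not confirmed here beyond the two curl results above), the sketch below predicts which API URLs would be reachable through it. `connect_allowed` and the allowed-port set are illustrative, not part of any existing tool.

```python
from urllib.parse import urlsplit

# Assumed policy: CONNECT tunnels are only permitted to these ports,
# modelled on Squid's default "http_access deny CONNECT !SSL_ports".
ALLOWED_CONNECT_PORTS = frozenset({443})


def connect_allowed(url: str, allowed_ports=ALLOWED_CONNECT_PORTS) -> bool:
    """Return True if an HTTPS request to url would pass the assumed CONNECT rule."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        # Plain HTTP is forwarded without CONNECT, so this rule does not apply.
        return True
    # An https URL without an explicit port uses 443.
    port = parts.port if parts.port is not None else 443
    return port in allowed_ports


for url in (
    "https://novaproxy.codfw1dev.wmcloud.org:5668",
    "https://novaproxy.codfw1dev.wmcloud.org",
):
    print(url, "->", "allowed" if connect_allowed(url) else "blocked (403)")
```

Under this assumption, the first URL is reported as blocked and the second as allowed, matching the curl results and the motivation for serving the API on port 443.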

Change 781950 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] dynamicproxy: expose api on port 443

https://gerrit.wikimedia.org/r/781950

I'm somewhat lost here -- sorry if this is all spelled out in another task which I've missed. I'm going to talk through my understanding of the situation.

Traffic from a labweb host to, e.g., the Puppet ENC looks like case 3 here:

https://wikitech.wikimedia.org/wiki/Cross-Realm_traffic_guidelines#Case_3:_generic_network_access_prod_--%3E_cloud

Re-reading case 3, it looks like that document assumes that the interesting part is ingress on the service side, and that the client (in this case, Horizon) has access to the public internet.

If we move cloudweb hosts to private IPs, then they will no longer have access to the public internet, hence the need for a proxy. Is that right so far?
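The outbound side of that reasoning can be sketched as a per-destination routing decision: with only a private IP, internal destinations are reached directly, while anything public (including Cloud VPS-hosted APIs under wmcloud.org, such as the Puppet ENC) has to go through the proxy. The `.wmnet` bypass suffix below is an assumption for illustration, in the spirit of a `no_proxy` setting; `route_for` is a hypothetical helper, not an existing Horizon configuration option.

```python
from urllib.parse import urlsplit

# Hypothetical bypass list: hosts under these suffixes are assumed to be
# reachable directly from a private-IP cloudweb host.
DIRECT_SUFFIXES = (".wmnet",)
# Proxy from the codfw curl test above (assumed to apply to these hosts).
PROXY_URL = "http://webproxy.codfw.wmnet:8080"


def route_for(url: str, direct_suffixes=DIRECT_SUFFIXES, proxy_url=PROXY_URL):
    """Return None for a direct connection, or the proxy URL to use.

    Uses no_proxy-style suffix matching: a host goes direct if it ends with
    one of the suffixes; everything else is sent via the proxy.
    """
    # urlsplit() already lowercases .hostname; "or ''" handles URLs without one.
    host = urlsplit(url).hostname or ""
    if any(host.endswith(suffix) for suffix in direct_suffixes):
        return None
    return proxy_url
```

For example, `route_for("https://novaproxy.codfw1dev.wmcloud.org")` returns the proxy URL, while `route_for("http://webproxy.codfw.wmnet:8080")` returns `None`.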

If we move cloudweb hosts to private IPs, we will also have the question of how to route traffic /to/ the new cloudweb hosts (e.g. for https://horizon.wikimedia.org). The obvious way to handle that is via production LVS, since that's how it's handled now.

--but--

Neither of those is a good end state, is it? We decided not to use production LVS for Swift (see T296411, "cloud: decide on general idea for having cloud-dedicated hardware provide service in the cloud realm & the internet"), so it seems likely that whatever front end we choose for Swift should also ultimately be the front end for other services running on bare metal: Nova, Horizon, and so on.

So... this task and the chatter on T305414 about private IPs seem premature. Anything we do today to move these hosts off public IPs will only need to be redone once we have a real solution for T296411. Meanwhile, the engineering work to support outbound traffic (e.g. Horizon -> ENC) may or may not still be useful, depending on where in the network we ultimately move Horizon.

I would like us to have a complete plan for all of these pieces before we randomly shuffle the deck chairs around. We will almost certainly not have such a plan before Arturo comes back from parental leave, which seems fine to me.

If there is true urgency about reclaiming public IPs within the next few months, then perhaps this is worth the trouble, but otherwise I'd prefer we wait. I'm also open to an argument about why these different issues can be decoupled.

Thanks @Andrew ! All valid points.

The cloud realm is there to provide separation between the less trusted VM environment and production.
My understanding is that only low-traffic, independent, and trusted services run on cloudweb hosts. Based on that, and (paradoxically?) to avoid making too many changes, I think it would be fine to keep them on the prod side. Of course, if there are larger changes, it could make sense to move them to the cloud realm.
So as I understand it, this can be a good end state, since their fate doesn't depend on the solution for Cloudswift.

Again, there is no urgency; it's an effort to steer the ship toward newer/better best practices, especially where they're low-hanging fruit.
If there is no consensus or it's too much effort, keeping the status quo during the refresh and revisiting it later is fine as well.