
MW maintenance scripts on k8s can't do internal HTTP requests
Closed, ResolvedPublic

Description

It's not possible to do HTTP requests to Wikimedia websites from MW maintenance scripts running on k8s, because the change in network topology between bare metal and k8s is not fully hidden by changing $wgLocalHTTPProxy to point to a proxy that can reach those servers.

The traditional $wgLocalHTTPProxy feature was a performance feature allowing requests for certain domains to be done via localhost:80, bypassing the CDN. This is reflected in the documentation for $wgLocalVirtualHosts "This lists domains that are configured as virtual hosts on the same machine" and MWHttpRequest::isLocalURL "Check if the URL can be served by localhost".

In the old days, CLI requests were typically run on servers without Apache, so MWHttpRequest::isLocalURL() returns false in CLI mode regardless of $wgLocalVirtualHosts. That premise hasn't held for a few years, since we've had the service mesh in operation. Moreover, this choice assumed that everyone running MediaWiki would run CLI scripts on a different host from the one running their web server, which more or less assumes the Wikimedia setup is universal, and it is not.

Perhaps in hindsight this could have been done by setting $wgLocalVirtualHosts = [] in config.

One simple solution to the problem is to remove the conditional in the code that switches off local URL handling in CLI mode, and let the user decide when to switch it on or off.
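To make the proposed change concrete, here is a minimal sketch in Python rather than MediaWiki's PHP. It is not the actual MWHttpRequest::isLocalURL() code: the function name, the `skip_in_cli` flag, and the rule of matching the host or any parent domain are assumptions made for illustration. The flag stands in for the conditional that the proposal would remove.

```python
from urllib.parse import urlsplit


def is_local_url(url, local_virtual_hosts, cli_mode=False, skip_in_cli=False):
    """Return True if the URL's host is covered by the local virtual hosts list.

    Hypothetical illustration only. `skip_in_cli=True` models the old
    behaviour (always False in CLI mode); the default models the proposal,
    where the list alone decides.
    """
    if cli_mode and skip_in_cli:
        # Old behaviour: CLI requests never count as local.
        return False

    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False

    for domain in local_virtual_hosts:
        domain = domain.lower()
        # Assumption: a listed domain also covers its subdomains.
        if host == domain or host.endswith("." + domain):
            return True
    return False
```

With the conditional gone, an administrator who does not want local handling in CLI scripts can still get that result by configuring an empty list for those scripts.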

As a longer term solution, we could deprecate $wgLocalHTTPProxy and $wgLocalVirtualHosts. Instead, have callers specify a URL zone (internal or external) and configure proxies by zone.

Within each zone, we could allow specific configuration of proxies by domain name, to provide a migration path from $wgLocalHTTPProxy, in case anyone is still using it for its intended purpose.
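As a rough illustration of the zone idea, the sketch below (Python, not MediaWiki code) looks up a proxy first by zone and then by domain within that zone. The configuration shape, the zone names as dictionary keys, the function name `proxy_for`, and every host name and port are hypothetical placeholders, not existing settings.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hypothetical configuration: one default proxy per zone, plus optional
# per-domain overrides. A value of None means "connect directly".
PROXY_CONFIG: Dict[str, dict] = {
    "internal": {
        "default": "http://internal-proxy.example:6501",
        "domains": {"legacy.example.org": "http://localhost:80"},
    },
    "external": {
        "default": "http://web-proxy.example:8080",
        "domains": {},
    },
}


def proxy_for(url: str, zone: str, config: Dict[str, dict] = PROXY_CONFIG) -> Optional[str]:
    """Pick the proxy for a request whose caller declared its zone."""
    if zone not in config:
        raise ValueError(f"unknown zone: {zone!r}")
    zone_config = config[zone]
    host = (urlsplit(url).hostname or "").lower()
    # A per-domain entry wins over the zone default.
    if host in zone_config["domains"]:
        return zone_config["domains"][host]
    return zone_config["default"]
```

The per-domain override is what would give existing $wgLocalHTTPProxy users a migration path: they could list their local virtual hosts under a zone and point them at the local web server.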

Event Timeline

Joe removed RLazarus as the assignee of this task.
Joe added a subscriber: RLazarus.

I would say this looks relatively urgent.

I should also add - as @Tgr noticed, we should also allow using the local proxy in CLI scripts from MWHttpRequest.

We don't really have other methods, everything goes through either MWHttpRequest or MultiHttpClient.

How about I change the title and description to describe our actual problem?

tstarling renamed this task from "Extend use of $wgLocalHTTPProxy to other http call methods" to "MW maintenance scripts on k8s can't do internal HTTP requests". Dec 3 2024, 6:15 AM
tstarling updated the task description.

I've corrected the main inaccuracies in the updated task description. I also want to add that I disagree with the idea that we should let callers decide whether a call is internal or external: situations might change, installations might differ, and you want to let your system administrator decide how to set up proxying.

We should probably just allow having:

  • A map of domains that we do not want to proxy
  • A map of domains that we want to serve via an internal address (no actual HTTP proxying, just rewrite the request to go to the provided URL, while using the domain in the call as Host header)
  • Everything else would go via a proxy, if one is defined
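The three rules in the list above could be sketched as follows. This is a Python illustration under stated assumptions, not a proposed implementation: the function name `route_request`, the returned dictionary shape, and the precedence (no-proxy list first, then the internal-address map, then the default proxy) are all hypothetical, and the first "map" is modelled as a plain set.

```python
from typing import Dict, Optional, Set
from urllib.parse import urlsplit, urlunsplit


def route_request(
    url: str,
    no_proxy: Set[str],
    internal_map: Dict[str, str],
    proxy: Optional[str] = None,
) -> dict:
    """Decide where a request actually goes and which headers it needs."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()

    # 1. Domains we do not want to proxy: send the request unchanged.
    if host in no_proxy:
        return {"url": url, "headers": {}, "proxy": None}

    # 2. Domains served via an internal address: rewrite scheme and
    #    authority, keep path and query, and carry the original domain
    #    in the Host header. No HTTP proxying is involved.
    if host in internal_map:
        base = urlsplit(internal_map[host])
        rewritten = urlunsplit(
            (base.scheme, base.netloc, parts.path, parts.query, parts.fragment)
        )
        return {"url": rewritten, "headers": {"Host": host}, "proxy": None}

    # 3. Everything else goes via the proxy, if one is defined.
    return {"url": url, "headers": {}, "proxy": proxy}
```

Under this scheme the calling code never states whether a request is internal or external; the administrator's configuration alone determines the routing.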

As far as the current problem goes, I think just removing the conditional in isLocalURL is enough.

Change #1100039 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[mediawiki/core@master] MWHttpRequest: allow using local proxy in cli mode

https://gerrit.wikimedia.org/r/1100039

Requiring callers to specify when a request is internal would have the benefit that there might be some other changes we want to make to internal requests for better logging/metrics (cf. MWHttpRequest::setOriginalRequest()). It would be a fair amount of work, though, since there are way more internal requests than external ones.

If we do want to support unspecified third party setups, maybe the proxy variables should just be retired in favor of a hook or service that can alter requests before they are fired? (Though it doesn't help that MWHttpRequest and MultiHttpClient represent requests in a completely different way.)
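For illustration, a request-altering hook could look roughly like the Python sketch below. It assumes a single shared request representation, which (as noted above) MWHttpRequest and MultiHttpClient do not currently have. The class `OutgoingRequest`, the function `apply_hooks`, the example hook, and the domain and proxy address are all hypothetical.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, Iterable, Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class OutgoingRequest:
    """Hypothetical common description of a request before it is fired."""
    method: str
    url: str
    headers: Dict[str, str] = field(default_factory=dict)
    proxy: Optional[str] = None


RequestHook = Callable[[OutgoingRequest], OutgoingRequest]


def apply_hooks(request: OutgoingRequest, hooks: Iterable[RequestHook]) -> OutgoingRequest:
    """Run each hook in order; each one returns a possibly altered request."""
    for hook in hooks:
        request = hook(request)
    return request


def use_internal_proxy(request: OutgoingRequest) -> OutgoingRequest:
    """Example site-specific hook: send *.example.org through a local proxy."""
    host = (urlsplit(request.url).hostname or "").lower()
    if host == "example.org" or host.endswith(".example.org"):
        return replace(request, proxy="http://localhost:6501")
    return request
```

A site would register whatever hooks match its own topology, so core would not need to carry any proxy-specific configuration variables.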

Change #1100211 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] http: Update LocalHTTPProxy documentation

https://gerrit.wikimedia.org/r/1100211

Change #1100039 merged by jenkins-bot:

[mediawiki/core@master] MWHttpRequest: allow using local proxy in cli mode

https://gerrit.wikimedia.org/r/1100039

Change #1100211 merged by jenkins-bot:

[mediawiki/core@master] http: Update LocalHTTPProxy documentation

https://gerrit.wikimedia.org/r/1100211

MSantos subscribed.

@Joe I'm moving this to radar in our backlog but it's something we see as important and high-priority. Please let us know if you need extra support.

tstarling claimed this task.