Make HTTP calls work within mediawiki on kubernetes
Closed, Resolved (Public)

Description

@dancy was looking into why /favicon.ico requests aren't working on k8s. They are meant to be rewritten to /w/favicon.php, and indeed are, but there the trail goes cold. The HTTP request that the MW PHP code makes from there is failing.

We don't yet have Logstash telemetry on errors here, but I suspected this might mean HTTP calls between wikis aren't working more generally, and indeed that appears to be the case.

The most prominent example, afaik, of HTTP calls within MW is File description pages which make API calls to Commons. For example:

https://test.wikipedia.org/wiki/File:Example.png

When viewed over XWD/k8s, the file description is missing. This is potentially a cache poisoning issue as well, since file descriptions are kept in memcached with a fairly high TTL.

Event Timeline

I'd assume that MW makes HTTP calls to the public endpoints of MW. Those will be blocked in k8s as we generally prohibit egress traffic. I'm not sure this is the right solution here, but all other services talking to mw-api do so via a service-proxy listener (see: https://wikitech.wikimedia.org/wiki/Envoy#Example_(calling_mw-api) ). /cc @Legoktm

Yeah, MW mostly sends requests to the public endpoints because it's convenient for developers, but we really need to stop that, for all the reasons we introduced envoy (TLS overhead, unnecessarily hitting ATS/varnish, etc.)

https://codesearch.wmcloud.org/operations/?q=%2Fw%2F&i=nope&files=&excludeFiles=&repos=Wikimedia%20MediaWiki%20config is a naive codesearch that identifies most of these places (note that some of those URLs are used client-side and OK).

Each extension or core code will need to support configuring and sending a Host header for the domain we want to hit. I'll file subtasks for the cases I see.

For the record, to resolve the same issue during our effort to upgrade Fandom's MW-on-k8s deployment, we ended up creating an HttpRequestFactory service override to dynamically use the current k8s service as proxy for request URLs known to belong to wikis on our platform. Some "automatic" solution like that could help avoid the whack-a-mole of updating every extension and core call site that makes internal HTTP requests. The logic for determining whether a URL should be automatically proxied or not would probably be similar to the existing core MWHttpRequest::isLocalURL method.

Thanks, this is super helpful. I think this is mostly the direction we should go in: developers and configuration can still use the normal wiki domains, e.g. https://meta.wikimedia.org/w/api.php, but internally MediaWiki will route the request over an envoy proxy.

For this to work, we'll need something like $wgLocalHttpProxy, which allows setting a different proxy (from $wgHttpProxy) for when MWHttpRequest::isLocalURL returns true.
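
The selection rule described here can be sketched as follows. This is an illustration in Python, not MediaWiki's PHP code: `is_local_url` is a simplified stand-in for MWHttpRequest::isLocalURL (exact hostname match only, which may differ from the real method), and the function names, host list, and proxy addresses are all hypothetical.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def is_local_url(url: str, local_hosts: Iterable[str]) -> bool:
    """Simplified stand-in for isLocalURL: exact, case-insensitive hostname match."""
    host = urlsplit(url).hostname  # already lowercased by urlsplit
    return host is not None and host in {h.lower() for h in local_hosts}


def choose_proxy(
    url: str,
    local_hosts: Iterable[str],
    http_proxy: Optional[str],
    local_http_proxy: Optional[str],
) -> Optional[str]:
    """Prefer the local proxy for local URLs; otherwise use the general proxy."""
    if local_http_proxy and is_local_url(url, local_hosts):
        return local_http_proxy
    return http_proxy


# Hypothetical values, for illustration only.
hosts = ["meta.wikimedia.org", "commons.wikimedia.org"]
general = "http://webproxy.example:8080"
local = "http://127.0.0.1:8080"

print(choose_proxy("https://meta.wikimedia.org/w/api.php", hosts, general, local))
print(choose_proxy("https://example.org/", hosts, general, local))
```

With no local proxy configured, every request falls back to the general proxy, so the setting would be safe to leave unset.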

One other consideration is whether we need to specifically route index.php and api.php requests to the correct cluster, or can we send them all to api? For reference, the main internal requests I see are either to action=raw (SpamBlacklist/TitleBlacklist) or action=render (InstantCommons) which are both API-like behavior.

Change 713896 had a related patch set uploaded (by Legoktm; author: Legoktm):

[mediawiki/core@master] http: Add $wgLocalHTTPProxy to set a proxy for local requests

https://gerrit.wikimedia.org/r/713896

One other consideration is whether we need to specifically route index.php and api.php requests to the correct cluster, or can we send them all to api? For reference, the main internal requests I see are either to action=raw (SpamBlacklist/TitleBlacklist) or action=render (InstantCommons) which are both API-like behavior.

I think the simplest thing to do is send it all to the API cluster for now.

Change 713896 merged by jenkins-bot:

[mediawiki/core@master] http: Add $wgLocalHTTPProxy to set a proxy for local requests

https://gerrit.wikimedia.org/r/713896

Change 714420 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/puppet@production] services_proxy: Add mwapi envoyproxy for MediaWiki-internal requests

https://gerrit.wikimedia.org/r/714420

My deployment plan is:

  • Turn on envoy proxy nowish, test various requests with curl manually
  • Enable proxy in mwdebug k8s deployment too.
  • After the train rolls to group0 tomorrow, set $wgLocalHTTPProxy to point to the new proxy
    • Test InstantCommons and GlobalUserPage still work (just find an uncached image/user on an obscure wiki)
  • Allow it to roll out with the rest of the train, and document it as a "risky change".

Note that this proxy should point to the mwdebug k8s deployment, not the main api_appserver cluster.

Change 714420 merged by Legoktm:

[operations/puppet@production] services_proxy: Add mwapi envoyproxy for MediaWiki-internal requests

https://gerrit.wikimedia.org/r/714420

I think it would be interesting to do what @TK-999 suggested: intercept all HTTP requests and route them to the correct proxy transparently.

That would allow us to do more complex stuff on our side, while keeping configuration easily intelligible to developers.

For now, though, this should be enough.

Change 720944 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/mediawiki-config@master] Make the apple dictionary bridge work in kubernetes

https://gerrit.wikimedia.org/r/720944

Change 720944 abandoned by Giuseppe Lavagetto:

[operations/mediawiki-config@master] Make the apple dictionary bridge work in kubernetes

Reason:

the search bridge is being split to a separate service.

https://gerrit.wikimedia.org/r/720944

My overdue status update:

Using envoy as a proper proxy rather than a transparent one makes me a bit uneasy; it's not how we've used it, and I don't think I'm happy testing new code paths there with mediawiki specifically.

We could teach MediaWiki how to use a transparent proxy instead, I'll poke at that.

Yeah that's basically being able to change the IP:PORT you're connecting to, and possibly use HTTP instead of HTTPS.

But at this point I think it would be interesting to have a way to intercept all http calls, look up the hostname in a table, and if it corresponds, intercept the call and direct it to our middleware.
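
A rough sketch of that table-lookup idea, in Python for illustration. The routing table, the middleware address, and the `intercept` helper are assumptions made up for this example; nothing here reflects the actual envoy configuration.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical routing table: public hostname -> internal middleware base URL.
ROUTES = {
    "meta.wikimedia.org": "http://127.0.0.1:8080",
    "commons.wikimedia.org": "http://127.0.0.1:8080",
}


def intercept(url, headers=None, routes=ROUTES):
    """Return (url, headers), redirected to the middleware if the host is in the table."""
    headers = dict(headers or {})
    parts = urlsplit(url)
    upstream = routes.get(parts.hostname or "")
    if upstream is None:
        # Unknown host: pass the call through untouched.
        return url, headers
    up = urlsplit(upstream)
    # Keep the original domain in the Host header so the backend can pick the wiki.
    headers["Host"] = parts.netloc
    # Swap IP:PORT (and HTTPS for HTTP); path and query stay as they were.
    return urlunsplit((up.scheme, up.netloc, parts.path, parts.query, "")), headers


print(intercept("https://meta.wikimedia.org/w/index.php?title=X&action=raw"))
print(intercept("https://example.org/a"))
```

Callers keep using the public URLs; only the lookup table needs to know where the middleware listens.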

The third alternative is to actually intercept the calls at the container level, maybe even as simple as adding entries to /etc/hosts...

Change 728622 had a related patch set uploaded (by Legoktm; author: Legoktm):

[mediawiki/core@master] Allow using a transparent proxy for local HTTP requests

https://gerrit.wikimedia.org/r/728622

After reading https://en.wikipedia.org/wiki/Proxy_server#Transparent_proxy I'm not exactly sure "transparent proxy" is the correct terminology, but the patch should work.

Change 728622 merged by jenkins-bot:

[mediawiki/core@master] Allow using a reverse proxy for local HTTP requests

https://gerrit.wikimedia.org/r/728622

Change 731757 had a related patch set uploaded (by Legoktm; author: Legoktm):

[mediawiki/core@wmf/1.38.0-wmf.4] Allow using a reverse proxy for local HTTP requests

https://gerrit.wikimedia.org/r/731757

Change 731757 merged by jenkins-bot:

[mediawiki/core@wmf/1.38.0-wmf.4] Allow using a reverse proxy for local HTTP requests

https://gerrit.wikimedia.org/r/731757

Mentioned in SAL (#wikimedia-operations) [2021-10-18T22:56:33Z] <legoktm@deploy1002> Synchronized php-1.38.0-wmf.4/includes/http/MWHttpRequest.php: Allow using a reverse proxy for local HTTP requests (T288848) (duration: 00m 56s)

Change 731861 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/mediawiki-config@master] Add framework for setting $wgLocalHTTPProxy

https://gerrit.wikimedia.org/r/731861

Change 731862 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/mediawiki-config@master] Enable $wgLocalHTTPProxy on group0 wikis

https://gerrit.wikimedia.org/r/731862

Change 731861 merged by jenkins-bot:

[operations/mediawiki-config@master] Add framework for setting $wgLocalHTTPProxy

https://gerrit.wikimedia.org/r/731861

Mentioned in SAL (#wikimedia-operations) [2021-10-19T17:59:26Z] <legoktm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Add framework for setting $wgLocalHTTPProxy (T288848) (1/2) (duration: 01m 05s)

Mentioned in SAL (#wikimedia-operations) [2021-10-19T18:00:48Z] <legoktm@deploy1002> Synchronized wmf-config/: Add framework for setting $wgLocalHTTPProxy (T288848) (2/2) (duration: 01m 06s)

Change 731862 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable $wgLocalHTTPProxy on group0 wikis

https://gerrit.wikimedia.org/r/731862

Mentioned in SAL (#wikimedia-operations) [2021-10-20T04:40:02Z] <legoktm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Enable $wgLocalHTTPProxy on group0 wikis (T288848) (duration: 01m 05s)

Thanks to other activity, I realized my patch didn't cover MultiHttpClient, which is used by Echo for cross-wiki notifications, and probably other things.

MultiHttpClient is more complicated than the previous part since it's in includes/libs/ and isn't supposed to depend upon MediaWiki. Three approaches I came up with:

  1. Have a similar isLocalURL() function, and just inject the domain list. This is code-wise the simplest but feels like a weird fit for a library.
  2. Inject some kind of mapping for the domains to point to the reverse proxy.
  3. Add a callback hook to normalizeRequests() that checks if URL matches a local vhost and if so, adjusts the domain + host header.

I'm working on implementing #1 since it's the most straightforward and just mirrors what was done in MWHttpRequest.
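
To make approach #1 concrete, here is a hedged sketch in Python of a batch client that receives the domain list and proxy address from its caller, so the library itself never reads application config. `BatchHttpClient`, its constructor options, and the request-dict shape are hypothetical and are not the real MultiHttpClient interface.

```python
from urllib.parse import urlsplit, urlunsplit


class BatchHttpClient:
    """Toy framework-independent batch client (not MultiHttpClient itself)."""

    def __init__(self, local_hosts=(), local_proxy=None):
        # Both values are injected by the caller.
        self.local_hosts = {h.lower() for h in local_hosts}
        self.local_proxy = local_proxy

    def is_local_url(self, url):
        host = urlsplit(url).hostname
        return host is not None and host in self.local_hosts

    def normalize_requests(self, requests):
        """Return copies of the request dicts, rerouting local ones via the proxy."""
        out = []
        for req in requests:
            req = {**req, "headers": dict(req.get("headers", {}))}
            if self.local_proxy and self.is_local_url(req["url"]):
                parts = urlsplit(req["url"])
                proxy = urlsplit(self.local_proxy)
                req["headers"]["Host"] = parts.netloc
                req["url"] = urlunsplit(
                    (proxy.scheme, proxy.netloc, parts.path, parts.query, "")
                )
            out.append(req)
        return out


client = BatchHttpClient(
    local_hosts=["meta.wikimedia.org"],
    local_proxy="http://127.0.0.1:8080",  # hypothetical address
)
batch = client.normalize_requests([
    {"method": "GET", "url": "https://meta.wikimedia.org/w/api.php?action=query"},
    {"method": "GET", "url": "https://example.org/feed"},
])
for req in batch:
    print(req["url"], req["headers"])
```

Only the first request is rerouted; the second keeps its public URL because its host is not in the injected list.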

Change 735073 had a related patch set uploaded (by Legoktm; author: Legoktm):

[mediawiki/core@master] [WIP] Support $wgLocalHTTPProxy in MultiHttpClient

https://gerrit.wikimedia.org/r/735073

Change 739323 had a related patch set uploaded (by Legoktm; author: Legoktm):

[mediawiki/core@master] http: Don't set X-Forwarded-Proto when using a reverse proxy

https://gerrit.wikimedia.org/r/739323

Change 739323 merged by jenkins-bot:

[mediawiki/core@master] http: Don't set X-Forwarded-Proto when using a reverse proxy

https://gerrit.wikimedia.org/r/739323

Change 735073 merged by jenkins-bot:

[mediawiki/core@master] Support $wgLocalHTTPProxy in MultiHttpClient

https://gerrit.wikimedia.org/r/735073

Change 748787 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/mediawiki-config@master] Enable $wgLocalHTTPProxy on Kubernetes, regardless of group

https://gerrit.wikimedia.org/r/748787

Change 748787 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable $wgLocalHTTPProxy on Kubernetes, regardless of group

https://gerrit.wikimedia.org/r/748787

With the above two patches, I was able to successfully load cross-wiki notifications, aka make cross-wiki HTTP requests from en.wikipedia to test.wikidata and meta.wikimedia, then from test.wikidata to meta.wikimedia. If there are any remaining issues, it probably means we just need to add more domains to $wgLocalVirtualHosts.

I will file a separate follow-up task for finishing the rollout on traditional appservers; the issue with Kubernetes is fixed.