Proxy support
Closed, Resolved | Public | 3 Estimated Story Points

Description

You're going to want to add a config variable, and implement support for a custom HTTP proxy if the extension is destined to be doing HTTP(S) requests to the wider internet, especially in WMF production.

Event Timeline

Restricted Application added a subscriber: Aklapper. Aug 1 2022, 3:22 PM

Can we use $wgCopyUploadProxy for this (as everything else seems to...)?

You can use the value of it (in CS.php in WMF prod), but if you look, nearly every other extension (seemingly except GWToolset and FileImporter) has its own variable, such as $wgTorBlockProxy, $wgRSSProxy, $wgFlowParsoidHTTPProxy, $wgMediaModerationHttpProxy, and $wgMachineVisionHttpProxy.

Change 820115 had a related patch set uploaded (by Samtar; author: Samtar):

[mediawiki/extensions/Phonos@master] wgPhonosAPIProxy: Add proxy support

https://gerrit.wikimedia.org/r/820115

Change 820115 merged by jenkins-bot:

[mediawiki/extensions/Phonos@master] wgPhonosApiProxy: Add proxy support

https://gerrit.wikimedia.org/r/820115

@TheresNoTime I set up a forward proxy locally (using this guide) and could connect to Google and our Larynx VPS. However, I don't know whether this is similar to how it will be used in production.

Yeah, I was thinking yesterday about how to test this "accurately". I'm not sure if we're planning on using the Google-api-proxy (@MusikAnimal could/should we?); if so, maybe testing using that somehow would be the best solution?

I'm not certain either, to be honest... As I understand it, Google-api-proxy is used solely to obfuscate the end user's IP and user agent from Google, which is also what the proxy option in MWHttpRequestFactory::create is for. I don't think the API key used by the Google-api-proxy is set to allow requests to the TTS service, but we can change that. Either way, traffic from Phonos to external services (Google/Larynx) originates solely from the backend, so Google is not going to get the end user's IP anyway, right? Or perhaps it's still available in the X-Forwarded-For header or something? (This is the moment where I confess my extreme lack of knowledge in networking 😛)
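
For reference, a minimal sketch of how a proxy value could be fed into the options array that HttpRequestFactory::create() accepts. The helper name buildPhonosRequestOptions and the example proxy URL are hypothetical and only for illustration; the 'method', 'postData' and 'proxy' keys are the option names MediaWiki's request factory understands. This is not necessarily how the Phonos patch does it.

```php
<?php
/**
 * Hypothetical helper (not taken from the Phonos code base): build the
 * options array for HttpRequestFactory::create(), adding the 'proxy'
 * option only when a proxy has actually been configured.
 *
 * @param string|null $proxy Proxy URL from configuration, or null/'' if unset
 * @param string $postData Request body to send to the remote API
 * @return array Options suitable for HttpRequestFactory::create()
 */
function buildPhonosRequestOptions( ?string $proxy, string $postData ): array {
	$options = [
		'method' => 'POST',
		'postData' => $postData,
	];
	// Leave 'proxy' out entirely when nothing is configured, so that
	// MediaWiki's own defaults apply.
	if ( $proxy !== null && $proxy !== '' ) {
		$options['proxy'] = $proxy;
	}
	return $options;
}

// Inside MediaWiki, the array would then be passed on roughly like this
// (commented out because it needs a running MediaWiki environment):
// $request = MediaWikiServices::getInstance()->getHttpRequestFactory()
//     ->create( $endpoint, $options, __METHOD__ );
```

Omitting the key when no proxy is configured keeps local development setups working without any extra configuration.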

@Reedy I am struggling to understand the use-case here. Could you give more details about why this is needed?

Same as for anything else. Wikimedia servers cannot make direct arbitrary HTTP(S) requests to servers on the internet. I can never remember offhand whether MW app servers can reach Cloud stuff, but they definitely wouldn't be able to access https://texttospeech.googleapis.com/v1/text:synthesize (the current value of $wgPhonosApiEndpointGoogle).

I honestly wouldn't worry too much about testing it; it's a fairly well defined pattern.

See also: https://wikitech.wikimedia.org/wiki/HTTP_proxy

OK, thanks for the explanation, that makes sense.

@TheresNoTime @MusikAnimal I assume there is no issue setting this up correctly when it is time to go to production? It looks like we need a different proxy for each datacentre.

You can just use $wgCopyUploadProxy like other extensions do.

$wgRSSProxy = $wgCopyUploadProxy; etc
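
Applied to Phonos, the same pattern might look like the fragment below. This is only a sketch: it assumes the variable added by the merged patch is named $wgPhonosApiProxy (as the commit subject suggests) and that $wgCopyUploadProxy has already been set earlier in the same settings file.

```php
// Sketch of a site configuration fragment, not the actual production config.
// Assumes $wgCopyUploadProxy is already defined earlier in the settings file,
// and that the Phonos variable is named $wgPhonosApiProxy.
$wgPhonosApiProxy = $wgCopyUploadProxy;
```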

Aren't requests created with HttpRequestFactory::create() (like Phonos is doing) already using $wgHTTPProxy? I get that there might be situations in which it'd be useful to have different proxies for different outbound requests, but it seems that in most cases in WMF production they're all using the same one. Why does everything making requests have to have its own proxy config?

$wgHTTPProxy isn't set in production. But you're absolutely right that everything shouldn't need to have its own proxy config, see T298264: Use $wgHTTPProxy in Wikimedia production for fixing that.

Going to boldly say this is done