Page MenuHomePhabricator

Support cross-domain $wgLoadScript URLs in ResourceLoader
Closed, ResolvedPublic

Description

For T363695: Create a Wikimedia login domain that can be served by any wiki we are configuring MediaWiki to display login pages under a different domain and URL structure, and to reduce complexity we set $wgLoadScript to a full URL to the standard domain, so load.php is always called via the same domain. That doesn't quite work because when accessed via the standard domain, ResourceLoader will use relative URLs, which get written into the startup module: https://sso.wikimedia.org/en.wikipedia.org/wiki/Special:Userlogin calls https://en.wikipedia.org/w/load.php?modules=startup which is dynamically generated JS and will contain something like mw.loader.setSource( 'local', '/w/load.php' ) and then mw.loader will use the wrong URL.

To minimize the changes necessitated by SUL3, rather than expanding URLs everywhere, we only want to expand them where needed, and only on Wikimedia sites (ie. with a hook rather than by default).

, and that will be used by some ResourceLoader components (the startup module at least) will use relative URLs which will be interpreted in the context of the

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Change #1054931 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/core@master] resourceloader: Add ResourceLoaderModifyStartupSourceUrls hook

https://gerrit.wikimedia.org/r/1054931

Change #1057005 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/CentralAuth@master] SUL3: Expand ResourceLoader URLs

https://gerrit.wikimedia.org/r/1057005

Change #1054931 merged by jenkins-bot:

[mediawiki/core@master] resourceloader: Add ResourceLoaderModifyStartupSourceUrls hook

https://gerrit.wikimedia.org/r/1054931

Change #1057005 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@master] SUL3: Expand ResourceLoader URLs

https://gerrit.wikimedia.org/r/1057005

After the patches got deployed on Beta, most things work, except some icon URLs:
https://sso.wikimedia.beta.wmflabs.org/w/load.php?modules=skins.vector.icons.js&image=edit&format=original&lang=en&skin=vector-2022&version=1pcaj
is still being loaded from the wrong domain. But the almost identical
https://en.wikipedia.beta.wmflabs.org/w/load.php?modules=skins.vector.icons&image=ellipsis&format=original&lang=en&skin=vector-2022&version=15nyl
URL is correct and there is no difference between the two modules.

The one that works has a referer like https://en.wikipedia.beta.wmflabs.org/w/load.php?lang=en&modules=... and the one that doesn't work has https://sso.wikimedia.beta.wmflabs.org/en.wikipedia.beta.wmflabs.org/w/index.php?title=Special:CreateAccount
So I guess loading an icon works from JS but does not work from PHP?

So I guess loading an icon works from JS but does not work from PHP?

I believe it's the other way around.

The one that works is background-image: url(/w/load.php…), inside a stylesheeet at https://en.wikipedia.beta.wmflabs.org/w/load.php?lang=en&module=…&only=styles that in turn was queued by PHP/HTML as "skins.vector.icons" styles module during the sso.wikimedia response.

The one that's broken is background-image: url(/w/load.php…), inserted as inline <style> tag generated by JavaScript, fetched as "skins.vector.icons.js" by JavaScript from https://en.wikipedia.beta.wmflabs.org/w/load.php

Screenshot 2024-08-02 at 01.55.48.png (1×2 px, 603 KB)

This is indeed a case where image references are interpreted relative to the HTML page, rather than the stylesheet, because lazily inserted inline styles don't have any stylesheet URL. Once JavaScript takes the string from that array, and inserts it as an inline <style> tag, the browser can't make any association that the data came from some URL originally.

One of the alternative options I've had in the back pocket, is a <base href> tag. I've used this recently in another project, and it's quite an effectively way to "rebase" a page without needing any other configuration. It's no doubt a compromise that may require tweaking one or two things in the other direction, but I wonder if that might offer a better trade off, with fewer wild geese to chase?

Which CSS, IMG, JS, or hyperlinks on an SSO page view "must" go to the SSO domain?

You may have plans for this already, but the footer links in beta SSO currently point to the local SSO instead of the canonical, which <base> would fix, for example. It would potentially mean no need to configure wgLoadScript, wgScriptPath, wgArticlePath etc. https://sso.wikimedia.beta.wmflabs.org/en.wikipedia.beta.wmflabs.org/w/index.php?title=Special:CreateAccount. Might also resolve T371609: Rewrite URLs when using the shared login domain.

Krinkle triaged this task as Medium priority.Aug 12 2024, 2:15 PM

If we use a <base>, we'll still need to do something about T371609, just in the opposite way (instead of rewriting links that should go to the normal domain instead of SSO, we'll need to rewrite links that should go to the SSO domain instead of normal). It wouldn't really be any more difficult, though.

<base> could be a simpler way to handle the long tail of minor things. I also spotted this today: T374286: On sso.wikimedia.beta.wmflabs.org login page, the "Powered by MediaWiki" icon does not render

Change #1071268 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/CentralAuth@master] Rewrite URLs when using the shared login domain

https://gerrit.wikimedia.org/r/1071268

This patch has to use WikiMap::getWiki( WikiMap::getCurrentWikiId() )->getCanonicalServer() which reads from $wgConf even for the current wiki. That, because we currently override $wgCanonicalServer for SSO domain in wmf-config. This seems likely to backfire in some way, given that it's literally not the canonical domain. And, the fact that WikiMap uses wgConf even for the current wiki is also an implementation detail I'd rather not rely on. It seems reasonable to me if that were to e.g. read from MainConfig instead. It doesn't right now, but both of these are small risks I'll add to the pile of reasons that favour <base>.

We could just pass the original URL as a config variable, if that seems safer. (Which is not to say we shouldn't use <base>.)

Another differences is that with <base>, we'd leave $wgServer undisturbed, and maybe it's used for some internal (non-HTML-rendering-related) purpose somewhere, like a server-to-server API call. Not sure if using the normal domain rather than the SSO domain is a good or bad thing in such a situation. Probably a good thing.

Change #1074378 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/CentralAuth@master] [DO NOT MERGE] Try using a `<base>` tag on SSO domain

https://gerrit.wikimedia.org/r/1074378

I gave <base> a try.

It introduces at least one new problem: link to fragments (href="#foo") are also resolved relative to the new base, which means that they now link to a different page instead of navigating within the current one. We can fix that for links within the content using the GetLocalURL hook, but not for links generated by the skin, such as the accessibility jump links (<a class="mw-jump-link" href="#bodyContent">Jump to content</a>).

It doesn't simplify the GetLocalURL hook code: we obviously now have to rewrite links to the pages on the SSO domain, but we also still have to rewrite the links to the local domain, because the resolved links would still have the "prefix" (wiki name) that is used on the SSO domain. (This also means the <base> also doesn't fix links generated by the skin, and not handled by this hook, such as the search form action attribute.)

And… loading the icons from the inline <style> tag still doesn't work. They now try to load from the local domain, as expected, but they fail with a Same Origin Policy error! On Firefox it looks like this:

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://<local domain>/w/load.php?modules=skins.vector.icons.js&image=article&format=original&lang=en&skin=vector-2022&version=1g6m0. (Reason: CORS header ‘Access-Control-Allow-Origin’ missing). Status code: 200.

I didn't know that's even possible for resources loaded from CSS. Apparently this is because the icons on Vector 2022 use mask-image instead of background-image, and stricter rules are applied because the browser has to process the image's pixels instead of displaying them, or something. MDN's https://developer.mozilla.org/en-US/docs/Web/CSS/mask-image only has a cryptic note about CORS policy, and the relevant spec has nothing about this, as far as I could find. That's what we get for using experimental technique just to display some tiny images.

(Firefox seems to only do this for stylesheets created from JS code, or maybe only for <style>; Edge also has these errors for icons loaded from <link> stylesheets.)

So, I think we need to either make load.php respond with Access-Control-Allow-Origin headers in some cases (not sure what it would take, or if it might have unwanted consequences), or allow serving at least some load.php requests from the SSO domain (I'm not sure what the consequences of this could be either – I vaguely recall we wanted to avoid it for caching reasons, but I'm not sure why; maybe just serving the images would be okay?).

Adding Allow-Origin headers seems pretty harmless (as usually same-domain rules do not prevent loading resources, only accessing their content). But the prefix problem seems harder to fix (in an ideal world MediaWiki would use a <base> tag by default to set the prefix and we'd only ned override it, but that's not how it works currently) so I guess that means <base> is out of the question? Which leaves us with adding another hook.

(Or we could add an Apache rewrite for all domains that removes the domain name prefix from the URL, but I don't think we should go there.)

I gave <base> a try.

[…]

On Firefox it looks like this:

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://<local domain>/w/load.php?modules=skins.vector.icons.js&image=article&format=original&lang=en&skin=vector-2022&version=1g6m0. (Reason: CORS header ‘Access-Control-Allow-Origin’ missing). Status code: 200.

[…]

So, I think we need to either make load.php respond with Access-Control-Allow-Origin headers in some cases […]

ResourceLoader emits Access-Control-Allow-Origin: * for styles=only requests (source). It looks like the above warning is for an SVG icon from the image= route, which does not yet emit such header. I'm guessing this is because we introduced the ResourceLoaderImage feature after we decom'ed bits.wikimedia.org in 2014, and we haven't tried to use load.php cross-domain since then.

Seems like it'd be fine to add to images as well. Although I do note that * is ineffective for credentials connections. It is not obvious to me right away which and whether these are made with credentials. I'm sure ResourceLoader doesn't need credentials here, since we strip cookies anyway, but for historical consistency, most parts of the HTML spec tend to make requests with credentials. An interesting exception to this are CSS @font-face requests (and hence why crossorigin="anonymous" when using a font URL with <link rel=preload>, as otherwise the preload will make an ignored/incompatible request).

Change #1075239 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] resourceloader: Serve images with "Access-Control-Allow-Origin: *"

https://gerrit.wikimedia.org/r/1075239

That patch fixes that particular issue.

I tested the <base> approach a bit more and found some more problems. Basically, it works too well: AJAX validation of the username on the signup form is broken, because the <base> makes it do cross-domain requests to api.php, which are disallowed.

It was worth trying but I'm increasingly thinking that we will have fewer issues to resolve with the other approach.

Change #1074378 abandoned by Bartosz Dziewoński:

[mediawiki/extensions/CentralAuth@master] [DO NOT MERGE] Try using a `<base>` tag on SSO domain

Reason:

Causes too many problems to be worth doing.

https://gerrit.wikimedia.org/r/1074378

Could we just set $wgLoadScript to an absolute URL in production? You address this somewhat in the task description ("To minimize the changes necessitated by SUL3, rather than expanding URLs everywhere, we only want to expand them where needed, and only on Wikimedia sites (ie. with a hook rather than by default)."), but given that we're discovering the hook-based solution to be much more complicated than we thought, maybe it's worth revisiting this question.

If we can't do that, then here are some other options to consider. I don't really like any of them.

Change #1075239 merged by jenkins-bot:

[mediawiki/core@master] resourceloader: Serve images with "Access-Control-Allow-Origin: *"

https://gerrit.wikimedia.org/r/1075239

Could we just set $wgLoadScript to an absolute URL in production?

It wouldn't work well with mobile domains. There was some discussion of doing a more limited version of it here.

Maybe if we changed MobileContext::getMobileUrl() to be static so it can be used in configuration, but it's not trivial.

Or if we would take on T195494: Handle mobile domains in core, which is much more ambitious, but clearly it is a big pain point.

Add another hook similar to ResourceLoaderModifyStartupSourceUrls, and call it from ResourceLoader\ImageModule when determining the path to load.php to use

Or make the existing hook more generic (it hasn't been part of a release yet). That has the benefit of being able to use MobileContext methods. IMO the least bad solution, at least in the short term.

Rewrite the load.php URLs dynamically in JS when adding <style> tags

There's also a new web feature called document.adoptedStyleSheets

Those would interfere with using CSP hashes which AIUI we want to enforce eventually. Maybe not too much of an issue if it's only for CSS, not JS?

Change #1076243 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] resourceloader: Add/change ResourceLoaderModifyEmbeddedSourceUrls hook

https://gerrit.wikimedia.org/r/1076243

Change #1076244 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/extensions/CentralAuth@master] Update ResourceLoaderModifyEmbeddedSourceUrls hook handler

https://gerrit.wikimedia.org/r/1076244

Change #1076243 merged by jenkins-bot:

[mediawiki/core@master] resourceloader: Add/change ResourceLoaderModifyEmbeddedSourceUrls hook

https://gerrit.wikimedia.org/r/1076243

Change #1076244 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@master] Update ResourceLoaderModifyEmbeddedSourceUrls hook handler

https://gerrit.wikimedia.org/r/1076244

matmarex removed a project: Patch-For-Review.

I think that resolves all known problems. Optimistically closing.