Page MenuHomePhabricator

Figure out remoteExtPath/remoteBasePath automatically for the common case
Open, MediumPublic

Description

Background

When defining a module, we always set a pair of base paths. A localBasePath for where files are on disk, and remoteBasePath for where those files can be accessed from the web. For example, /var/www/mediawiki/resources/examples and /w/resources/example, usually expressed as follows:

		'localBasePath' => MW_INSTALL_PATH . '/resources/src/example',
		'remoteBasePath' => "$wgResourceBasePath/resources/src/example",

Or for extensions, where ExtensionProcessor takes care of prepending __DIR__, and FileModule::extractBasePaths takes care of prepending $wgExtensionAssetsPath (/w/extensions/)

	"ResourceModules": {
		"ext.wikimediaEvents": {
			"localBasePath": "modules/ext.wikimediaEvents",
			"remoteExtPath": "WikimediaEvents/modules/ext.wikimediaEvents",
  • localBasePath is the path on disk to the directory relative to the extension.json file.
  • remoteBasePath is the path to the directory as it is served over HTTP, e.g. /w/extensions/FooBar/resources/ext.foo.
  • remoteExtPath option is a shortcut for remoteBasePath that injects/prepends site configuration wgExtensionAssetsPath.

Idea (2023)

The remote path is the one developers most often set incorrectly because it is least exposed. The majority of file references are flattened (e.g. CSS and JS), embedded (e.g. @embed images), or transformed (e.g. RL images served from load.php).

It is only when you reference an image in CSS without embed and without transformation, or when using legacy debug mode with debug=1, or debug with source maps (T47514) that the remote paths are exposed.

Combine this with the fact that it is possible to determine the "right" remote path based on site configuration, I think this would benefit from automation.

We've done this before already with OutputPage::transformResourcePath, which already shows that localBasePath suffices to figure out the web-based URL path.

Broadly, this would mean three good things:

  1. The following options become redundant, thus simplying the API for developers:
    • remoteBasePath,
    • remoteExtPath,
    • remoteSkinPath.
  2. There is no longer a difference in how modules are defined in core vs in an extension. This makes understanding and skills more transferrable and maximises utility of documentation and their search/discovery by not having multiple ways to express the same concept.
  3. There is no longer the awkward requirement for extensions to internally hardcode and rely on the name of their installed subdirectory. (This could be solved through ExtensionProcessor automation if we wanted, but thusfar we never did. If this task ends up infeasible, I'd probably suggest that instead. It would mean we let ExtensionProcessor fill in the name of the extension directory as well, and thus the two strings become identical in extension.json, allowing us to remove remoteExtPath from extension.json, even if we keep remoteBasePath as concept inside of ResourceLoader.)

There is no immediate need to mass-remove these, as they simply become ignored (akin to when position: "top" was removed, T107399).

Previous idea (2021)

We could start by making remoteBasePath (and remoteExtPath)optional, and have ExtensionProcessor fill in the default based on the extension name (basename of $dir in ExtensionProcessor) plus localBasePath. However, I'd like to explore the possibility of taking this one step further and deeper into ResourceLoader and actually drive towards automating the URL formation in a more robust and verified way.

In OutputPage::transformResourcePath we currently do the inverse of what we need. We try to map a remote path to a file, by comparing it against the possible assets sources (wgExtensionAssetsPath, wgStylePath, wgUploadPath, wgResourceBasePath, etc.) and then mapping it to their on-disk equivalent.

Perhaps it would be feasible to do something similar to map a file path to where it is exposed over HTTP without needing to be told where it is. Things to consider include:

  • non-default wgExtensionAssetsPath.
  • non-overlapping wgExtensionAssetsPath and wgStylePath.
  • non-overlapping wgResourceBasePath and wgScriptPath (e.g. wgResourceBasePath with a different hostname).
  • random other file that has no mapping to any of these (e.g. outside $IP, or in $IP but forbidden through its normal wgScriptPath path, such as vendor). For these we'd presumably support FilePath objects like we do today.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Krinkle triaged this task as Medium priority.Sep 14 2021, 12:00 AM
Krinkle moved this task from Inbox to Accepted Enhancement on the MediaWiki-ResourceLoader board.
Krinkle moved this task from Inbox, needs triage to Doing: Goals on the Performance-Team board.

In addition to this being essentially duplicated info (that is often wrong as in many extensions its only used in debug mode so typos go unnoticed), i feel like this really doesn't make sense to specify in extension.json.

Extension.json should be agnostic to where in the filesystem it is installed (and thus its web path). This setting is basically the only thing that prevents putting an extension in an arbitrary directory. (Most recently i experienced this when trying to switch between mw versions in git but keeping two different checkouts of skuns for different versions of mw, which did not work)

While the less naive implementation sounds interesting, i think in the spirit of worse is better, it is worth implementing the naive version of this.

This idea makes sense to me. I'm a bit worried that there's some purpose to the explicitly specified paths that we're missing, but I can't think of what it could be, and if you haven't thought of one since 2021, then there probably isn't any.

One last thing to worry about is that deciding whether two paths are the same is more difficult than just concatenating them, due to e.g. relative paths, Linux symlinks, and Windows path separators. The implementation in OutputPage::transformResourcePath() ignores all those issues and just compares them as strings, which is actually probably the best way, but making this change may cause issues on upgrading if someone's wiki configuration mixes up different ways to specify paths somehow. We'll need to document this, at least.

I was reminded of this today, and I had a thought.

The idea in the task description is to allow omitting (remoteExtPath|remoteSkinPath|remoteBasePath), and computing it from the (localExtPath|localSkinPath|localBasePath) (replacing the default local base path as a prefix with the default remote base path). I was struggling to find a good idea on how to express that in (extension|skin).json (since omitting the property already means just using the default remote base path by itself, rather than as a prefix).

I thought that this seems needlessly complicated – why not just allow omitting them both and specifying a basePath instead, and then computing both the local and remove base path from it (by adding the default local/remove base path as a prefix)? This seems like it would be equivalent, but easier to both understand and implement.

Change #1112276 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] resourceloader: Allow simply 'basePath' instead of 'localBasePath'/'remoteBasePath'

https://gerrit.wikimedia.org/r/1112276

I was reminded of this today, and I had a thought.

The idea in the task description is to allow omitting (remoteExtPath|remoteSkinPath|remoteBasePath), and computing it from the (localExtPath|localSkinPath|localBasePath)[…]

[…]why not just allow omitting them both and specifying a basePath instead, and then computing both the local and remove base path from it […]

I agree, but I think we're saying the same thing. What's the difference between localBasePath and basePath in extension.json after this change?

Note that localBasePath in extension.json context is already relative in meaning, it is expanded by ExtensionProcessor to relative to the current directory. In other words, if you imagine that it were a PHP file, it implies __DIR__ . . The same is done for most if not all other path-related keys in extension.json (e.g. autoloader, i18n).

It seems like the proposed basePath is essentially the same thing, but in a more indirect way. It could be done more direct I believe, but, in order to support mixing the new basePath with existing concepts, it seems that at least today it would be a fairly complicated concept to grasp.

Can you explore a path forward where we discourage all except the existing localBasePath, and derive remoteBasePath automatically? The main driver of this task, and where the magic is, is in mapping a file path to a URL. Getting the file path should be simple and direct because that's what people interact with day-to-day and will be how they understand module definitions to work.

What's the difference between localBasePath and basePath in extension.json after this change?

"basePath": "foo" is basically the same as "localBasePath": "foo" plus "remoteExtPath": "MyExtension/foo".

Can you explore a path forward where we discourage all except the existing localBasePath, and derive remoteBasePath automatically?

I don't see how that could be done without breaking backwards compatibility. It's already valid to omit all of remoteBasePath/remoteExtPath/remoteSkinPath, it has the same effect as specifying remoteBasePath' => $wgResourceBasePath. We would need some new parameter to specify that the base path should be derived automatically. My idea is that basePath serves as that new parameter, and it also allows us to replace those poorly named "local"/"remote" paths (which should have been called the "file" and "URL" paths). And, as a bonus, it removes an unnecessary difference between Resources.php syntax and extension.json syntax.

Can you explore a path forward where we discourage all except the existing localBasePath, and derive remoteBasePath automatically?

I don't see how that could be done without breaking backwards compatibility. It's already valid to omit all of remoteBasePath/remoteExtPath/remoteSkinPath, it has the same effect as specifying remoteBasePath' => $wgResourceBasePath. […]

OK, I understand you now. I suspect we're making different assumptions. I am making these two assumptions:

  1. It is without exception a bug to define localBasePath today without also defining remoteBasePath.
  2. For any local path that we support serving as a resource in ResourceLoader, we can automatically compute a working remote URL.

Do you see cases where one of these isn't true? If yes, what does a valid module definition look like without remote paths?

It seems to me that inherently, when localBasePath is set, the remote URL will be a 404 Not Found if no matching remote path is set as well, because we'd be appending it to an unrelated base path from core, or from another module.

(Technically two exception exists: Redundantly setting localBasePath to the empty string, which would only be valid in MediaWiki core and we don't do that today. Or setting it identical to the inherited base path from ResourceFileModulePaths. Technically those are cases of "setting" one without the other, but as it is redundant it will work either way and isn't affected by this task.)

It seems to me the only scenario in which our "automatic" remote path might differ from a manual one, is where the remote path was left out by mistake today, and thus resolves to a 404 Not Found. Changing that would be a bug fix.

Note that despite me saying it is a "bug", it is very likely that such module is perceiced as "working fine". Remote base paths are only used for non-embedded CSS images, or non-packgeFiles modules served in debug=1 mode. Most people would likely be happy for months or years without realizing this "bug", and either way fixing for them would be just as invisible as the bug today.

In my mind localBasePath is how resources are declared to ResourceLoader. That's from where production mode reads files off disk and serves them up. We created remoteBasePath originally (and its shortcuts) to make sure ResourceLoader can serve the file raw over HTTP, e.g. for debug=1 mode and for un-embeddeable CSS images. We thought ResourceLoader needed "help" to know where the equivalent URL path is mounted, because we believed (at the time) that we didn't have enough information at runtime to compute it perfectly in all cases. This believe rested mainly on the edge case that people might check out /svn/trunk/extensions outside their /svn/trunk/phase3 checkout (mediawiki). This meant that $wgExtensionDirectory might have been customised to something other than "$IP/extensions", and likewise $wgExtensionAssetsPath might not be "/w/extensions" (where /w is $wgResourceBasePath, which in turn is just $wgScriptPath except for hypothetical bits.wikimedia.org-like setups from the past).

Now, from the discussion on this task and your proof-of-concept, it turns out this is easy to compute by checking a few variables and looking for a match.

The part I'm not understanding is how there could be more than one working URL? And if there really is, does it matter which one we use? And if yes, would a person in this situation do in the proposed new system? Or is there a use case for needing to set both local+remote separately?

The reason I'm pushing back is that I'd like to avoid a migration in people's mental models, and future-incompat over backports. I think it would be valuable if we could only have to announce: "Yay, we no longer have to set a remote path. It's all automatic now. Don't worry about removing it, it works fine either way, it'll be ignored in the future.". That seems preferred over introducing a new property, migrating code, deprecating/removing the old at some point, and facing two ways in the interim. Of course, if doing so is a breaking change, then we can't do it, but I'd like to really make sure of that first!