Page MenuHomePhabricator

Incorrect treatment of titles starting with a slash when URLs are ultra-short
Open, LowPublic

Description

Under MW 1.26.3 and 1.27.

When canonical URL follow an ultra-short http[s]://(mydomain)/Page_title scheme and a page with the title beginning with a slash (/) is requested, an endless permanent redirect (301) is served. It happens for both http[s]://(mydomain)//etc and http[s]://(mydomain)/w/index.php?title=/etc-type requests for the page titled "/etc".

The metod WebRequest::getGlobalRequestURL() (lines 792-799) "normalises" the URL by removing all leading slashes but one, and then MediaWiki::tryNormaliseRedirect () in lines 344-348 redirects.

Although, in my opinion, the error is in WebRequest::getGlobalRequestURL(), and was there for some time; it was revealed by introducing MediaWiki class in MW 1.26.3.

There is a similar, though less severe, bug T113160. But this one blocks upgrading to MW 1.26.3 or MW 1.27.

Event Timeline

matmarex subscribed.

This configuration is generally not recommended, so I doubt any MediaWiki developers would consider this high priority. We're always happy to accept patches though!

I think this can be worked around by setting $wgArticlePath to a full URL? Like:

$wgArticlePath = 'https://example.com/$1';

I think this can be worked around by setting $wgArticlePath to a full URL? Like:

$wgArticlePath = 'https://example.com/$1';

No, it can't. URLs in incoming HTTP requests are treated incorrectly, not wikilinks on pages.

This configuration is generally not recommended, so I doubt any MediaWiki developers would consider this high priority. We're always happy to accept patches though!

Then my suggestion is simply to replace in lines 792-798 in include/WebRequest.php

		if ( $base[0] == '/' ) {
			// More than one slash will look like it is protocol relative
			return preg_replace( '!^/+!', '/', $base );
		} else {
			// We may get paths with a host prepended; strip it.
			return preg_replace( '!^[^:]+://[^/]+/+!', '/', $base );
		}

with

return preg_replace( '!^[^:]+://[^/]+/+!', '/', $base );

I am not a MediaWiki developer, and a more formal way of submitting patches is not available to me.

@alex-mashin: You are very welcome to use developer access to submit the proposed code changes as a Git branch directly into Gerrit which makes it easier to review them quickly and provide feedback. And if you don't want to set up Git/Gerrit, you could also use the Gerrit Patch Uploader. Thanks again!

I think this can be worked around by setting $wgArticlePath to a full URL? Like:

$wgArticlePath = 'https://example.com/$1';

No, it can't. URLs in incoming HTTP requests are treated incorrectly, not wikilinks on pages.

No, you may be right, @matmarex: $wgArticlePath does have something to do with this issue.

This bug works as I reported it, if the hack described in T113160 is applied. Without the hack, there is another redirect: http[s]://(mydomain)//etc → http[s]://etc, also wrong but not circular.

Change 302088 had a related patch set uploaded (by Alexander I. Mashin):
This patch partially addresses the issue T141444: an incorrect redirect while serving pages with titles starting with '/' if the URL scheme is http[s]://(domain)/Page_title, when a local URL beginning with '//' is treated as a protocol-relative URL.

https://gerrit.wikimedia.org/r/302088

Somehow, I've managed to submit this: https://gerrit.wikimedia.org/r/#/c/302088/. I'm afraid this learning curve is too steep for me.

I often use $wgScript = '' and $wgArticlePath = '/$1' and I generally don’t have problems. As described above, I tried to create the article "/etc"; I managed by using https://[server]/index.php?title=/etc&action=edit and then using https://[server]/index.php?title=/etc&action=view&r to view it (to avoid any redirect).

I tested the patch and it works, in the sense that I can view and edit https://[server]//etc. But I do not understand the logic behind this patch, could you explain why you made such a change? I found this part of WebRequest was introduced in 2005 by @Brion (874fa5ddb) to "strip off the host bits squid prepends".

By studying this part (WebRequest::getGlobalRequestURL, MediaWiki::tryNormaliseRedirect, and wfExpandUrl) the patch only avoid to catch the line $targetUrl != $request->getFullRequestURL() in MediaWiki::tryNormaliseRedirect and being redirected ($targetUrl is evaluated as 'https://etc'), but a complementary resolution would be, I guess, to clearly distinguish in wfExpandUrl the two cases:

  • the URL has no assumption about being local or fully-qualified (current assumption)
  • the URL is known to be local (this is our case here, I didn’t observe any issue with the local part here)

Anyway all these changes should be carefully tested against various servers and configurations to avoid any regression.

You don't understand the logic behind this patch, and I don't fully undestand the logic behind URL processing.

The place that I changed mistook a local URL for protocol-relative one; so I reduced the number of cases whet it happened.

Change 302088 abandoned by Alexander I. Mashin:
This patch partially addresses the issue T141444: an incorrect redirect while serving pages with titles starting with '/' if the URL scheme is http[s]://(domain)/Page_title, when a local URL beginning with '//' is treated as a protocol-relative URL.

Reason:
The change to the code that this patch corrects seems to be rolled back.

https://gerrit.wikimedia.org/r/302088

The workaround is this:

$wgHooks ['GetLocalURL'] [] = function (Title $title, string &$url, string $_): bool {
	if (mb_substr ($title->getText (), 0, 1) === '/'
		&& !strpos ($url, '.', 2)
		&& $title->getNamespace () === 0
		&& !MWNamespace::hasSubpages (0) // -- the same as $title->getNamespace () but faster
	) {
		$url = '/.' . $url;
	}
	return true;
};