Page MenuHomePhabricator

Incorrect treatment of titles starting with a slash when URLs are ultra-short
Open, LowPublic

Description

Under MW 1.26.3 and 1.27.

When canonical URL follow an ultra-short http[s]://(mydomain)/Page_title scheme and a page with the title beginning with a slash (/) is requested, an endless permanent redirect (301) is served. It happens for both http[s]://(mydomain)//etc and http[s]://(mydomain)/w/index.php?title=/etc-type requests for the page titled "/etc".

The metod WebRequest::getGlobalRequestURL() (lines 792-799) "normalises" the URL by removing all leading slashes but one, and then MediaWiki::tryNormaliseRedirect () in lines 344-348 redirects.

Although, in my opinion, the error is in WebRequest::getGlobalRequestURL(), and was there for some time; it was revealed by introducing MediaWiki class in MW 1.26.3.

There is a similar, though less severe, bug T113160. But this one blocks upgrading to MW 1.26.3 or MW 1.27.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 27 2016, 3:09 PM
alex-mashin updated the task description. (Show Details)Jul 28 2016, 4:45 AM
matmarex triaged this task as Low priority.Jul 30 2016, 6:29 PM
matmarex added a subscriber: matmarex.

This configuration is generally not recommended, so I doubt any MediaWiki developers would consider this high priority. We're always happy to accept patches though!

I think this can be worked around by setting $wgArticlePath to a full URL? Like:

$wgArticlePath = 'https://example.com/$1';

I think this can be worked around by setting $wgArticlePath to a full URL? Like:

$wgArticlePath = 'https://example.com/$1';

No, it can't. URLs in incoming HTTP requests are treated incorrectly, not wikilinks on pages.

This configuration is generally not recommended, so I doubt any MediaWiki developers would consider this high priority. We're always happy to accept patches though!

Then my suggestion is simply to replace in lines 792-798 in include/WebRequest.php

		if ( $base[0] == '/' ) {
			// More than one slash will look like it is protocol relative
			return preg_replace( '!^/+!', '/', $base );
		} else {
			// We may get paths with a host prepended; strip it.
			return preg_replace( '!^[^:]+://[^/]+/+!', '/', $base );
		}

with

return preg_replace( '!^[^:]+://[^/]+/+!', '/', $base );

I am not a MediaWiki developer, and a more formal way of submitting patches is not available to me.

@alex-mashin: You are very welcome to use developer access to submit the proposed code changes as a Git branch directly into Gerrit which makes it easier to review them quickly and provide feedback. And if you don't want to set up Git/Gerrit, you could also use the Gerrit Patch Uploader. Thanks again!

I think this can be worked around by setting $wgArticlePath to a full URL? Like:

$wgArticlePath = 'https://example.com/$1';

No, it can't. URLs in incoming HTTP requests are treated incorrectly, not wikilinks on pages.

No, you may be right, @matmarex: $wgArticlePath does have something to do with this issue.

This bug works as I reported it, if the hack described in T113160 is applied. Without the hack, there is another redirect: http[s]://(mydomain)//etc → http[s]://etc, also wrong but not circular.

Change 302088 had a related patch set uploaded (by Alexander I. Mashin):
This patch partially addresses the issue T141444: an incorrect redirect while serving pages with titles starting with '/' if the URL scheme is http[s]://(domain)/Page_title, when a local URL beginning with '//' is treated as a protocol-relative URL.

https://gerrit.wikimedia.org/r/302088

Somehow, I've managed to submit this: https://gerrit.wikimedia.org/r/#/c/302088/. I'm afraid this learning curve is too steep for me.

I often use $wgScript = '' and $wgArticlePath = '/$1' and I generally don’t have problems. As described above, I tried to create the article "/etc"; I managed by using https://[server]/index.php?title=/etc&action=edit and then using https://[server]/index.php?title=/etc&action=view&r to view it (to avoid any redirect).

I tested the patch and it works, in the sense that I can view and edit https://[server]//etc. But I do not understand the logic behind this patch, could you explain why you made such a change? I found this part of WebRequest was introduced in 2005 by @brion (874fa5ddb) to "strip off the host bits squid prepends".

By studying this part (WebRequest::getGlobalRequestURL, MediaWiki::tryNormaliseRedirect, and wfExpandUrl) the patch only avoid to catch the line $targetUrl != $request->getFullRequestURL() in MediaWiki::tryNormaliseRedirect and being redirected ($targetUrl is evaluated as 'https://etc'), but a complementary resolution would be, I guess, to clearly distinguish in wfExpandUrl the two cases:

  • the URL has no assumption about being local or fully-qualified (current assumption)
  • the URL is known to be local (this is our case here, I didn’t observe any issue with the local part here)

Anyway all these changes should be carefully tested against various servers and configurations to avoid any regression.

alex-mashin added a comment.EditedAug 13 2016, 5:40 AM

You don't understand the logic behind this patch, and I don't fully undestand the logic behind URL processing.

The place that I changed mistook a local URL for protocol-relative one; so I reduced the number of cases whet it happened.

Change 302088 abandoned by Alexander I. Mashin:
This patch partially addresses the issue T141444: an incorrect redirect while serving pages with titles starting with '/' if the URL scheme is http[s]://(domain)/Page_title, when a local URL beginning with '//' is treated as a protocol-relative URL.

Reason:
The change to the code that this patch corrects seems to be rolled back.

https://gerrit.wikimedia.org/r/302088