Special characters in URL lead to redirect loop under Apache 2.2.22 on Debian 7
Closed, Duplicate (Public)

Description

This bug is probably related to, or the same as, T127734 (cause: 155d555b83eca6403e07d2094b074a8ed2f301ae), though I do not know how to do such an elaborate error analysis.

There is a wiki accessible with the following logic:

"http://example.com/wiki/Main_Page"

When I edit I have:

"http://example.com/w/index.php?title=Main_Page&action=edit"

To get this magic running I have the following RewriteRules:

RewriteEngine On
RewriteRule ^/?wiki(/.*)?$ %{DOCUMENT_ROOT}/w/index.php [L]
RewriteRule ^/?$ %{DOCUMENT_ROOT}/w/index.php [L]

And in "LocalSettings.php":

$wgScriptPath = "/w";
$wgScriptExtension = ".php";
$wgArticlePath = "/wiki/$1";
$wgUsePathInfo = true; // true or false does not make a difference

Everything is cool until I dare to try accessing a page containing a special character, e.g. "http://example.com/wiki/So_ein_Ärger", at which point I end up in an infinite redirect loop (Fx, Ch, O, and Web/Api). The same happens to special pages containing umlauts.

This wiki is a MW 1.26.2.

Steps taken to mitigate: Downgrade wiki to MW 1.25.5 and everything works. I guess it will be cool to be able to use MW 1.26+. I think it will be important to get a fix into MW 1.27 LTS.

Event Timeline

I created this extra issue since not only special pages but also regular pages are affected, and not only Chrome but other browsers, too. If we are talking about the same problem, this can of course be marked as a duplicate.

If there is something I should investigate to help the cause, I will do my best.

@Kghbln Can you determine what the redirect loop consists of exactly?

This can be debugged by inspecting the Location HTTP response header in the Network panel of your browser's developer tools (or by requesting a URL from your server on the command line with curl).

E.g. does the URL redirect to itself directly, or does it redirect to an alternate version that then indirectly redirects back to the first variant?
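
As a rough illustration of the check described above, here is a minimal PHP sketch that requests a URL without following redirects and returns the value of the Location header. It assumes PHP 7.1 or later (for the context argument of get_headers). The helper names findLocationHeader and fetchRedirectTarget are made up for this example and are not part of MediaWiki.

```php
<?php
// Hypothetical helper: return the value of the first Location header found
// in a list of raw response header lines, or null if there is none.
function findLocationHeader(array $headerLines): ?string {
    foreach ($headerLines as $line) {
        // Header names are case-insensitive, hence stripos().
        if (stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }
    return null;
}

// Hypothetical helper: send a HEAD request and report where the server
// wants to redirect to, without following the redirect.
function fetchRedirectTarget(string $url): ?string {
    $context = stream_context_create([
        'http' => ['method' => 'HEAD', 'follow_location' => 0],
    ]);
    $headers = @get_headers($url, false, $context);
    return $headers === false ? null : findLocationHeader($headers);
}
```

Calling fetchRedirectTarget() on the affected URL, and then again on whatever it returns, should show whether the redirect points straight back to itself or bounces through an alternate variant.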

@Krinkle Thanks for your comment. Will do! I am going to re-install on the server and provide feedback. It may take till Friday though.

@Krinkle Thank you for bearing with me. I was just able to install a MW 1.26.2 wiki on the server again and thus recreate the issue.

The request is " http://test.bismilah.de/wiki/Spezial:Letzte_Änderungen " and the response is " http://test.bismilah.de/wiki/Spezial:Letzte_%C3%84nderungen ", which leads to an infinite loop.

Note: Script path is now "mw126" instead of "w" but this does not really matter. See the test instance for live examination.

Aklapper triaged this task as Medium priority. Apr 25 2016, 6:15 AM

@Krinkle It is not a common issue, however it is spreading. There was a report on wiki-forum.de two days ago which is basically identical to what I reported here. I asked to post the setup here.

The user over there at wiki-forum.de posted the following as a solution that worked for him:

Apache:

AddDefaultCharset UTF-8

PHP:

default_charset = "UTF-8"

[iconv]
iconv.input_encoding = UTF-8
iconv.internal_encoding = UTF-8
iconv.output_encoding = UTF-8

Since this did not help me I also added

[mbstring]
mbstring.language = utf-8
mbstring.internal_encoding = utf-8
mbstring.http_input = utf-8
mbstring.http_output = utf-8

Nothing: I am still looping endlessly. Most of these settings are deprecated, so this would probably be a temporary solution anyway.

@Krinkle Now that MW 1.27 is around the corner I do not really have an upgrade path for it except for locally reverting the troublesome change. Is there something else I could check or do to get things rolling?

@Kghbln: Can you please give us the contents of $_SERVER when the relevant request is made? Open index.php in a text editor and, directly after the opening <?php tag, add the following lines:

header( 'Content-Type: text/plain; charset=utf-8' );
var_dump( $_SERVER );
exit;

Request the affected page in your browser and copy the result here.

@tstarling Thanks for your post.

Here it is:

array(34) {
  ["SERVER_PORT"]=>
  string(2) "80"
  ["UNIQUE_ID"]=>
  string(24) "V1DBS1lruz0AAFU5I0UAAAAB"
  ["SCRIPT_URL"]=>
  string(32) "/wiki/Spezial:Letzte_Änderungen"
  ["SCRIPT_URI"]=>
  string(55) "http://test.bismilah.de/wiki/Spezial:Letzte_Änderungen"
  ["HTTP_HOST"]=>
  string(16) "test.bismilah.de"
  ["HTTP_X_REAL_IP"]=>
  string(12) "91.65.247.81"
  ["HTTP_X_FORWARDED_FOR"]=>
  string(12) "91.65.247.81"
  ["HTTP_CONNECTION"]=>
  string(5) "close"
  ["HTTP_USER_AGENT"]=>
  string(76) "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0"
  ["HTTP_ACCEPT"]=>
  string(63) "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
  ["HTTP_ACCEPT_LANGUAGE"]=>
  string(14) "en-US,en;q=0.5"
  ["HTTP_ACCEPT_ENCODING"]=>
  string(13) "gzip, deflate"
  ["HTTP_DNT"]=>
  string(1) "1"
  ["PATH"]=>
  string(28) "/usr/local/bin:/usr/bin:/bin"
  ["SERVER_SIGNATURE"]=>
  string(0) ""
  ["SERVER_SOFTWARE"]=>
  string(6) "Apache"
  ["SERVER_NAME"]=>
  string(16) "test.bismilah.de"
  ["SERVER_ADDR"]=>
  string(13) "89.107.187.61"
  ["REMOTE_ADDR"]=>
  string(12) "91.65.247.81"
  ["DOCUMENT_ROOT"]=>
  string(12) "/home/www/mw"
  ["SERVER_ADMIN"]=>
  string(26) "webmaster@test.bismilah.de"
  ["SCRIPT_FILENAME"]=>
  string(29) "/home/www/mw/mw1262/index.php"
  ["REMOTE_PORT"]=>
  string(5) "47179"
  ["GATEWAY_INTERFACE"]=>
  string(7) "CGI/1.1"
  ["SERVER_PROTOCOL"]=>
  string(8) "HTTP/1.0"
  ["REQUEST_METHOD"]=>
  string(3) "GET"
  ["QUERY_STRING"]=>
  string(0) ""
  ["REQUEST_URI"]=>
  string(36) "/wiki/Spezial:Letzte_%c3%84nderungen"
  ["SCRIPT_NAME"]=>
  string(32) "/wiki/Spezial:Letzte_Änderungen"
  ["PHP_SELF"]=>
  string(32) "/wiki/Spezial:Letzte_Änderungen"
  ["REQUEST_TIME_FLOAT"]=>
  float(1464910155.109)
  ["REQUEST_TIME"]=>
  int(1464910155)
  ["argv"]=>
  array(0) {
  }
  ["argc"]=>
  int(0)
}

Hopefully it will be useful. I have since reverted the change to index.php so that the issue (redirect loop) can be seen live again.

In REQUEST_URI you have %c3 instead of %C3. Assuming your browser sent %C3, something must have decoded it to unicode and re-encoded it back to lower-case hex.

Well I am sure that we can rule out the browsers as the source of the problem. It does not work with Firefox, Chrome, Opera and Webbrowser on Mint 17 and not with Firefox on Win10 (have not tested other browsers there). Up to MW 1.25 everything is ok. MW 1.26 starts to be a pain, rarely but not just for me. I am not sure what can be done about this. So if there is a suggestion I am interested to try it.

In REQUEST_URI you have %c3 instead of %C3. Assuming your browser sent %C3, something must have decoded it to unicode and re-encoded it back to lower-case hex.

Yes, that is the problem. I debugged a little and found the problem in tryNormaliseRedirect in includes/MediaWiki.php:

// Redirect to canonical url, make it a 301 to allow caching
$targetUrl = wfExpandUrl( $title->getFullURL(), PROTO_CURRENT );

if ( $targetUrl != $request->getFullRequestURL() ) {
    $output->setCdnMaxage( 1200 );
    $output->redirect( $targetUrl, '301' );
    return true;
}

This is the code that triggers the redirect. Having a closer look at the comparison:
$targetUrl is https://example.com/wiki/Spezial:Abfrage_ausf%C3%BChren
$request->getFullRequestURL() is in my case: https://example.com/wiki/Spezial:Abfrage_ausf%c3%bchren

The getFullRequestURL function seems to take the URL from the GET request, whereas $targetUrl gets its value from getFullURL. getFullURL ends up calling wfUrlencode, which calls PHP's urlencode, and that produces uppercase hex digits.

TL;DR: it seems that the URL-encoded URL from the web server is compared with a version that was URL-decoded and then re-encoded. The two strings differ (%C3%BC != %c3%bc), which ultimately triggers the redirect.

Short addition:
RFC 3986 says that the hex digits in percent-encoded octets are case-insensitive. A plain string comparison might not be the right thing to do here.
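
To illustrate the point, here is a minimal sketch of a comparison that ignores the case of percent-encoded triplets only. It is not the fix that went into MediaWiki. The function names normalizePercentCase and sameUrlIgnoringPercentCase are hypothetical, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper, not part of MediaWiki: uppercase the hex digits of
// every percent-encoded triplet so "%c3%bc" and "%C3%BC" become identical.
function normalizePercentCase(string $url): string {
    return preg_replace_callback(
        '/%[0-9a-fA-F]{2}/',
        function (array $match): string {
            return strtoupper($match[0]);
        },
        $url
    );
}

// Hypothetical comparison: equal if the URLs differ only in the case of
// their percent-encoded triplets. Everything else stays case-sensitive.
function sameUrlIgnoringPercentCase(string $a, string $b): bool {
    return normalizePercentCase($a) === normalizePercentCase($b);
}

// The two URLs reported in this task.
$targetUrl  = 'https://example.com/wiki/Spezial:Abfrage_ausf%C3%BChren';
$requestUrl = 'https://example.com/wiki/Spezial:Abfrage_ausf%c3%bchren';

// Decoding and re-encoding with PHP's urlencode yields uppercase hex.
var_dump(urlencode(urldecode('ausf%c3%bchren')) === 'ausf%C3%BChren'); // bool(true)

// The plain comparison sees a difference and would trigger the redirect.
var_dump($targetUrl != $requestUrl); // bool(true)

// After normalizing the hex case the URLs are considered equal.
var_dump(sameUrlIgnoringPercentCase($targetUrl, $requestUrl)); // bool(true)
```

Only the percent-encoded triplets are normalized here, so titles that genuinely differ in letter case would still be treated as different URLs.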

Here's a pretty dirty hotfix you probably shouldn't use:

Add this in includes/GlobalFunctions.php in wfUrlencode before return $s;

// Lowercase every percent-encoded triplet so the result matches what this server reports.
$s = preg_replace_callback(
    '/%[0-9A-F]{2}/',
    function ( $match ) {
        return strtolower( $match[0] );
    },
    $s
);

@Basti2342 Thanks for your findings and further elaborating the issue. Prior to this I figured that I was the only weirdo with some borked Apache install.

@Krinkle @tstarling Perhaps it is worth looking at this again in the light of the new findings. I already have a couple of wikis rotting at 1.25 with no upgrade path. Affected branches are REL1_26 onwards. Thanks for your time!

Basti2342 raised the priority of this task from Medium to High. Edited Aug 13 2016, 10:15 AM

I think this is a major bug that has been breaking MediaWiki installations since 1.26. It is a problem especially for non-English wikis with special characters in article titles. It would be nice if this got more attention.

Basti2342 raised the priority of this task from High to Unbreak Now!. Aug 13 2016, 10:21 AM
Aklapper lowered the priority of this task from Unbreak Now! to Medium. Aug 13 2016, 12:31 PM

@Basti2342 : You increased the priority of this task.
As priority reflects reality and does not cause it, could you please elaborate why this task has suddenly become more urgent? If you increased priority because you plan to work on this task (thanks!) please claim the task by setting yourself as assignee. Thank you for your help!

Kghbln raised the priority of this task from Medium to High. Aug 13 2016, 12:40 PM

@Aklapper This task is full of elaboration as to why it should be fixed, either by reverting the commit causing this or by enhancing it somehow to unbreak things. It is indeed suddenly more urgent since the MW 1.25 branch is no longer supported, and downgrading to the still supported MW 1.23 branch seems to approach this the wrong way.

@Aklapper To at least get the attention of somebody who is concerned with URL generation, could you please advise about a programmer or team dealing with such issues? Also, having this in "General or unknown" is probably part of why this is not tackled, so assigning it to a more appropriate component will probably help here, too. I do not really know which one this could be, though.

(Thanks for the latest comments! Very valid points.)

@Krinkle (according to git blame you seem to have written most of the current code): Could you take a look at T131414#2530593 and T131414#2530652 please?
Thanks in advance!

@Aklapper No worries. Thanks for your assistance. Now it is probably just a matter of waiting for merciful help. :) Keeping fingers crossed.

It would be nice to have a fix in an upcoming version of MediaWiki. I still have no upgrade path for the wikis using 1.25.6, which are becoming increasingly outdated now that new versions of MediaWiki have been released.

Any chances here? Should I author a revert of the transgressing code change to help the cause, since I still have no upgrade path?

@Kghbln Hi you can if you want. But @Krinkle do you know if you could fix this please?

@Paladox Yeah, but I wanted to know if it makes sense or if a better solution is around. The issue has been tracked down to its cause so my hopes for a fix rose.

Change 309575 had a related patch set uploaded (by Paladox):
Revert "MediaWiki.php: Redirect non-standard title urls to canonical"

https://gerrit.wikimedia.org/r/309575

I had to update the location of the wiki: http://test.ahmedalhassan.wiki/wiki/Spezial:Letzte_Änderungen

Currently it is not working with Firefox 49, Chrome 53, Opera 12.26 (on Linux Mint 17.3) or with Edge 25 and Internet Explorer 11 (on Win 10), so basically everywhere.

Since I do not have an upgrade path: @Aklapper, do you happen to know another person who could have a look at this? Thanks!