Pages with single quote in title are inaccessible by some clients (redirect loop)
Closed, Resolved (Public)

Description

Web browsers

It seems that current browsers (Firefox 39, Chromium 31 and Opera 30) have no issue with our URL-encoded single quote ('), but Opera 12 falls into a redirect loop and finally displays a blank page. I know Opera 12.16 is getting old, but it is the browser I am accustomed to.

This is similar to T105265 and probably comes from 155d555b83eca6403e..

In the Opera Developer Tools (Dragonfly):

  1. Opera fetches URL: https://fr.wikipedia.org/wiki/O'Hare_Branch
  2. MediaWiki server redirects to: https://fr.wikipedia.org/wiki/O%27Hare_Branch
  3. Opera insists on an unencoded single quote in the URL and, before fetching from the network, rewrites it as: https://fr.wikipedia.org/wiki/O'Hare_Branch
  4. Back to #1.
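
The steps above can be reproduced in a small simulation. This is only a sketch: `server_canonical` and `client_rewrite` are hypothetical stand-ins for the MediaWiki redirect and for Opera 12's URL rewriting, and the redirect cap is an arbitrary value chosen for illustration.

```python
from typing import Tuple

MAX_REDIRECTS = 20  # arbitrary cap for this sketch, not a real browser limit


def server_canonical(path: str) -> str:
    """Hypothetical server rule: the canonical URL percent-encodes the apostrophe."""
    return path.replace("'", "%27")


def client_rewrite(path: str) -> str:
    """Hypothetical client rule: decode %27 to a literal apostrophe before fetching."""
    return path.replace("%27", "'")


def fetch(path: str, rewrite_on_client: bool) -> Tuple[str, int]:
    """Follow redirects and return (final_path, redirects_followed).

    Raises RuntimeError when the redirect cap is exceeded.
    """
    redirects = 0
    while True:
        requested = client_rewrite(path) if rewrite_on_client else path
        canonical = server_canonical(requested)
        if canonical == requested:
            return requested, redirects
        redirects += 1
        if redirects > MAX_REDIRECTS:
            raise RuntimeError("too many redirects")
        path = canonical
```

A client that leaves the URL alone settles after one redirect, while a client that rewrites `%27` back to `'` never reaches a URL the server accepts and runs into the cap.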

I have not yet checked whether the single quote has the same status in RFC 3986 as the tilde of T105265.
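
For reference, RFC 3986 puts the two characters in different classes: the tilde is an unreserved character, while the single quote is one of the reserved sub-delims. The sketch below only encodes those character sets from the RFC; the function name `rfc3986_class` is made up for this example.

```python
import string

# Character classes as defined in RFC 3986, sections 2.2 and 2.3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
GEN_DELIMS = set(":/?#[]@")


def rfc3986_class(ch: str) -> str:
    """Return the RFC 3986 class of a single ASCII character."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in SUB_DELIMS:
        return "sub-delim"
    if ch in GEN_DELIMS:
        return "gen-delim"
    return "other"
```

The RFC says that a percent-encoded unreserved character such as `%7E` is equivalent to its literal form and should be decoded when normalizing. It makes no such statement for reserved characters, so `'` and `%27` are not guaranteed to be treated as the same URL by every client.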

Search engines

(merged from T112425)

@eranroz wrote

New pages with apostrophe characters aren't indexed by external web search engines.

Examples
English Wikipedia examples:

Hebrew Wikipedia examples:

See also

Google Translate

(merged from T122786)

@Schnark wrote:

Web servers

Related Objects

Mentioned In
T138093: Investigate query parameter normalization for MW/services
T276173: Unable to share link to article via Apple Message if title ends with exclamation mark
T257966: URL shall be terminated by %2E or _ if page name ends with dot
T120085: RFC: Serve Main Page of Wikimedia wikis from a consistent URL
T144100: Pageview dumps incorrectly formatted, need to escape special characters
T67402: URLs for the same title without extra query parameters should have the same canonical link
T122786: Pages with apostrophe character in their title can't be translated by Google Translate
T116986: Set up UrlShortener dumps
T104755: Move URL-routing logic into MediaWiki
T112425: Pages with apostrophe character in their title aren't indexed by external search engines
T111933: Certain edit links on de.wp redirect to Main Page due to "NoScript"
T112069: Instability on fr.wikiversity project
rMW7b3d7154fe4b: Revert "Do not encode "'" as %27 (redirect loop in Opera 12)"
rMW8477b1b277c0: Revert "Do not encode "'" as %27 (redirect loop in Opera 12)"
rMWa89a21990e9d: Do not encode "'" as %27 (redirect loop in Opera 12)
Mentioned Here
T127734: Special characters in URL lead to redirect loop under IIS 7.5
T131414: Special characters in URL lead to redirect loop under Apache 2.2.22 on Debian 7
T112425: Pages with apostrophe character in their title aren't indexed by external search engines
T122786: Pages with apostrophe character in their title can't be translated by Google Translate
rMW155d555b83ec: MediaWiki.php: Redirect non-standard title urls to canonical
T105265: Redirecting ~ -> %7E causes a redirect loop in chrome

Event Timeline

I can view pages such as https://www.mediawiki.org/wiki/Manual:MediaWiki_Developer's_Guide or https://www.mediawiki.org/wiki/Manual:Chris_G's_botclasses, so I presume this is fixed. @BBlack, should I file a separate bug about the Varnish stuff?

It seems it is not. In Firefox 33 and 36 the above pages still report an incorrect redirect error.

I can't reproduce the error with Firefox 41 (current beta) on Windows 7, or any other modern browser, but I did reproduce it with Firefox 3.6, which still has a usage share on Wikimedia wikis comparable with Opera 12.

Nemo_bis renamed this task from Redirect loop in Opera 12 when the title contains a single quote to Redirect loop when the title contains a single quote (%27). Sep 10 2015, 3:11 PM
Nemo_bis set Security to None.
Nemo_bis raised the priority of this task from Lowest to Unbreak Now!. Sep 10 2015, 3:13 PM

Change 237401 had a related patch set uploaded (by Bartosz Dziewoński):
Revert "Do not encode "'" as %27 (redirect loop in Opera 12)"

https://gerrit.wikimedia.org/r/237401

Change 237401 merged by jenkins-bot:
Revert "Do not encode "'" as %27 (redirect loop in Opera 12)"

https://gerrit.wikimedia.org/r/237401

Change 237405 had a related patch set uploaded (by Bartosz Dziewoński):
Revert "Do not encode "'" as %27 (redirect loop in Opera 12)"

https://gerrit.wikimedia.org/r/237405

Change 237405 merged by jenkins-bot:
Revert "Do not encode "'" as %27 (redirect loop in Opera 12)"

https://gerrit.wikimedia.org/r/237405

Okay, sorry about this. So now we're back to the original state of Firefox working and Opera 12 not working. Restoring priority and stuff.

matmarex renamed this task from Redirect loop when the title contains a single quote (%27) to Redirect loop in Opera 12 when the title contains a single quote (%27). Sep 10 2015, 4:03 PM
matmarex lowered the priority of this task from Unbreak Now! to Lowest.

If we can't come up with a single URL that can be enforced, we should probably do what we originally did in Varnish and extend that config by merging these entries at the edge, thus supporting both variations, since they appear to be mutually exclusive.

However, that would be a Wikimedia-specific solution. It'd be nicer to build in tolerance at the MediaWiki level, so that third parties aren't stuck with inaccessible pages. This could be the normal operation mode (e.g. allow both variations, and also purge both).
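
A minimal sketch of what "allow both variations, and also purge both" could look like, assuming the check is done on the request path. This is not MediaWiki code: the helper names are hypothetical, and comparing fully decoded paths with `urllib.parse.unquote` is a simplification that would also equate other encoded characters such as `%2F`.

```python
from typing import List
from urllib.parse import unquote


def same_title(requested_path: str, canonical_path: str) -> bool:
    """Treat two paths as the same page if they differ only in percent-encoding."""
    return unquote(requested_path) == unquote(canonical_path)


def needs_redirect(requested_path: str, canonical_path: str) -> bool:
    """Redirect only when the request names a genuinely different path."""
    return not same_title(requested_path, canonical_path)


def purge_variants(path: str) -> List[str]:
    """Return both apostrophe spellings of a path so that each cached copy is purged."""
    literal = path.replace("%27", "'")
    encoded = path.replace("'", "%27")
    return sorted({literal, encoded})
```

With this rule, a request for either `/wiki/O'Hare_Branch` or `/wiki/O%27Hare_Branch` is served directly, so a client that insists on one spelling is never bounced to the other.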

I'd prefer not to have this in Varnish, as that keeps the principle of authority on this matter within MediaWiki. Removing those entries from VCL would also reduce complexity. However, it does increase the Varnish footprint slightly, as Varnish would no longer special-case these non-canonical copies.

Speaking of third parties, the other variations currently handled by Varnish are presumably also broken for some browsers viewing third-party wikis, because we didn't use to enforce canonical URLs by redirect. Third parties without a caching proxy had no issues, and those with one were presumably serving stale cache to some clients. But in 1.26 they'd be broken.

I'm not working on this currently. Unless more recent versions of Firefox can deal with unescaped apostrophes in URLs, it looks to me like the only solution would be to revert Krinkle's change rMW155d555b83eca6403e07d2094b074a8ed2f301ae and follow-ups, and I don't think we want to do that.

Krinkle renamed this task from Redirect loop in Opera 12 when the title contains a single quote (%27) to Pages with single quote in title are inaccessible by some clients (redirect loop). Jun 2 2016, 11:25 PM
Krinkle raised the priority of this task from Lowest to High.
Krinkle added a project: SEO.
Krinkle updated the task description.
Krinkle added subscribers: eranroz, Schnark.
Krinkle added a subscriber: StudiesWorld.
Krinkle added subscribers: Matanya, Tzafrir, TomerA and 4 others.
Krinkle added a subscriber: tstarling.

MW-1.27.

Error on browsers:
Vivaldi 1.2.490.43 "ERR_TOO_MANY_REDIRECTS"
Firefox 47.0
Firefox 47.0.1
Internet Explorer 11.0

I'm not sure which team in WMF is responsible for it, but I think Discovery. (If not please move to the correct team)

This is a highly important task, as it results in search engines not indexing many new pages on Wikipedia.

Hi @eranroz and @RogueFiber - please post the URL that you're using to test with.

I did a quick test on Chrome, Opera and Firefox and I'm not seeing any errors or weirdness using the O'Hare URLs listed above on en.wiki and fr.wiki.

Ah, I see now - Google does this (did you mean) but that is less than optimal.

google-indexing_issue--סופי_צ%27רניאק.png (363×827 px, 59 KB)

We don't have a direct relationship with Google to help them index (or do anything else) with Wiki pages, that I know of. I'll send this to our Discovery mailing list and see if anyone can help.

So...interestingly enough. As I was writing up an email about this, I checked again on Google and now - it's working! Maybe we just needed to wait a bit longer as their spiders gather all the new pages?

google-indexing_issue--resolved.png (722×790 px, 227 KB)

Change 309575 had a related patch set uploaded (by Paladox):
Revert "MediaWiki.php: Redirect non-standard title urls to canonical"

https://gerrit.wikimedia.org/r/309575

I'm not working on this currently. Unless more recent versions of Firefox can deal with unescaped apostrophes in URLs, it looks to me like the only solution would be to revert Krinkle's change rMW155d555b83eca6403e07d2094b074a8ed2f301ae and follow-ups, and I don't think we want to do that.

It appears to me that this is still the only option, since it looks like nobody involved here has a clue how to fix it. I am stuck at MW 1.25 with no upgrade path due to the issues this change causes.

I would like to update from the unsupported MW 1.25. Is there any news? I still have no upgrade path.

I am stuck at MW 1.25 with no upgrade path due to the issues this change causes.

@Kghbln: Which browsers and browser versions are affected in your case?

Currently it is Firefox 49, Chrome 53, Opera 12.26 (on Linux Mint 17.3) and Edge 25 and Internet Explorer 11 (on Win 10), so basically everywhere.

Okay, per earlier comments, I am merging the revert (and backporting to MW 1.28). I think the detailed task description here is sufficient justification, and proponents of the original change seem unable or unwilling to fix the problems.

If the redirecting functionality is to be reimplemented, it should probably be researched more carefully… (Or, if we at Wikimedia decide that the pain suffered by some users is worth the gain in "cleanliness", it should be gated behind a configuration option.)

Change 309575 merged by jenkins-bot:
Revert "MediaWiki.php: Redirect non-standard title urls to canonical"

https://gerrit.wikimedia.org/r/309575

Change 320559 had a related patch set uploaded (by Bartosz Dziewoński):
Revert "MediaWiki.php: Redirect non-standard title urls to canonical"

https://gerrit.wikimedia.org/r/320559

Change 320559 merged by Bartosz Dziewoński:
Revert "MediaWiki.php: Redirect non-standard title urls to canonical"

https://gerrit.wikimedia.org/r/320559

@matmarex Thanks for dealing with this issue. I think it would also make sense to backport to REL1_27

Change 320795 had a related patch set uploaded (by Bartosz Dziewoński):
Revert "MediaWiki.php: Redirect non-standard title urls to canonical"

https://gerrit.wikimedia.org/r/320795

Change 320795 merged by jenkins-bot:
Revert "MediaWiki.php: Redirect non-standard title urls to canonical"

https://gerrit.wikimedia.org/r/320795

Also backported to REL1_27. The change should be included in the upcoming MediaWiki 1.27.2 release; unfortunately, I don't know when that is going to be (I don't think it's planned yet).

@matmarex Thanks a lot. Very much appreciated! Time to move on from 1.25 to 1.27 as soon as 1.27.2 is out. :)