Pages with single quote in title are inaccessible by some clients (redirect loop)
Closed, Resolved, Public

Description

Web browsers

It seems all current browsers (Firefox 39, Chromium 31 and Opera 30) have no issue with our URL-encoded single quote (%27), but Opera 12 falls into a redirect loop and finally displays a blank page. I know Opera 12.16 is an aging browser, but it's the one I'm accustomed to.

This is similar to T105265 and probably comes from 155d555b83eca6403e..

In the Opera Developer Tools (Dragonfly):

  1. Opera fetches URL: https://fr.wikipedia.org/wiki/O'Hare_Branch
  2. MediaWiki server redirects to: https://fr.wikipedia.org/wiki/O%27Hare_Branch
  3. Opera insists on an unencoded single quote and, before fetching from the network, rewrites the URL as: https://fr.wikipedia.org/wiki/O'Hare_Branch
  4. Back to #1.
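The four steps above form a cycle. Below is a rough model of that cycle. It is not MediaWiki's or Opera's actual code. `server_canonical`, `opera12_rewrite` and `modern_rewrite` are hypothetical stand-ins. The first assumes the server's canonical URL encodes the apostrophe as %27. The second assumes the client decodes %27 before fetching. The simulation follows redirects and reports a loop as soon as the client sends a request it has already sent.

```python
from collections.abc import Callable


def server_canonical(path: str) -> str:
    """Hypothetical server rule: the canonical URL encodes the apostrophe as %27."""
    return path.replace("'", "%27")


def opera12_rewrite(path: str) -> str:
    """Hypothetical model of Opera 12: %27 is decoded to ' before fetching."""
    return path.replace("%27", "'")


def modern_rewrite(path: str) -> str:
    """Model of a browser that sends the URL exactly as given."""
    return path


def follow_redirects(
    start: str,
    client_rewrite: Callable[[str], str],
    max_hops: int = 20,
) -> tuple[list[str], bool]:
    """Return (paths actually fetched, whether a redirect loop was detected)."""
    fetched: list[str] = []
    target = start
    for _ in range(max_hops):
        sent = client_rewrite(target)       # step 3: client may rewrite the URL
        if sent in fetched:                 # step 4: same request again -> loop
            return fetched, True
        fetched.append(sent)                # step 1: client fetches
        canonical = server_canonical(sent)  # step 2: server checks canonical form
        if canonical == sent:
            return fetched, False           # no redirect; the page is served
        target = canonical                  # server redirects; client follows
    return fetched, True                    # gave up after max_hops
```

With `opera12_rewrite` the second request is identical to the first, so the model flags a loop. With `modern_rewrite` the client follows the redirect to `/wiki/O%27Hare_Branch` and the server accepts it.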

I have not yet checked whether the single quote has the same status in RFC 3986 as the tilde in T105265.
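For reference, RFC 3986 does treat the two characters differently. The tilde is an unreserved character (section 2.3), and percent-encoding it yields an equivalent URI. The single quote is one of the reserved "sub-delims" (section 2.2). Under the generic syntax, URIs that differ only in whether a reserved character is percent-encoded are not equivalent. Below is a small lookup sketch of those two rules. The function names are illustrative only and do not come from MediaWiki.

```python
# Character classes from RFC 3986, section 2.2 (reserved) and 2.3 (unreserved).
GEN_DELIMS = frozenset(":/?#[]@")
SUB_DELIMS = frozenset("!$&'()*+,;=")
UNRESERVED_PUNCT = frozenset("-._~")  # unreserved = ALPHA / DIGIT / these four


def uri_char_class(ch: str) -> str:
    """Classify a single ASCII character according to RFC 3986."""
    if (ch.isascii() and ch.isalnum()) or ch in UNRESERVED_PUNCT:
        return "unreserved"
    if ch in GEN_DELIMS:
        return "gen-delim"
    if ch in SUB_DELIMS:
        return "sub-delim"
    return "must be percent-encoded"


def encoding_is_equivalent(ch: str) -> bool:
    """Section 2.3: encoding an unreserved character gives an equivalent URI.

    For reserved characters (section 2.2), the encoded and unencoded forms
    are not equivalent under the generic syntax, so this returns False.
    """
    return uri_char_class(ch) == "unreserved"
```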

Search engines

(merged from T112425)

@eranroz wrote:

New pages with apostrophe characters aren't indexed by external web search engines.

Examples
English Wikipedia examples:

See also

Google Translate

(merged from T122786)

@Schnark wrote:

Web servers

Krinkle added a comment (edited). Sep 5 2015, 8:58 PM

Hm... that code may no longer be needed now that MediaWiki redirects non-canonical URL variations (such as alternative encodings) to the canonical URL.

On the bright side, it seems that code isn't fully solving the problem it's supposed to solve. That Varnish code only decodes certain characters; it doesn't re-encode anything. So for URLs whose canonical MediaWiki form contains encoded characters, Varnish wasn't ensuring those characters were encoded, which still left room for cache fragmentation and stale cache.

It's the bright side because it means there is no risk of a redirect loop. If Varnish were to re-encode e.g. ', then this patch would make those URLs trigger a redirect loop (Varnish re-encodes the character, then MediaWiki serves a decoding redirect, and so on).
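The argument can be reduced to one check: a loop appears when the edge cache rewrites the canonical URL into something the origin does not consider canonical. The sketch below assumes, as in the patch under discussion, that MediaWiki's canonical form leaves the apostrophe unencoded. `mediawiki_canonical`, `edge_decode_only` and `edge_reencode` are hypothetical stand-ins, not real VCL or MediaWiki code.

```python
from collections.abc import Callable


def mediawiki_canonical(path: str) -> str:
    """Assumed canonical form after the patch: apostrophe left unencoded."""
    return path.replace("%27", "'")


def edge_decode_only(path: str) -> str:
    """Hypothetical edge rule that only decodes selected characters."""
    return path.replace("%27", "'").replace("%7E", "~")


def edge_reencode(path: str) -> str:
    """Hypothetical edge rule that re-encodes the apostrophe."""
    return path.replace("'", "%27")


def edge_causes_loop(
    edge: Callable[[str], str],
    canonical: Callable[[str], str],
    path: str,
    max_hops: int = 20,
) -> bool:
    """Simulate client -> edge -> origin; True if redirects never settle."""
    seen: set[str] = set()
    request = path
    for _ in range(max_hops):
        received = edge(request)        # what the origin actually sees
        target = canonical(received)
        if target == received:
            return False                # origin serves the page
        if target in seen:
            return True                 # redirecting to a URL already tried
        seen.add(target)
        request = target                # client follows the redirect
    return True
```

In this model, a decode-only edge rule never disturbs the assumed canonical form, so every request settles. A re-encoding rule undoes the origin's redirect on every hop.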

EDIT: Ah, the code may not be redundant after all. That Varnish code is not just for cache fragmentation and stale cache; it also prevents redirect loops such as the one reported in this task, because some browsers forcibly encode certain characters in the background even if the entered URL or followed link has them unencoded.

Change 232758 merged by jenkins-bot:
Do not encode "'" as %27 (redirect loop in Opera 12)

https://gerrit.wikimedia.org/r/232758

matmarex closed this task as Resolved. Sep 9 2015, 7:40 PM

I can view pages such as https://www.mediawiki.org/wiki/Manual:MediaWiki_Developer's_Guide or https://www.mediawiki.org/wiki/Manual:Chris_G's_botclasses, so I presume this is fixed. @BBlack, should I file a separate bug about the Varnish stuff?

Ankry added a subscriber: Ankry. Sep 10 2015, 1:42 PM
Ankry added a comment. Sep 10 2015, 1:47 PM

It seems it is not. In Firefox 33 and 36, the above pages still report an incorrect redirect error.

matmarex reopened this task as Open. Sep 10 2015, 3:08 PM

I can't reproduce the error with Firefox 41 (current beta) on Windows 7, or any other modern browser, but I did reproduce it with Firefox 3.6, which still has a usage share on Wikimedia wikis comparable with Opera 12.

I see the loop in Firefox 39 too.

Nemo_bis renamed this task from Redirect loop in Opera 12 when the title contains a single quote to Redirect loop when the title contains a single quote (%27). Sep 10 2015, 3:11 PM
Nemo_bis set Security to None.
Nemo_bis raised the priority of this task from Lowest to Unbreak Now! Sep 10 2015, 3:13 PM

Change 237401 had a related patch set uploaded (by Bartosz Dziewoński):
Revert "Do not encode "'" as %27 (redirect loop in Opera 12)"

https://gerrit.wikimedia.org/r/237401

Change 237401 merged by jenkins-bot:
Revert "Do not encode "'" as %27 (redirect loop in Opera 12)"

https://gerrit.wikimedia.org/r/237401

Change 237405 had a related patch set uploaded (by Bartosz Dziewoński):
Revert "Do not encode "'" as %27 (redirect loop in Opera 12)"

https://gerrit.wikimedia.org/r/237405

Change 237405 merged by jenkins-bot:
Revert "Do not encode "'" as %27 (redirect loop in Opera 12)"

https://gerrit.wikimedia.org/r/237405

Okay, sorry about this. So now we're back to the original state of Firefox working and Opera 12 not working. Restoring priority and stuff.

matmarex renamed this task from Redirect loop when the title contains a single quote (%27) to Redirect loop in Opera 12 when the title contains a single quote (%27). Sep 10 2015, 4:03 PM
matmarex lowered the priority of this task from Unbreak Now! to Lowest.

If we can't come up with a single URL that can be enforced (the two variations appear to be mutually exclusive across browsers), we should probably do what we originally did in Varnish: extend that config to merge these entries at the edge, and thus support both variations.

However, that would be a Wikimedia-specific solution. It'd be nicer to build in tolerance at the MediaWiki level, so that third parties aren't stuck with inaccessible pages, for example as the normal mode of operation (allow both variations, and also purge both).
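Here is one possible reading of "allow both variations, and also purge both". The server accepts either spelling of the apostrophe instead of redirecting, and every cache purge covers both URLs. Below is a minimal sketch using Python's `urllib.parse.quote`. `title_variants`, `resolve` and `purge_urls` are hypothetical helpers, not MediaWiki functions. `https://example.org/wiki/` is a placeholder base URL. Keeping `/` and `:` unencoded is an assumption of this sketch.

```python
from urllib.parse import quote, unquote

BASE = "https://example.org/wiki/"  # placeholder wiki base URL


def title_variants(title: str) -> set[str]:
    """Both accepted path spellings: apostrophe encoded (%27) and unencoded.

    `title` is assumed to already use underscores instead of spaces.
    """
    encoded = quote(title, safe="/:")     # ' becomes %27
    unencoded = quote(title, safe="/:'")  # ' is left as-is
    return {encoded, unencoded}


def resolve(path: str) -> str:
    """Map any accepted variant back to the same title, without a redirect."""
    return unquote(path)


def purge_urls(title: str) -> set[str]:
    """URLs a cache purge would need to cover so neither variant goes stale."""
    return {BASE + variant for variant in title_variants(title)}
```

For a title without an apostrophe, the two variants collapse into one URL. The cost of this approach is a second cache object per affected page. That is the slightly larger footprint mentioned below.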

I'd prefer not to have this in Varnish, as that keeps authority on this matter within MediaWiki, and removing those rules from the VCL would reduce complexity. However, it does increase the Varnish footprint slightly, since Varnish would no longer special-case these non-canonical copies.

Speaking of third parties: the other variations currently handled by Varnish are presumably also broken for some browsers viewing third-party wikis. We didn't use to enforce canonical URLs by redirect, so third parties without a caching proxy had no issues, and those with one were presumably serving stale cache to some clients. In 1.26, however, they'd be broken.

matmarex removed matmarex as the assignee of this task. Oct 27 2015, 8:57 PM

I'm not working on this currently. Unless more recent versions of Firefox can deal with unescaped apostrophes in URLs, it looks to me like the only solution would be to revert Krinkle's change rMW155d555b83eca6403e07d2094b074a8ed2f301ae and follow-ups, and I don't think we want to do that.

Krinkle renamed this task from Redirect loop in Opera 12 when the title contains a single quote (%27) to Pages with single quote in title are inaccessible by some clients (redirect loop). Jun 2 2016, 11:25 PM
Krinkle added a project: SEO.
Krinkle updated the task description.
Krinkle raised the priority of this task from Lowest to High.
Krinkle added subscribers: eranroz, Schnark.
Krinkle added a subscriber: StudiesWorld.
Krinkle added subscribers: Matanya, Tzafrir, TomerA and 4 others.
Krinkle added a subscriber: tstarling.
IKhitron added a subscriber: Yurik. Jun 3 2016, 12:05 AM

There has been ongoing discussion about the indexing problem in https://productforums.google.com/d/msg/webmasters/wTPr_r_73sc/KyYsScvOnB4J

Yurik removed a subscriber: Yurik. Jun 26 2016, 10:22 AM

MW-1.27.

Error on browsers:
Vivaldi 1.2.490.43 "ERR_TOO_MANY_REDIRECTS"
Firefox 47.0
Firefox 47.0.1
Internet Explorer 11.0

I'm not sure which team at WMF is responsible for this, but I think it's Discovery. (If not, please move it to the correct team.)

This is a high-importance task: it results in search engines not indexing many new pages on Wikipedia.

debt added a subscriber: debt. Aug 15 2016, 10:08 PM

Hi @eranroz and @RogueFiber - please post the URL that you're using to test with.

I did a quick test on Chrome, Opera and Firefox and I'm not seeing any errors or weirdness using the O'Hare URLs listed above on en.wiki and fr.wiki.

For example:
https://he.wikipedia.org/wiki/%D7%A1%D7%95%D7%A4%D7%99_%D7%A6%27%D7%A8%D7%A0%D7%99%D7%90%D7%A7

This page isn't indexed by Google, though newer pages already are.

debt added a comment (edited). Aug 15 2016, 10:51 PM

Ah, I see now: Google does this ("did you mean"), but that is less than optimal.

We don't have a direct relationship with Google to help them index (or do anything else) with Wiki pages, that I know of. I'll send this to our Discovery mailing list and see if anyone can help.

So, interestingly enough, as I was writing up an email about this, I checked Google again and now it's working! Maybe we just needed to wait a bit longer for their spiders to gather all the new pages?

Krinkle updated the task description. Sep 9 2016, 9:45 PM

Change 309575 had a related patch set uploaded (by Paladox):
Revert "MediaWiki.php: Redirect non-standard title urls to canonical"

https://gerrit.wikimedia.org/r/309575

Kghbln added a subscriber: Kghbln. Sep 20 2016, 9:23 PM

It appears to me that reverting the change is still the only option, since nobody involved here seems to have a clue how to fix it. I am stuck at MW 1.25 with no upgrade path due to the issues this change causes.

I would like to upgrade from the unsupported MW 1.25. Any news? I still have no upgrade path.

@Kghbln: Which browsers and browser versions are affected in your case?

Currently it is Firefox 49, Chrome 53 and Opera 12.26 (on Linux Mint 17.3), and Edge 25 and Internet Explorer 11 (on Windows 10), so basically everywhere.

Okay, per earlier comments, I am merging the revert (and backporting to MW 1.28). I think the detailed task description here is sufficient justification, and proponents of the original change seem unable or unwilling to fix the problems.

If the redirecting functionality is to be reimplemented, it should probably be researched more carefully… (Or, if we at Wikimedia decide that the pain suffered by some users is worth the gain in "cleanliness", it should be gated behind a configuration option.)

Change 309575 merged by jenkins-bot:
Revert "MediaWiki.php: Redirect non-standard title urls to canonical"

https://gerrit.wikimedia.org/r/309575

Change 320559 had a related patch set uploaded (by Bartosz Dziewoński):
Revert "MediaWiki.php: Redirect non-standard title urls to canonical"

https://gerrit.wikimedia.org/r/320559

Change 320559 merged by Bartosz Dziewoński:
Revert "MediaWiki.php: Redirect non-standard title urls to canonical"

https://gerrit.wikimedia.org/r/320559

Kghbln added a comment. Nov 9 2016, 8:51 AM

@matmarex Thanks for dealing with this issue. I think it would also make sense to backport it to REL1_27.

Change 320795 had a related patch set uploaded (by Bartosz Dziewoński):
Revert "MediaWiki.php: Redirect non-standard title urls to canonical"

https://gerrit.wikimedia.org/r/320795

Change 320795 merged by jenkins-bot:
Revert "MediaWiki.php: Redirect non-standard title urls to canonical"

https://gerrit.wikimedia.org/r/320795

Also backported to REL1_27. The change should be included in the upcoming MediaWiki 1.27.2 release; unfortunately, I don't know when that will be (I don't think it's planned yet).

@matmarex Thanks a lot. Very much appreciated! Time to move on from 1.25 to 1.27 as soon as 1.27.2 is out. :)