
Investigation: Browser support for Unicode section IDs and percent-encoded fragments in URLs
Closed, Resolved · Public · 3 Story Points

Description

Per T153334, the consensus seems to be that the best way forward is to implement Unicode section IDs and percent-encoded fragments in MediaWiki URLs. For backwards-compatibility with existing saved URLs, we would add empty span tags with the old dot-encoded section IDs.
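As a rough illustration of the three forms involved, here is a minimal Python sketch. The function names and the heading markup are hypothetical, and the legacy encoder only approximates MediaWiki's old dot-encoding (percent-encode, then swap `%` for `.`); it is not the actual MediaWiki implementation.

```python
from html import escape
from urllib.parse import quote


def unicode_section_id(heading: str) -> str:
    """Unicode section ID: keep the characters, turn spaces into underscores."""
    return heading.strip().replace(" ", "_")


def percent_encoded_fragment(heading: str) -> str:
    """Fragment as it would appear in a URL: UTF-8 bytes, percent-encoded."""
    return quote(unicode_section_id(heading), safe="")


def legacy_dot_encoded_id(heading: str) -> str:
    """Approximation of the old dot-encoded ID (assumption, not MediaWiki code)."""
    return quote(unicode_section_id(heading), safe=":").replace("%", ".")


def heading_html(heading: str) -> str:
    """Hypothetical markup: new Unicode ID plus an empty span with the old ID."""
    new_id = escape(unicode_section_id(heading))
    old_id = escape(legacy_dot_encoded_id(heading))
    return f'<h2 id="{new_id}"><span id="{old_id}"></span>{escape(heading)}</h2>'


if __name__ == "__main__":
    print(percent_encoded_fragment("Привет"))
    print(legacy_dot_encoded_id("Привет"))
    print(heading_html("Привет"))
```

The empty span keeps old saved links (with dot-encoded fragments) resolving to the same heading, while new links target the Unicode ID.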

We already know that this implementation works with all current browsers, but we need to see how far back the support goes. Test percent-encoded fragments and Unicode section IDs (i.e. anchors) on all available browser versions until it is determined which version of each is the oldest to support this implementation. This should allow you to answer the following questions:

  • What is the oldest version of Firefox to support this implementation?
  • Safari?
  • Chrome?
  • Internet Explorer?
  • Mobile Safari?
  • Chrome Mobile?
  • Android Web Browser?

Here is a test page: https://kaldari.github.io/scratchpad/unicode-fragment.html. Browser support should be considered passing if all 3 test cases pass, i.e. all 3 links create a red "selected" arrow next to the proper section ID.

Please use a bisecting (binary) search to efficiently narrow in on the oldest version with support, i.e. if there are 30 available versions of Firefox to test on, it should take no more than 5 tests (log base 2 of 30 ≈ 4.9).
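The search procedure can be sketched as follows. This is a minimal Python sketch, not a prescribed tool: `oldest_supporting_version` and `passes` are hypothetical names, `passes` stands in for a manual test of one browser version, and the sketch assumes that the newest version is already known to pass and that support never regresses once it appears.

```python
from typing import Callable, Optional, Sequence, TypeVar

V = TypeVar("V")


def oldest_supporting_version(
    versions: Sequence[V], passes: Callable[[V], bool]
) -> Optional[V]:
    """Return the oldest version for which passes() is true.

    `versions` must be ordered oldest to newest. The newest version is
    assumed to pass (it is never tested here), and support is assumed to be
    monotonic: once a version passes, all later versions pass too.
    """
    if not versions:
        return None
    lo, hi = 0, len(versions) - 1  # the answer always lies in versions[lo..hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if passes(versions[mid]):
            hi = mid        # mid works, so the oldest passing is mid or older
        else:
            lo = mid + 1    # mid fails, so the oldest passing is newer
    return versions[lo]


if __name__ == "__main__":
    firefox = list(range(1, 31))  # 30 hypothetical version numbers
    print(oldest_supporting_version(firefox, lambda v: v >= 4))
```

Each loop iteration halves the remaining range, so 30 versions need at most 5 manual tests.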

Event Timeline

kaldari created this task. · Feb 9 2017, 8:45 PM
Restricted Application added a subscriber: Aklapper. · Feb 9 2017, 8:45 PM

BrowserStack will give you a 30 minute free trial. If you need more than that, I think some of the folks in Reading have accounts we might be able to borrow. Maybe @Jdlrobson knows.

BrowserStack will give you a 30 minute free trial. If you need more than that

WMF folks can check https://office.wikimedia.org/wiki/Browser_testing_and_design_tools#BrowserStack for info

kaldari updated the task description. · Feb 23 2017, 6:01 PM
kaldari set the point value for this task to 3.
Niharika claimed this task. · May 5 2017, 12:08 AM
Niharika moved this task from Ready to In Development on the Community-Tech-Sprint board.
Krinkle removed a subscriber: Krinkle. · May 5 2017, 2:31 AM

@Niharika: If it looks like this is going to be really tedious, we can farm it out to The Specialist Guild. We would just need specific instructions for them.

kaldari updated the task description. · May 5 2017, 9:13 PM
Niharika removed Niharika as the assignee of this task. · May 8 2017, 8:00 PM
Niharika moved this task from In Development to Ready on the Community-Tech-Sprint board.
Niharika added a subscriber: Niharika.

We'll have TSG take this on, per discussion with Ryan.

kaldari updated the task description. · May 12 2017, 8:32 PM
kaldari updated the task description. · May 12 2017, 8:38 PM
kaldari updated the task description. · May 12 2017, 10:00 PM

Here's what I got from testing with Browserstack:

Chrome = 15+ (PASS)
Firefox = 3.6+ (PASS)
Safari = 4+ (PASS)
iOS = 3+ (PASS on iPhone and iPad)
Android = 4+ (PASS for Chrome and Android Browser on most devices, but was not able to test on Galaxy Tab 2 10.1 and Galaxy Note 10.1 and 2)
Opera = 15+ (FAIL as browser support matrix does not include 12.1 to 12.16)
Internet Explorer = links don't work on any IE browsers (FAIL)

Opera = 15+ (FAIL as browser support matrix does not include 12.1 to 12.16)

@Nicholas.tsg: I'm not sure I understand what this means. Can you elaborate? Basically, I just want to know what is the oldest version that passed (that you were able to test).

@kaldari if it is ok that Opera 15 is the oldest version that passes - that nothing below that has to pass - then Opera works.

tstarling added a comment. · Edited · May 16 2017, 12:11 AM

Actually navigating to a Unicode ID does work on IE and Edge. But :target is broken for non-ASCII IDs, so you don't get the red "selected". Try https://tstarling.com/stuff/unicode-id.html

Testing with the document mode emulation feature of IE 11 suggests that Unicode IDs work in IE 9 and later. It fails when I set the document mode to IE 5-8. But :target is broken in IE 11 and Edge.

I confirmed that the Unicode ID works in IE 11 (but the :target pseudo-selector fails).

@Nicholas.tsg: Can you test this page with only Internet Explorer and Edge?

Please answer the following questions:

  • What is the oldest version of IE in which the Unicode link works (i.e. clicking the link labeled "Unicode" takes you to "Unicode link works!")?
  • What is the oldest version of Edge in which the Unicode link works?
tstarling closed this task as Resolved. · May 20 2017, 8:26 AM
tstarling claimed this task.
tstarling added a subscriber: brion.

@brion has virtual images for all versions of IE back to IE 8. He confirmed that it works in IE 9, as expected, so per our previous discussion, that's all we need to move forward with this.

DannyH moved this task from Estimated to Archive on the Community-Tech board. · Jun 6 2017, 9:26 PM

Interestingly, in Internet Explorer (9 – 11), percent-encoded fragments only work correctly if you specify <!DOCTYPE html> in the HTML (which Wikipedia does).

MaxSem added a subscriber: MaxSem. · Jul 21 2017, 10:20 PM

Interestingly, in Internet Explorer (9 – 11), percent-encoded fragments only work correctly if you specify <!DOCTYPE html> in the HTML (which Wikipedia does).

In other words, quirks mode breaks the links on every MS browser that has it. Which is worrisome because these browsers might flip it on on a whim.

So we discovered it with https://people.wikimedia.org/~maxsem/bonkers.html (it has a doctype now). This raises the question of whether we really want to percent-encode the fragments: in addition to quirks mode (a really minor concern these days, but it can still unexpectedly flip on for some chthonic reason), it makes fragments unreadable on Chrome (44% of pageviews) and IE/Edge (11%). Raw UTF-8 has its own problems, such as difficulties with interpreting % characters in section names; however, some seemingly smaller problems can also arise when percent-encoded fragments are copy-pasted into wikilinks.
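To make the % ambiguity concrete, here is a small Python sketch. The heading text `%41` and the function names are made up for illustration: a raw fragment containing a literal percent sign can decode to a different ID, whereas a percent-encoded one round-trips.

```python
from urllib.parse import quote, unquote


def raw_fragment(section_id: str) -> str:
    """Raw UTF-8 fragment: the section ID is placed in the URL unchanged."""
    return section_id


def encoded_fragment(section_id: str) -> str:
    """Percent-encoded fragment: '%' itself becomes '%25'."""
    return quote(section_id, safe="")


if __name__ == "__main__":
    heading = "%41"  # hypothetical heading whose text is literally "%41"

    # A consumer that percent-decodes the raw fragment sees "A", not "%41".
    print(unquote(raw_fragment(heading)))

    # The percent-encoded form survives a decode round trip.
    print(encoded_fragment(heading), unquote(encoded_fragment(heading)))
```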

Another discovery: IE is case-insensitive about all Unicode characters, not just ASCII. That means that strtolower in Parser::formatHeadings() should be replaced with mb_strtolower for HTML5 mode.
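The difference between the two kinds of lowercasing can be illustrated with a Python sketch. This is an analogy rather than the PHP code itself: `ascii_lower` is a hypothetical stand-in for a byte-wise, ASCII-only lowercase (roughly how strtolower behaves on UTF-8 text in the C locale), and Python's `str.lower()` stands in for mb_strtolower.

```python
def ascii_lower(text: str) -> str:
    """Lowercase only ASCII letters, leaving multi-byte UTF-8 sequences alone."""
    return text.encode("utf-8").lower().decode("utf-8")


def unicode_lower(text: str) -> str:
    """Lowercase using full Unicode case mapping."""
    return text.lower()


def collide_case_insensitively(id_a: str, id_b: str) -> bool:
    """True if a browser that ignores case for all of Unicode would treat the
    two IDs as the same anchor (assumption: simple lowercase comparison)."""
    return unicode_lower(id_a) == unicode_lower(id_b)


if __name__ == "__main__":
    a, b = "Привет", "привет"
    print(ascii_lower(a) == ascii_lower(b))    # ASCII-only folding misses the clash
    print(collide_case_insensitively(a, b))    # Unicode-aware folding detects it
```

With ASCII-only folding, two headings that differ only in the case of non-ASCII letters would be treated as distinct IDs even though IE would resolve them to the same anchor.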

Have you considered reaching out to the Chrome developers (at least via their bug tracker) to urge them to decode percent-encoded fragments in the address bar for readability, as other browsers do (at least Firefox, as far as I know)? This affects not only section links but also media viewer URLs (example link), and probably some other features too.

@MaxSem: BTW, it looks like those high pageview stats for IE7 were due to a bot (T148461) so we don't need to worry about IE7 at least :P