Page MenuHomePhabricator

URL with % in fragment breaks scripts that use mw.Uri
Closed, DeclinedPublic

Description

  1. Go to https://en.wikipedia.org/wiki/Wikipedia#% (or any other page with something in the URL fragment that would be invalid as %-encoding).
  2. Try to use Echo or VE.

Instead of step 1 you can also go to a page that has a % sign in a headline, use the TOC to go to that section and reload the page.

Expected: Everything should work as normal.
Actual: Echo and VE will only work via fallback (i.e. Echo will open Special:Notifications, VE will reload the page for editing).
The console is full with errors like URIError: "malformed URI sequence" and TypeError: "defaultUri is undefined", so probably mw.Uri can't parse the URL, and breaks everything that depends on it.

I haven't checked the specs whether a raw % sign is actually allowed in the fragment, but even if it isn't it shouldn't break anything; and if it isn't the parser should not produce such URLs in the TOC.

Event Timeline

Most things categorised under T106244 are invalid under the spec, and never produced by the Parser for the TOC or otherwise.

The bare % case is interesting and afaik not reported previously. The anchor for == % == and and [[#%]] indeed uses plain % currently which suggests that it is valid. I'm not sure why that is rejected currently. Either way, the fix is potentially the same, if resourced, which is to migrate to native URL.

For now though, this should remain a separate task since it isn't a case of garbage-in/garbage-out. Either we need to generate different anchors like we do for other special chars, or we need to support them in mw.Uri.

ovasileva triaged this task as Medium priority.Mar 2 2021, 10:42 AM
ovasileva moved this task from Incoming to Triaged but Future on the Web-Team-Backlog board.

That URL is unambiguously invalid. The URL spec restricts valid characters in the fragment to !$&'()*+,-./:;=?@_~, percent-endcoded and (most) non-ASCII.

DiscussionTools also doesn't work on pages with % in the fragment.

Just want to emphasize (if it isn't clear) that bare % fragments imply just any sections that have % in the name, like More than 50% of Wikipedia visits come from mobile devices. It accounts for at least 900 "Talk" namespace pages (regexp search timed out) in English Wikipedia. All of which don't work correctly if you open them by a section link.

Change 771445 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] URL: Update 'url' polyfill to latest version containing our fixes

https://gerrit.wikimedia.org/r/771445

Change 771445 merged by jenkins-bot:

[mediawiki/core@master] web2017-polyfills: Update URL polyfill to latest version with our fixes

https://gerrit.wikimedia.org/r/771445

Krinkle renamed this task from URL with % in fragment breaks many scripts to URL with % in fragment breaks scripts that use mw.Uri.Mar 27 2022, 10:21 PM
Krinkle removed a project: MediaWiki-Parser.

The root cause is that mw.Uri only supports valid UTF-8 percent encoding in query parameters and hash fragments. Calling it with something else fails. This isn't something we can trivially change without breaking compatibility.

There are also various genuine valid URLs that MediaWiki supports that are not meant to decode as UTF-8 but as a different charset.

Making the problem go away from a central point thus isn't possible, and keeping this as a tracking task is not useful I think.

Individual gadgets and features that do this can either try-catch, as a number of extensions like VisualEditor have done already, or can migrate to the native URL constructor T103379, by depending on web2017-polyfills.

https://developer.mozilla.org/en-US/docs/Web/API/URL

/* https://en.wikipedia.org/wiki/Sandbox?action=edit#foo */

var uri = new mw.Uri(location.href);
uri.fragment; // "foo"
uri.query.action; // "edit"

var uri = new URL(location.href);
uri.hash; // "#foo"
uri.searchParams.get('action'); "edit"