
Expose action parameters in JavaScript (ViewAction diff, SpecialContributions target etc.) regardless of GET, POST or url encoding
Closed, Declined (Public)

Description

MediaWiki supports ISO 8859-1 and UTF-8 URL encoding:

ISO 8859-1:
http://de.wikipedia.org/w/index.php?title=%D6sterreich&action=history

UTF-8:
http://de.wikipedia.org/w/index.php?title=%C3%96sterreich&action=history

With ISO 8859-1 URL encoding,
mw.util.getParamValue('title')
fails in decodeURIComponent('%D6sterreich') with
URIError: malformed URI sequence
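
The failure can be reproduced in plain JavaScript without MediaWiki. A minimal sketch; tryDecode is a helper name made up for this illustration and is not part of mw.util:

```javascript
// Try to decode a percent-encoded string and report the error name
// instead of letting the exception propagate.
function tryDecode(encoded) {
  try {
    return { ok: true, value: decodeURIComponent(encoded) };
  } catch (e) {
    return { ok: false, error: e.name };
  }
}

// UTF-8 escapes decode fine.
const utf8 = tryDecode('%C3%96sterreich');
// A lone ISO 8859-1 byte (0xD6) is not a valid UTF-8 sequence.
const latin1 = tryDecode('%D6sterreich');

console.log(utf8);   // ok: true, value: 'Österreich'
console.log(latin1); // ok: false, error: 'URIError'
```

The exact error message text differs between browsers; only the error type (URIError) is the same everywhere.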

Possible solutions:

  • mw.util.getParamValue() tries to decode as ISO 8859-1 when UTF-8 decoding fails.
  • MediaWiki answers with 301 Moved Permanently and Location in UTF-8 encoding.
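
The first option could look roughly like the sketch below. decodeWithLatin1Fallback is a hypothetical name, not an existing MediaWiki function, and the sketch assumes ISO 8859-1 is the only fallback encoding, which may not match what a given server is configured to do:

```javascript
// Decode percent-escapes as UTF-8 first. If that fails, treat every %XX
// escape as one ISO 8859-1 byte; in ISO 8859-1 the byte value equals the
// Unicode code point, so String.fromCharCode() is enough.
function decodeWithLatin1Fallback(encoded) {
  try {
    return decodeURIComponent(encoded);
  } catch (e) {
    if (!(e instanceof URIError)) {
      throw e;
    }
    return encoded.replace(/%([0-9A-Fa-f]{2})/g, (match, hex) =>
      String.fromCharCode(parseInt(hex, 16)));
  }
}

console.log(decodeWithLatin1Fallback('%C3%96sterreich'));      // Österreich
console.log(decodeWithLatin1Fallback('%D6sterreich'));         // Österreich
console.log(decodeWithLatin1Fallback('Spezial%3ABeitr%E4ge')); // Spezial:Beiträge
```

This cannot tell the encodings apart when a byte sequence is valid in both: %C3%96 is always read as UTF-8 here, even if the sender meant the two ISO 8859-1 characters "Ã" and "–".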

Version: unspecified
Severity: normal

Details

Reference
bz31918
Title: Add fallback values for entityType and riskType
Reference: repos/mediawiki/services/ipoid!13
Author: stran
Source Branch: add-property-fallbacks
Dest Branch: main

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:52 PM
bzimport set Reference to bz31918.
bzimport added a subscriber: Unknown Object (MLST).

There are several possible fallback decodings depending on the configured site language; replicating that in the JS code would be some annoying extra effort.

Redirecting on a GET request would not be untoward, though we could also see it on some POST requests, where that's less reliable... in general though it's a bit flaky to be manually trying to read things out of the query string that JS sees, as various things might not be there *at all* -- if they come in via rewrite rules, PATH_INFO (eg /wiki/Foo) or POST data.

Your example URL is a good example of one that could easily be switched to using a different URL style that would not have a 'title' query string parameter at all; if $wgActionPaths is set up for history you might have eg http://de.wikipedia.org/history/%C3%96sterreich.
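
To illustrate why reading the query string is fragile, here is a small sketch using the standard URL API. titleFromQuery is a made-up helper, and the /wiki/ and /history/ URLs are just the path styles mentioned above:

```javascript
// Read the 'title' query parameter the way a script scanning the URL would.
// URLSearchParams.get() returns null when the parameter is absent.
function titleFromQuery(href) {
  return new URL(href).searchParams.get('title');
}

const urls = [
  'http://de.wikipedia.org/w/index.php?title=%C3%96sterreich&action=history',
  'http://de.wikipedia.org/wiki/%C3%96sterreich',
  'http://de.wikipedia.org/history/%C3%96sterreich'
];

for (const href of urls) {
  console.log(href, '->', titleFromQuery(href));
}
// Only the first URL yields 'Österreich'; the other two yield null,
// because the title is carried in the path, not in the query string.
```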

If you're *on* that page, you should be getting the title via mw.config.get('wgTitle') and should make no assumptions about query string parameters.

If you're somewhere else and trying to decode some foreign URL that's been provided to you, then it's probably a bit tricky to try to make any claims about it.

Of course mw.util.getParamValue('title') can be substituted by mw.config.get('wgTitle') or mw.config.get('wgPageName').

On
http://de.wikipedia.org/w/index.php?title=Spezial%3ABeitr%E4ge&contribs=user&target=%D6sterreicher
there is no suitable substitute for mw.util.getParamValue('target').

I don't think mw.util.getParamValue('target') will help you at http://de.wikipedia.org/w/index.php?title=Spezial%3ABeitr%E4ge/%D6sterreicher or http://de.wikipedia.org/wiki/Spezial%3ABeitr%E4ge/%D6sterreicher however.

Possibly indicates that special pages should be exporting their parameters in a cleaner way.

Yes, at the moment there are a lot of hacks to extract the parameters from the URL, like extractLemma() in https://de.wikipedia.org/wiki/Benutzer:PDD/helperFunctions.js. mw.util.getParamValue() does not solve these problems and has the same problem with ISO 8859-1 encoding.

Maybe it is possible to provide a JavaScript object with the normalized URL parameters for each (special) page.

(In reply to comment #4)

> Yes, at the moment there are a lot of hacks to extract the parameters from the
> URL, like extractLemma() in
> https://de.wikipedia.org/wiki/Benutzer:PDD/helperFunctions.js.
> mw.util.getParamValue() does not solve these problems and has the same problem
> with ISO 8859-1 encoding.
>
> Maybe it is possible to provide a JavaScript object with the normalized URL
> parameters for each (special) page.

+1. I've seen similar problems on Portuguese Wikipedia.

(In reply to comment #0)

> On ISO 8859-1 URL encoding
> mw.util.getParamValue('title')
> stops on decodeURIComponent('%D6sterreich') with
> URIError: malformed URI sequence

BTW: Any chance of this being related to bug 25846?

I see three possibilities to solve the problem:

  • Implement the same decoder in JavaScript as in PHP, so that mw.util.getParamValue() tries to decode as ISO 8859-1 when UTF-8 decoding fails.
  • MediaWiki answers with 301 Moved Permanently and a Location header in UTF-8 encoding, so the URL gets normalized.
  • The parameters get normalized in PHP and transferred to JavaScript via a wiki global variable. mw.util.getParamValue() uses the normalized parameters from this variable instead of the URL.

(In reply to comment #8)

> I see three possibilities to solve the problem:
>
>   • Implement the same decoder in JavaScript as in PHP, so that
>     mw.util.getParamValue() tries to decode as ISO 8859-1 when UTF-8 decoding
>     fails.

There are more encodings, and this can depend on the configuration of the server as well.

>   • MediaWiki answers with 301 Moved Permanently and a Location header in UTF-8
>     encoding, so the URL gets normalized.

Not desired or possible in certain cases.

>   • The parameters get normalized in PHP and transferred to JavaScript via a
>     wiki global variable. mw.util.getParamValue() uses the normalized
>     parameters from this variable instead of the URL.

Unnecessary bloat.

And all three of these solutions share the same problem: they encourage using query parameters that were created by and intended for one script outside that script (for example, in a gadget). That is bad, because these parameters are not documented or considered stable. Query parameter names and meanings may change at any time, and they must only be used for communication between the script's output and its own input.

Solution 4: Relevant values are exported to JavaScript in a canonical way by the script. For example, MediaWiki itself always exports wgTitle, wgNamespaceNumber etc., and special pages export wgCanonicalSpecialPageName. Other scripts can export their own information (e.g. SpecialContributions could export spContributionsTarget: .. or spContributions: { target: .. }).
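
From a gadget's point of view, solution 4 might be consumed roughly as sketched below. This assumes the hypothetical spContributionsTarget key suggested above, which is not an existing variable. A plain Map stands in for mw.config so that the snippet runs outside MediaWiki:

```javascript
// Stand-in for the values the server would export. In MediaWiki these would
// be read with mw.config.get(); 'spContributionsTarget' is hypothetical.
const exported = new Map([
  ['wgCanonicalSpecialPageName', 'Contributions'],
  ['spContributionsTarget', 'Österreicher']
]);

// Return the contributions target from exported values, or null when the
// current page is not Special:Contributions or no target was exported.
function getContributionsTarget(config) {
  if (config.get('wgCanonicalSpecialPageName') !== 'Contributions') {
    return null;
  }
  const target = config.get('spContributionsTarget');
  return target === undefined ? null : target;
}

console.log(getContributionsTarget(exported)); // Österreicher
```

The value arrives already decoded and normalized by the server, so the gadget never touches the URL and does not care whether the request was a GET, a POST, or used a subpage-style path.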

Solution 4 is good. This means that mw.util.getParamValue() should never be used to get parameters from the URL. All parameters must be parsed on the server side. Once all parameters are available to JavaScript, this bug can be closed as WONTFIX.

The printable version, activated by the URL parameter printable=yes, should also be exposed to JavaScript and to CSS.

Can support for ISO 8859-1 URL encoding be dropped?