Page MenuHomePhabricator

Mobile Table of Contents double unescapes encoded characters
Closed, ResolvedPublic

Description

Compare these two TOCs:
http://en.wikipedia.beta.wmflabs.org/wiki/Igloo
http://en.m.wikipedia.beta.wmflabs.org/wiki/Igloo (in beta mode)

The desktop one converts & to & (which is correct):
You & your mom & your dog & your shirt

The mobile one converts & to & (which is wrong):
You & your mom & your dog & your shirt


Version: unspecified
Severity: blocker

Details

Reference
bz65042

Event Timeline

bzimport raised the priority of this task from to Unbreak Now!.Nov 22 2014, 3:09 AM
bzimport added a project: MobileFrontend-beta.
bzimport set Reference to bz65042.
kaldari created this task.May 8 2014, 9:31 PM

bingle-admin wrote:

Prioritization and scheduling of this bug is tracked on Trello card https://trello.com/c/IIXEuTOZ

brion added a comment.May 8 2014, 10:24 PM

I think it's these lines in PageApi.js transformSections():

// Store text version so users of API do not have to worry about styling e.g. Table of Contents
// FIXME: [API] should probably do this for us - we want to be able to control the styling of these headings - no inline styles!
section.lineText = $( '<div>' ).html( section.line ).text();

...

$( '<h' + section.level + '>' ).attr( 'id', section.anchor ).html( section.line )

Since this seems to be already plaintext, and not HTML, performing the conversion has the nice side effect of executing any <script> contained in the text, without leaving it in the DOM to be seen.

Hmmm, even if I change that entire each loop to:
$.each( sections, function( i, section ) {

section.lineText = '';
section.children = [];
result.push( section );

} );
It still executes the section title as HTML.

brion added a comment.May 8 2014, 10:45 PM

(In reply to Ryan Kaldari from comment #3)

Hmmm, even if I change that entire each loop to:
$.each( sections, function( i, section ) {

section.lineText = '';
section.children = [];
result.push( section );

} );
It still executes the section title as HTML.

Hmm, I can confirm BUT it only executes it ONCE instead of twice, so it seems to be an improvement. :D

There must be another similar one hiding around somewhere...

If I add...
section.line = '';
...to that loop, it eliminates the last execution. Of course that breaks a few features though.

I have it fixed locally. Will upload diff in a sec...

Created attachment 15321
fix for section unescaping bug

patch attached

Attached:

Created attachment 15322
better patch file

Attached:

mw.sections[].line is defined in ApiMobileView:

https://github.com/wikimedia/mediawiki-extensions-MobileFrontend/blob/a893faf37b78e6/includes/api/ApiMobileView.php#L402-L408

$data['sections'] = $parserOutput->getSections();

Which comes from mediawiki-core Parser:

https://github.com/wikimedia/mediawiki-core/blob/b31c093855be5e0/includes/parser/Parser.php#L4469-L4490

Server-side code uses it as html (the parser claims it has been tripped and escaped where needed, it is considered html, not text).

For example, Linker::tocLine, embeds it raw:

https://github.com/wikimedia/mediawiki-core/blob/b31c093855be5e0/includes/Linker.php#L1657-L1665

return ... '<span class="toctext">' . $tocline . '</span>' ..

Back to the MobileFrontend code, it should be used as text or as html. I think in this case it is supposed to be used as html. If that's unsafe, then we're likely vulnerable in mediawiki-core's html response as well and it should be fixed there instead.

Either way, .html( .. ).text() is a horrible hack that should not be used. If you use mw.html.escape, use it directly and only.

If you find yourself doing:

var fooHtml = mw.html.escape(somethingText);
$something.html(fooHtml);

Please don't. This is extra overhead in both escaping (string manipulation), and parsing (by the browser) for no good reason. Instead just pass the original value to $something.text(), which will instruct the browser to create a native string of text, bypassing the need to escape and parse as html altogether (remember, when in javascript, there is no such thing as HTML, you're interacting with the DOM directly. You wouldn't "append" a string of php syntax to a php array either, instead you'd set a property on the array or use array push, and that is safe. Same goes for text on an element, using .text() goes straight to where it should go without the need to escape or parse.

Created attachment 15360
get rid of lineText completely, fix transformSections

Attached:

Created attachment 15361
again as a patch

Attached: