
Links with escaped linefeed characters mess up indenting
Closed, Resolved, Public

Description

Author: bulk-wikipedia

Description:

Notice that the second bullet is wrongly indented.


Version: unspecified
Severity: minor

Details

Reference
bz752

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 7:02 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz752.
bzimport added a subscriber: Unknown Object (MLST).

rowan.collins wrote:

*** Bug 1114 has been marked as a duplicate of this bug. ***

rowan.collins wrote:

My comments from dupe bug 1114:

http://en.wikipedia.org/wiki/Talk:Mozilla_Firefox#Firefox_more_multilingual_than_IE.3F__Opera.3F.3F
As you should be able to see, everything below the section of discussion
referenced, including other headers, is indented as though it were prefixed with
"::". It turns out that there is a URL referenced there which ends in "%0a"
(i.e. an escaped linefeed), and this is somehow breaking the parser;
specifically, the indentation of that particular line is never closed, so
everything below it is indented to the same degree.

I've constructed a simplified test-case at http://test.wikipedia.org/wiki/LFURL,
and it seems that what's happening is that the "%0a" is being unescaped in the
'title' attribute, and then the closing "</dd></dl>" added *to that attribute*.
Presumably, the parser is spotting what is now a plain newline and taking it to
be the end of the indentation.

I'm guessing there's code somewhere that unescapes such escape sequences to make
URL title attributes more readable [the href attribute works fine]; perhaps this
needs to be confined to only unescape printable characters?
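A minimal sketch of that suggestion, in Python for illustration only (MediaWiki itself is PHP, and this is not its actual code). The function name unescape_printable is hypothetical, and "printable" is assumed here to mean ASCII 0x20-0x7e:

```python
import re

# Matches one percent-escape such as %20 or %0a.
_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def unescape_printable(url: str) -> str:
    """Decode %XX sequences only when they encode printable ASCII.

    Hypothetical helper: escapes outside 0x20-0x7e (including %0a)
    are left exactly as written, so no raw newline can reach the
    title attribute.
    """
    def decode(match: re.Match) -> str:
        code = int(match.group(1), 16)
        if 0x20 <= code <= 0x7E:
            return chr(code)
        return match.group(0)  # keep control and non-ASCII escapes untouched

    return _ESCAPE.sub(decode, url)


print(unescape_printable("http://example.org/a%20b%0a"))
```

Under this assumption, "%20" becomes a space while "%0a" stays escaped. One trade-off: percent-encoded non-ASCII bytes (for example UTF-8 sequences) would also stay escaped, which makes such titles less readable.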

Fixed: characters 0x00-0x1f are now converted to spaces.
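For illustration, the behaviour described in that fix could look roughly like the following Python sketch. It is not the MediaWiki change itself (which is PHP); the function name title_from_url is hypothetical, and it assumes the title text is produced by fully unescaping the URL first:

```python
import re
from urllib.parse import unquote

# Control characters 0x00-0x1f, which include linefeed (0x0a).
_CONTROL = re.compile(r"[\x00-\x1f]")


def title_from_url(url: str) -> str:
    """Build readable title text from a URL (hypothetical helper).

    Percent-escapes are decoded for readability, then any control
    character in the result is replaced with a space so the parser
    never sees a raw newline inside the attribute.
    """
    decoded = unquote(url)
    return _CONTROL.sub(" ", decoded)


print(repr(title_from_url("http://example.org/page%0a")))
```

With this approach the trailing "%0a" ends up as a trailing space in the title rather than a line break, so the surrounding indentation markup is no longer split.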