ExtLink parse tripped up by a late-stage stripping of newline+category link
Open, Normal, Public

Description

[http://google.com foo
[[Category:Bar]] baz]

Parsoid does not parse the above as an extlink. The newline + category pair is recognized late in the pipeline, by which time the text has already been processed into "[", ExtLink, .., "]" tokens. So when the newline + category pair is recognized, the preceding extlink tokens should be fixed up at the same time.
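
The fix-up idea can be sketched roughly as follows. This is only an illustration in Python: the token shapes (`text`, `url`, `newline`, `category`, `extlink`) and both function names are made up for the example and are not Parsoid's actual token types or API.

```python
def strip_newline_category(tokens):
    """Drop every newline token that is directly followed by a category token."""
    out = []
    i = 0
    while i < len(tokens):
        kind = tokens[i][0]
        next_kind = tokens[i + 1][0] if i + 1 < len(tokens) else None
        if kind == "newline" and next_kind == "category":
            i += 2  # skip the newline and the category together
            continue
        out.append(tokens[i])
        i += 1
    return out


def fix_up_extlinks(tokens):
    """Merge '[' url text... ']' runs (text tokens only) into one extlink token."""
    out = []
    i = 0
    while i < len(tokens):
        starts_link = (
            tokens[i] == ("text", "[")
            and i + 1 < len(tokens)
            and tokens[i + 1][0] == "url"
        )
        if starts_link:
            j = i + 2
            label = []
            # Collect label text until the closing bracket or a non-text token.
            while j < len(tokens) and tokens[j][0] == "text" and tokens[j][1] != "]":
                label.append(tokens[j][1])
                j += 1
            if j < len(tokens) and tokens[j] == ("text", "]"):
                out.append(("extlink", tokens[i + 1][1], "".join(label).strip()))
                i = j + 1
                continue
        out.append(tokens[i])
        i += 1
    return out


# Hypothetical token stream for the wikitext in the description.
tokens = [
    ("text", "["),
    ("url", "http://google.com"),
    ("text", " foo"),
    ("newline", "\n"),
    ("category", "Bar"),
    ("text", " baz"),
    ("text", "]"),
]

without_strip = fix_up_extlinks(tokens)
with_strip = fix_up_extlinks(strip_newline_category(tokens))
print(without_strip == tokens)  # the newline blocks the merge
print(with_strip)
```

In this sketch the merge only succeeds once the newline + category pair is gone, which is why the two steps would need to run together rather than in separate stages.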

I saw this on one page during visual diff testing (so far, based on investigating test results). Example mznwiki page. The relevant wikitext on that page is {{coord|44|48|N|20|28|E|type:country|display=title}}

This looks like an edge case, but other mznwiki pages might also be affected by this bug.

Event Timeline

ssastry created this task. Mar 27 2017, 5:47 PM
Restricted Application added a subscriber: Aklapper. Mar 27 2017, 5:47 PM
Anomie added a subscriber: Anomie. Mar 27 2017, 7:27 PM

Translating it from Mazandarani, the relevant bit of that output looks like [http://blah.example.com ABC\n[[Category:Something]]DEF]. Parser::replaceInternalLinks2() strips out the [[Category:Something]] and the leading newline (per T2087, according to a comment) leaving [http://blah.example.com ABCDEF] for replaceExternalLinks() to see later.
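
As a rough illustration of why that order works, here is a minimal Python sketch. The two regexes are simplified assumptions and are not what `Parser::replaceInternalLinks2()` or `replaceExternalLinks()` actually use; the helper names are hypothetical as well.

```python
import re

# Assumed, simplified patterns; not the regexes MediaWiki actually uses.
NEWLINE_CATEGORY = re.compile(r"\n\[\[Category:[^\]\n]+\]\]")
EXTLINK = re.compile(r"\[(https?://[^\s\]]+) ([^\]\n]+)\]")


def strip_categories(wikitext):
    """Remove each category link together with the newline in front of it."""
    return NEWLINE_CATEGORY.sub("", wikitext)


def find_extlinks(wikitext):
    """Return (url, label) pairs for bracketed links whose label has no newline."""
    return EXTLINK.findall(wikitext)


source = "[http://blah.example.com ABC\n[[Category:Something]]DEF]"

# Category stripping first, extlink matching second.
stripped = strip_categories(source)
print(stripped)                 # [http://blah.example.com ABCDEF]
print(find_extlinks(stripped))  # [('http://blah.example.com', 'ABCDEF')]

# Matching extlinks on the unstripped text finds nothing.
print(find_extlinks(source))    # []
```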

I don't know why copy-pasting it into that sandbox page makes it not do that for you, it works fine for me. Possibly it's some sort of confusion with the RTL in the copy-paste or some sort of unicode breakage.

ssastry renamed this task from Difference in template parse output on mznwiki to ExtLink parse tripped up by a late-stage stripping of newline+category link. Mar 27 2017, 7:47 PM
ssastry triaged this task as Normal priority.
ssastry removed a project: MediaWiki-API.

> Translating it from Mazandarani, the relevant bit of that output looks like [http://blah.example.com ABC\n[[Category:Something]]DEF]. Parser::replaceInternalLinks2() strips out the [[Category:Something]] and the leading newline (per T2087, according to a comment) leaving [http://blah.example.com ABCDEF] for replaceExternalLinks() to see later.

Thanks! That helps. Parsoid implements this as well and that part of it is working as expected ... but there is an ordering problem inside Parsoid that prevents the extlink from getting recognized.

> I don't know why copy-pasting it into that sandbox page makes it not do that for you, it works fine for me. Possibly it's some sort of confusion with the RTL in the copy-paste or some sort of unicode breakage.

Could be.

ssastry updated the task description.
ssastry removed a subscriber: Anomie.
ssastry updated the task description. Mar 27 2017, 8:14 PM
cscott added a subscriber: cscott. Aug 13 2017, 4:22 PM

T87753 (in the PHP parser) may be related, as whitespace stripping around categories is involved.