
Parsoid does not swallow newlines in unclosed list items
Closed, Resolved (Public)

Description

Wikitext like

<li>Foo





<li>Bar

renders as <ul><li>Foo</li><li>Bar</li></ul> in the PHP parser (the unclosed <li>s are closed, and the newlines are swallowed), but Parsoid renders it as:

<li id="mwAQ">Foo

<p id="mwAg"><br id="mwAw"></p>


<p id="mwBA"><br id="mwBQ"></p></li>
<li id="mwBg">Bar</li>

Apparently template authors expect this newline-swallowing behavior and rely on it; see T100921#1354936.
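
To make the expected result concrete, here is a minimal Python sketch of the normalization described above: each unclosed <li> is closed at the next <li>, and the surrounding newlines are swallowed. It is only an illustration, not how the PHP parser, Tidy, or Parsoid implement this; the function name close_unclosed_list_items is hypothetical, and the sketch assumes a flat list whose <li> tags have no attributes.

```python
import re


def close_unclosed_list_items(wikitext: str) -> str:
    """Toy normalization: close each <li> at the next <li> and trim newlines.

    Assumption: flat list, attribute-free <li> tags, no nested lists.
    """
    # Split on every opening <li>; text before the first <li> is ignored here.
    parts = re.split(r"<li>", wikitext, flags=re.IGNORECASE)[1:]
    items = []
    for part in parts:
        # Drop an explicit </li> if one is present, then swallow the
        # leading/trailing whitespace (including the run of newlines).
        body = re.sub(r"</li>", "", part, flags=re.IGNORECASE).strip()
        items.append(f"<li>{body}</li>")
    return "<ul>" + "".join(items) + "</ul>"


if __name__ == "__main__":
    print(close_unclosed_list_items("<li>Foo\n\n\n\n\n\n<li>Bar\n"))
```

With the input from this description, the sketch yields the same markup that the PHP parser output is reported to produce.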

Event Timeline

Catrope raised the priority of this task from to Needs Triage.
Catrope updated the task description.
Catrope added projects: Parsoid, Parsoid-DOM.
Catrope subscribed.

Example page: compare https://www.mediawiki.org/wiki/User:Catrope/Newline_party and https://rest.wikimedia.org/www.mediawiki.org/v1/page/html/User:Catrope%2FNewline_party .

What's also entertaining is that the output is very different if the input contains one fewer newline:

<li>Foo




<li>Bar

produces

<li>Foo

<p><br /></p></li>


<li>Bar</li>

i.e. with a lot more whitespace between the <li>s and less inside the first <li>.

Catrope renamed this task from "Parsoid does now swallow newlines" to "Parsoid does not swallow newlines in unclosed list items". Jun 11 2015, 12:16 AM
Catrope set Security to None.

I realize this sounds a bit like https://xkcd.com/1172/ , but it breaks rendering of certain ruwiki templates in Flow.

Must be the phase of the moon or something that is surfacing all these newline swallowing issues at the same time, because I pushed a patch to swallow newlines in another context ( https://gerrit.wikimedia.org/r/#/c/216136/ ).

That useless comment apart, I don't understand this fully and have a bunch of questions:

  • Is this PHP parser behavior or Tidy behavior?
  • Can't the template be fixed? I am not so keen on adding a bunch of hacks depending on counting how many newlines there are between unclosed list items. My ideal solution for this is to fix this template unless there is a large set of use cases that demand a solution in Parsoid.
  • Can you point me to the template and how / where it is used?

Although if I stop to think about it, a reasonable behavior in the case of unclosed list items is to include all content till the next unclosed <li> (or a list closing tag). So, this could / should be fixed in Parsoid. Will stop chattering on the phab task late at night. Let me investigate more in the coming days / week and update.

That useless comment apart, I don't understand this fully and have a bunch of questions:

  • Is this PHP parser behavior or Tidy behavior?

I don't know, I haven't tested this on an install with Tidy disabled.

  • Can't the template be fixed? I am not so keen on adding a bunch of hacks depending on counting how many newlines there are between unclosed list items. My ideal solution for this is to fix this template unless there is a large set of use cases that demand a solution in Parsoid.

It's nontrivial to fix the template. Adding </li> to the inner template causes both parsers to interpret the newlines. Avoiding the newlines in the outer template is possible but requires that every line be changed to look like -->{{inner template|[parameters...]|hide=[0 or 1]}}<!-- which would be a pretty laborious change to make.
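
For illustration, the comment-wrapping workaround mentioned above can be sketched in Python. This assumes the goal is simply to hide every newline between template calls inside an HTML comment; the helper names hide_newlines_in_comments and strip_comments are hypothetical, and the opening <!-- and closing --> lines around the list are an assumption about how the outer page would be bracketed.

```python
import re


def hide_newlines_in_comments(template_calls):
    """Emit one template call per line, with each newline inside a comment.

    Every emitted call line has the shape  -->{{...}}<!--  as described in
    the comment above; the first and last lines are assumed brackets.
    """
    lines = ["<!--"]
    for call in template_calls:
        lines.append(f"-->{call}<!--")
    lines.append("-->")
    return "\n".join(lines)


def strip_comments(wikitext):
    """Remove HTML comments, roughly as a parser would before rendering."""
    return re.sub(r"<!--.*?-->", "", wikitext, flags=re.DOTALL)


if __name__ == "__main__":
    calls = [
        "{{inner template|a|hide=0}}",
        "{{inner template|b|hide=1}}",
    ]
    wrapped = hide_newlines_in_comments(calls)
    print(wrapped)
    print(strip_comments(wrapped))
```

Once the comments are stripped, no newline remains between the calls, which is why this avoids the problem but also why every line of the outer template would have to be touched.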

  • Can you point me to the template and how / where it is used?

Example usage for debugging is in my user sandbox. Example usage in real life is here on a non-Flow talk page and here on an experimental Flow version of that talk page. One of the "inner" templates that generates the unclosed <li>s is here, and one of the "outer" templates (really a Wikipedia namespace page that's transcluded) is here. This outer template (and a few others that are similar in structure) is used by the Актуально template which is the one transcluded on my user sandbox page and on that talk page.

This comment was removed by Catrope.

Although if I stop to think about it, a reasonable behavior in the case of unclosed list items is to include all content till the next unclosed <li> (or a list closing tag). So, this could / should be fixed in Parsoid. Will stop chattering on the phab task late at night. Let me investigate more in the coming days / week and update.

That does appear to be what it's doing, at least in the 5 newlines case. I agree that in the 4 newlines case it's a bit weirder than that.

Arlolra triaged this task as High priority. Jul 7 2015, 3:15 AM
Arlolra raised the priority of this task from High to Needs Triage.
Arlolra moved this task from Needs Triage to In Progress on the Parsoid board.
Arlolra moved this task from In Progress to Needs Triage on the Parsoid board.
Arlolra triaged this task as Medium priority. Jul 20 2015, 5:45 PM
Arlolra subscribed.
ssastry edited projects, added Parsoid-Rendering; removed Parsoid-DOM.
[subbu@earth:~/work/wmf/parsoid] php bin/parse.php --body_only < /tmp/x
<li data-parsoid='{"stx":"html","autoInsertedEnd":true,"dsr":[0,7,4,0]}'>Foo</li>





<li data-parsoid='{"stx":"html","autoInsertedEnd":true,"dsr":[13,20,4,0]}'>Bar</li>