Wrong squashing of blank lines when category link is "removed" from wikitext
Open, Medium, Public

Description

Consider the following snippet:

First paragraph

[[category:foo]]

Second paragraph

It is parsed as

<p>First paragraph </p>
<p>Second paragraph </p>

with the category "foo" added, which is correct. If we leave out the second paragraph:

First paragraph

[[category:foo]]

then the parser output is also correct:

<p>First paragraph </p>

But if we leave out the first paragraph:

[[category:foo]]

Second paragraph

then the parser output is wrong:

<p>
<br>
Second paragraph
</p>

The same happens with language links. This is particularly troublesome when the "metadata" elements are stacked at the top of the page instead of the bottom, which causes the infamous "gap" between the first heading (i.e. the page name) and the content.
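
The three cases above can be pictured with a toy model. This is only a sketch under two assumptions that the report does not confirm: that removing a category link trims the whitespace before the link but not after it, and that the paragraph builder turns a second consecutive blank line into a <br>. The names strip_category_links and toy_paragraphs are hypothetical and this is not MediaWiki's parser code.

```python
import re

# Hypothetical pattern; real category link syntax is richer than this.
CATEGORY_LINK = re.compile(r"\[\[category:[^\]]*\]\]", re.IGNORECASE)


def strip_category_links(wikitext: str) -> str:
    """Remove category links, trimming only the whitespace BEFORE each link (assumption)."""
    out = ""
    pos = 0
    for match in CATEGORY_LINK.finditer(wikitext):
        out = (out + wikitext[pos:match.start()]).rstrip()
        pos = match.end()
    return out + wikitext[pos:]


def toy_paragraphs(text: str) -> str:
    """Toy paragraph builder: one blank line splits paragraphs, a second one becomes <br>."""
    paragraphs = []
    blanks = 0
    for line in text.split("\n"):
        if not line.strip():
            blanks += 1
            continue
        if blanks or not paragraphs:
            # Two or more blank lines in a row leave a <br> behind (assumption).
            paragraphs.append(("<br>" if blanks >= 2 else "") + line)
        else:
            paragraphs[-1] += "\n" + line
        blanks = 0
    return "\n".join(f"<p>{p}</p>" for p in paragraphs)


def render(wikitext: str) -> str:
    return toy_paragraphs(strip_category_links(wikitext))


if __name__ == "__main__":
    print(render("First paragraph\n\n[[category:foo]]\n\nSecond paragraph"))
    print(render("First paragraph\n\n[[category:foo]]"))
    print(render("[[category:foo]]\n\nSecond paragraph"))  # <p><br>Second paragraph</p>
```

In this model the link in the middle of the page disappears together with the blank line before it, so a single blank line remains. At the top of the page there is nothing before the link to trim, so both newlines after it survive and produce the <br>.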

Event Timeline

Lahwaacz raised the priority of this task to Needs Triage.
Lahwaacz updated the task description.
Lahwaacz added a project: MediaWiki-Parser.
Lahwaacz subscribed.

There is no newline trimming at the beginning of wikitext, which can also result in this output for HTML comments; see T25698, which discusses T6161.

Looks related, but maybe not.

Related, but not the same. Category/language links are not handled equivalently to comments w.r.t. whitespace squashing. Consider the first snippet above,

First paragraph

[[category:foo]]

Second paragraph

is parsed as

<p>First paragraph </p>
<p>Second paragraph </p>

but

First paragraph

<!-- comment -->

Second paragraph

is parsed as

<p>First paragraph </p>
<p>
<br>
Second paragraph
</p>

Comments (1) and (2) suggest that this is how HTML comments work, but there is still some squashing e.g. around multiple comments on successive lines -- this snippet is parsed exactly the same way as above:

First paragraph

<!-- comment 1 -->
<!-- comment 2 -->
<!-- comment 3 -->

Second paragraph
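
The comment behaviour can be sketched with a similar toy model. It assumes, without confirmation from this task, that a line containing only a comment is dropped together with its own newline, and that the paragraph builder turns a second consecutive blank line into a <br>. The names strip_comment_lines and toy_paragraphs are hypothetical.

```python
import re

# Assumption: a comment-only line is removed along with its trailing newline.
COMMENT_ONLY_LINE = re.compile(r"^[ \t]*<!--.*?-->[ \t]*\n", re.MULTILINE)


def strip_comment_lines(wikitext: str) -> str:
    """Drop every line that holds nothing but a single HTML comment."""
    return COMMENT_ONLY_LINE.sub("", wikitext)


def toy_paragraphs(text: str) -> str:
    """Toy paragraph builder: one blank line splits paragraphs, a second one becomes <br>."""
    paragraphs = []
    blanks = 0
    for line in text.split("\n"):
        if not line.strip():
            blanks += 1
            continue
        if blanks or not paragraphs:
            paragraphs.append(("<br>" if blanks >= 2 else "") + line)
        else:
            paragraphs[-1] += "\n" + line
        blanks = 0
    return "\n".join(f"<p>{p}</p>" for p in paragraphs)


one_comment = "First paragraph\n\n<!-- comment -->\n\nSecond paragraph"
three_comments = (
    "First paragraph\n\n"
    "<!-- comment 1 -->\n<!-- comment 2 -->\n<!-- comment 3 -->\n"
    "\nSecond paragraph"
)

if __name__ == "__main__":
    print(toy_paragraphs(strip_comment_lines(one_comment)))
    print(toy_paragraphs(strip_comment_lines(three_comments)))
```

Under these assumptions both inputs reduce to the same text with two blank lines between the paragraphs, which is why one comment and three successive comments give identical output.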

The rules for whitespace squashing around category/language links are even more elaborate, except at the top of the page, where they all cause the same issue.

What's interesting is that if the first paragraph after the category/language links at the top starts with a wikilink, then it is parsed correctly:

[[category:foo]]

[[First]] paragraph

is rendered as

<p><a href="/index.php/First" title="First">First</a> paragraph</p>

i.e. without the <br> tag.
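
For anyone who wants to check rendered output for this symptom, a small helper along these lines might do. It is a sketch only: has_spurious_break is a hypothetical name, and the sample strings are the two outputs quoted in this task for a category link at the top of the page.

```python
import re

# Matches a paragraph that opens with a <br> element, ignoring whitespace.
LEADING_BREAK = re.compile(r"<p>\s*<br\s*/?>", re.IGNORECASE)


def has_spurious_break(html: str) -> bool:
    """Return True if any paragraph in the HTML starts with a <br>."""
    return LEADING_BREAK.search(html) is not None


# Outputs as reported above.
plain_paragraph = "<p>\n<br>\nSecond paragraph\n</p>"
wikilink_paragraph = (
    '<p><a href="/index.php/First" title="First">First</a> paragraph</p>'
)

if __name__ == "__main__":
    print(has_spurious_break(plain_paragraph))     # True
    print(has_spurious_break(wikilink_paragraph))  # False
```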

ssastry triaged this task as Medium priority. Jan 4 2017, 6:45 PM