
Section parsing bug on :en:Wikimedia Foundation
Closed, Resolved (Public)

Description

Sectioning (both in MCS and Parsoid) appears to be broken by a usage of the selfref template in 'Wikimedia Foundation' on enwiki. It appears Parsoid isn't closing the span it creates for some reason, which is resulting in the span unexpectedly wrapping the entire body content.

Of course, this breaks all of the functionality that depends on knowing about sections (including lead intro creation, which affects both the page endpoints and the new summary endpoint).

Event Timeline

Well this is interesting:

Revision 803552163 was vandalized (rev. 803788378) and then quickly reverted (rev. 803788478) on October 4. The wikitext is identical before and after, and it parses identically in the MediaWiki PHP parser, but the Parsoid HTML before and after is quite different (perhaps because of a Parsoid deployment that day).

To make a long story short, the new version wraps all of the <body> children in a span (class="selfreference") which breaks our section parsing. It looks like it breaks Parsoid's own section parsing as well (all sections are given data-mw-section-id -1).
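For anyone who wants to check other pages for the same symptom, here is a minimal sketch in Python (standard library only). The helper name sole_body_wrapper and the sample HTML are made up for illustration and are not part of MCS or Parsoid; the sketch also assumes the input is well-formed serialized HTML, as Parsoid output normally is.

```python
from html.parser import HTMLParser

# Void elements never get pushed onto the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class BodyChildren(HTMLParser):
    """Collect (tag, class) for every direct element child of <body>."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open elements, outermost first
        self.children = []  # direct element children of <body>

    def handle_starttag(self, tag, attrs):
        if self.stack and self.stack[-1] == "body":
            self.children.append((tag, dict(attrs).get("class")))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def sole_body_wrapper(html):
    """Return (tag, class) if <body> has exactly one element child, else None."""
    parser = BodyChildren()
    parser.feed(html)
    parser.close()
    return parser.children[0] if len(parser.children) == 1 else None


# Hypothetical, heavily shortened stand-ins for the two Parsoid outputs.
broken = ('<html><body><span class="selfreference"><div role="note">note</div>'
          '<p>Intro</p><h2>History</h2><p>Text</p></span></body></html>')
healthy = ('<html><body><div role="note">note</div>'
           '<p>Intro</p><h2>History</h2><p>Text</p></body></html>')

print(sole_body_wrapper(broken))   # ('span', 'selfreference')
print(sole_body_wrapper(healthy))  # None
```

A single wrapper element is not always a bug, so this is only a hint that a page deserves a closer look.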

@ssastry is this a Parsoid bug?

(Related: https://en.wikipedia.org/wiki/Template:Selfref)

Mholloway renamed this task from "MCS summary endpoint unexpectedly returns 204 for :en:Wikimedia Foundation" to "Section parsing bug on :en:Wikimedia Foundation". Dec 8 2017, 1:53 PM
Mholloway updated the task description.

The problem seems to predate that patch. A new paragraph begins after the <div> closes but before the </span> gets a chance to close as well.

0-[HTML]       | {"type":"EndTagTk","name":"div","attribs":[],"dataAttribs":{"stx":"html","tmp":{"inTransclusion":true}}}
0-[HTML]       | {"type":"TagTk","name":"p","attribs":[],"dataAttribs":{"tmp":{"inTransclusion":true,"tagId":5}}}
0-[HTML]       | {"type":"EndTagTk","name":"span","attribs":[],"dataAttribs":{"stx":"html","tmp":{"inTransclusion":true}}}

Previously, paragraph wrapping caused the <span> to be closed early:

<p><span style="font-style: italic;"></span></p>
<div role="note">For the project page on the foundation itself, see <a href="Wikipedia:Wikimedia_Foundation" title="Wikipedia:Wikimedia Foundation">Wikipedia:Wikimedia Foundation</a>.</div>
<p></p>

But now that the useless p-wrapping has been removed, the <span> doesn't get closed and wraps the whole page (the nested closing tag doesn't match up).
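To illustrate why the stray </span> has no effect, here is a rough Python sketch of the generic end-tag rule from HTML tree construction: walk up the stack of open elements, close everything down to the first match, but give up as soon as a "special" element such as <p> is reached. This is not Parsoid's code. The function name and the SPECIAL subset are made up for illustration, the opening span/div tokens are assumed (the trace above only shows the last three tokens), and the model applies the generic rule to every end tag, which is a simplification of the real per-tag rules.

```python
# Small, incomplete subset of the HTML "special" category (illustration only).
SPECIAL = {"html", "body", "div", "p", "h1", "h2", "table", "ul", "ol", "li"}


def open_elements_after(tokens):
    """Return the stack of open elements after processing (kind, name) tokens."""
    stack = []
    for kind, name in tokens:
        if kind == "start":
            stack.append(name)
            continue
        # End tag: search from the innermost open element outwards.
        for i in range(len(stack) - 1, -1, -1):
            if stack[i] == name:
                del stack[i:]  # close the match and everything nested in it
                break
            if stack[i] in SPECIAL:
                break          # stray end tag: ignored, nothing is closed
    return stack


# Token order from the trace, with the assumed opening tags prepended.
with_inserted_p = [("start", "body"), ("start", "span"), ("start", "div"),
                   ("end", "div"), ("start", "p"), ("end", "span")]
without_inserted_p = [("start", "body"), ("start", "span"), ("start", "div"),
                      ("end", "div"), ("end", "span")]

print(open_elements_after(with_inserted_p))     # ['body', 'span', 'p']
print(open_elements_after(without_inserted_p))  # ['body']
```

In the first case the span is still open when the rest of the page arrives, which matches the wrap-everything behaviour described above.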

Mholloway lowered the priority of this task from High to Medium. Edited Dec 9 2017, 9:05 PM

I updated a selfref template argument to reflect that it was being used as a hatnote rather than in an inline reference. I'd suspected this would resolve the parsing/sectioning problem, and it did.

This seems kind of edge case-y and I'm not sure what the Parsoid philosophy is on what level of defense is warranted against this kind of template usage error, so I'll leave this bug open to give the Parsing team a chance to follow up if needed.

Change 399471 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] [WIP] Don't insert new paragraph start before end tags

https://gerrit.wikimedia.org/r/399471

Change 399766 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Account for SOL transparent templates in p-wrapping

https://gerrit.wikimedia.org/r/399766

Change 399471 abandoned by Arlolra:
Don't insert new paragraph start before end tags

https://gerrit.wikimedia.org/r/399471

Change 399766 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Account for SOL transparent templates in p-wrapping

https://gerrit.wikimedia.org/r/399766

Closing since the patch merged. I will confirm the fix after the next deploy.