<meta typeof="mw:Includes/..."> nodes are always serialized to wikitext with newlines around them
Closed, Resolved (Public)

Description

When testing T250937, I noticed that <meta typeof="mw:Includes/..."> nodes seem to be always serialized to wikitext with newlines around them, even if there weren't any in the original wikitext.

Event Timeline

Minimal test case:

Input HTML:

<p>a<meta typeof="mw:Includes/NoInclude">b<meta typeof="mw:Includes/NoInclude/End">c</p>

Expected output wikitext:

a<noinclude>b</noinclude>c

Actual output wikitext:

a
<noinclude>
b
</noinclude>
c

Change 598888 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] WIP: <*include*> tags don't need newlines before/after

https://gerrit.wikimedia.org/r/598888

Change 598888 merged by jenkins-bot:
[mediawiki/services/parsoid@master] <*include*> tags don't need newlines before/after

https://gerrit.wikimedia.org/r/598888

Even with this patch, I still see the issue locally when editing pages with VisualEditor. Am I misunderstanding something?

Change 601428 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/vendor@master] Bump Parsoid to v0.12.0-a15

https://gerrit.wikimedia.org/r/601428

Change 601428 merged by jenkins-bot:
[mediawiki/vendor@master] Bump Parsoid to v0.12.0-a15

https://gerrit.wikimedia.org/r/601428

I imagine VE is sending along newlines and Parsoid uses those. In the html->wt code path, Parsoid's constraints set min/max newlines, or none, in which case the newlines in the HTML are transferred over. See line 123 in https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/598888/4/src/Html2Wt/DOMHandlers/MetaHandler.php

[subbu@earth:~/work/wmf/parsoid] echo '<p>a<meta typeof="mw:Includes/NoInclude">b<meta typeof="mw:Includes/NoInclude/End">c</p>' | php bin/parse.php --html2wt
a<noinclude>b</noinclude>c

[subbu@earth:~/work/wmf/parsoid] echo -e '<p>a\n<meta typeof="mw:Includes/NoInclude">\nb\n<meta typeof="mw:Includes/NoInclude/End">\nc</p>' | php bin/parse.php --html2wt
a
<noinclude>
b
</noinclude>
c

But if we just want to drop all newlines around noincludes in the HTML, I could set the constraints to always force newlines to zero, if that is what is required.
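
For anyone following along, the behavior described above can be sketched roughly as below. This is an illustrative Python sketch only, not Parsoid's actual code (Parsoid is PHP, and its separator handling is more involved). The names `resolve_newlines` and `join_with_separator`, and the idea of passing the bounds as plain integers, are assumptions made for the example.

```python
from typing import Optional


def resolve_newlines(found: int,
                     min_nl: Optional[int] = None,
                     max_nl: Optional[int] = None) -> int:
    """Clamp the newline count found in the HTML to optional min/max bounds.

    Hypothetical helper: with no bounds set, the count from the HTML
    is returned unchanged, i.e. the newlines are carried over as-is.
    """
    count = found
    if min_nl is not None:
        count = max(count, min_nl)
    if max_nl is not None:
        count = min(count, max_nl)
    return count


def join_with_separator(left: str, right: str, found: int,
                        min_nl: Optional[int] = None,
                        max_nl: Optional[int] = None) -> str:
    """Join two serialized pieces with the resolved number of newlines."""
    return left + "\n" * resolve_newlines(found, min_nl, max_nl) + right


# No constraints: the newline present in the HTML is transferred over.
print(repr(join_with_separator("a", "<noinclude>", found=1)))
# prints 'a\n<noinclude>'

# Constraints forced to zero: the newline from the HTML is dropped.
print(repr(join_with_separator("a", "<noinclude>", found=1, min_nl=0, max_nl=0)))
# prints 'a<noinclude>'
```

The first call corresponds to the second command-line run above (the newlines sent in the HTML survive), and the second call corresponds to the "force newlines to zero always" option.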

I think I must have been testing with the wrong version of Parsoid, as I can't reproduce the problem I was seeing any more.

It is possible. But note that, as my command-line reproduction showed, this behavior exists in Parsoid, and the output depends on what HTML editing clients like VE send along. If we need to always forcibly suppress those newlines, we can, but I am not certain that is necessary.

Yeah, we don't send those newlines there in VisualEditor.