
WikitextPFragment concatenation code is too aggressive with adding `<nowiki/>`
Closed, Resolved, Public

Description

In particular, we are adding <nowiki/> tags between adjacent extension tags because of broken code in DataAccess::preprocessWikitext() that was intended to prevent exactly this.

Event Timeline

Change #1119178 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/services/parsoid@master] Temporarily disable <nowiki/> insertion between WikitextPFragments

https://gerrit.wikimedia.org/r/1119178

To reproduce on parsoidtest1001:

cscott@parsoidtest1001:~$ echo "{{wikiquote}}" | sudo -u www-data php /srv/mediawiki/multiversion/MWScript.php /srv/parsoid-testing/bin/parse.php --wiki=enwiki --integrated  --pageName "Arthur_Seyss-Inquart" --logFile /tmp/csa-fragment.logs --dump tplsrc > /tmp/csa-test.fragment.html

which yielded:

<templatestyles src="Module:Side box/styles.css"></templatestyles><nowiki/><templatestyles src="Sister project/styles.css"></templatestyles><nowiki/><div class="side-box side-box-right plainlinks sistersitebox"><templatestyles src="Plainlist/styles.css"></templatestyles><nowiki/>
<div class="side-box-flex">
<div class="side-box-image">[[File:Wikiquote-logo.svg|40x40px|class=noviewer|alt=
    ]]</div>
<div class="side-box-text plainlist">Wikiquote has quotations related to '''''[[q:Special:Search/Arthur Seyss-Inquart|Arthur Seyss-Inquart]]'''''.</div></div>
</div>

Note the <nowiki/> inserted between adjacent <templatestyles> tags.

This is because the output of $parser->getStripState()->split($wikitext) in DataAccess::preprocessWikitext is something like:

[ '', <exttag>, '', <exttag>, '', <exttag>, '' ]

The code in preprocessWikitext() labelled "// Concatenate this extension tag to the previous wikitext, so that PFragment doesn't try to add <nowiki>s between the pieces to prevent token-gluing" is meant to avoid this, but it doesn't actually work; instead it produces:

[ '<templatestyles>', '', '<templatestyles>', '', '' ]

After the empty '' entries are stripped, the extension tags are still adjacent, separate fragments, so <nowiki/> insertion is still triggered between them.
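To make the failure concrete, here is a hypothetical, much-simplified Python model (not Parsoid's PHP code): fragments are plain strings, empty pieces are dropped, and the concatenation step unconditionally puts `<nowiki/>` between every remaining pair. Fed the merged list above, it reproduces the `<nowiki/>` seen between the `<templatestyles>` tags. The names `naive_concat`, `TAG` and `buggy_merge` are illustrative only.

```python
# Hypothetical, simplified model of fragment concatenation (not Parsoid code).
NOWIKI = "<nowiki/>"

def naive_concat(fragments: list[str]) -> str:
    """Drop empty pieces, then separate every remaining adjacent pair with
    <nowiki/> to guard against token gluing, regardless of content."""
    pieces = [f for f in fragments if f != ""]
    return NOWIKI.join(pieces)

# Stand-in for one extension tag's source text.
TAG = '<templatestyles src="Plainlist/styles.css"></templatestyles>'

# The list produced by the broken merge step, as described above.
buggy_merge = [TAG, "", TAG, "", ""]

# Prints the two tags with "<nowiki/>" between them.
print(naive_concat(buggy_merge))
```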

There are two possible solutions. The first is to fix DataAccess::preprocessWikitext() so that the "concatenate this extension tag to the previous wikitext" code actually works as intended.
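A minimal sketch of the first option, again as a hypothetical Python model rather than the actual DataAccess code: walk the split output left to right and append every mergeable piece (wikitext or an extension tag's source) to a running wikitext buffer, so empty strings vanish and adjacent tags end up in the same fragment. The `separate` flag is an assumption standing in for whatever pieces genuinely need to remain distinct fragments; it is not a real core or Parsoid field.

```python
from dataclasses import dataclass

@dataclass
class Piece:
    text: str
    # Hypothetical flag (not a real core/Parsoid field): True for pieces that
    # must stay separate fragments rather than being merged as wikitext.
    separate: bool = False

def merge_split_output(pieces: list[Piece]) -> list[Piece]:
    """Append each mergeable piece (plain wikitext or an extension tag's
    source) to the wikitext accumulated so far. Empty strings add nothing,
    so they never produce an extra fragment."""
    merged: list[Piece] = []
    buf = ""
    for p in pieces:
        if p.separate:
            # Flush pending wikitext, then keep this piece on its own.
            if buf:
                merged.append(Piece(buf))
                buf = ""
            merged.append(p)
        else:
            buf += p.text
    if buf:
        merged.append(Piece(buf))
    return merged

TAG = '<templatestyles src="Plainlist/styles.css"></templatestyles>'
# Shape of the split output: wikitext, tag, wikitext, tag, ...
split = [Piece(""), Piece(TAG), Piece(""), Piece(TAG), Piece(""), Piece(TAG), Piece("")]
result = merge_split_output(split)
print(len(result), result[0].text == TAG * 3)  # prints: 1 True
```

With the split output `[ '', <exttag>, '', <exttag>, '', <exttag>, '' ]` this yields a single fragment, so there is no join left at which to insert `<nowiki/>`.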

Alternatively (or orthogonally), we could be more careful about <nowiki/> insertion by looking at the text on either side of the join, the same way that ConstrainedText and escapeWikitext do, so that (importantly) a join between ...> and <... doesn't trigger <nowiki/> insertion.
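The second option can be sketched as a boundary check run before each join. The rule below is a deliberately conservative assumption, far simpler than what ConstrainedText or escapeWikitext actually check: skip the separator only when one side is empty or when the left side ends in `>` and the right side begins with `<`; otherwise keep inserting `<nowiki/>`. `needs_nowiki` and `concat_fragments` are hypothetical names.

```python
NOWIKI = "<nowiki/>"

def needs_nowiki(left: str, right: str) -> bool:
    """Hypothetical, deliberately conservative boundary check: return False
    only for joins that are known to be safe, True otherwise."""
    if not left or not right:
        return False  # nothing on one side, so nothing can glue
    if left.endswith(">") and right.startswith("<"):
        return False  # "...>" next to "<..." cannot form a new token
    return True

def concat_fragments(fragments: list[str]) -> str:
    """Join fragments, inserting <nowiki/> only where needs_nowiki() says so."""
    out = ""
    for frag in fragments:
        if needs_nowiki(out, frag):
            out += NOWIKI
        out += frag
    return out

TAG = '<templatestyles src="Plainlist/styles.css"></templatestyles>'
print(concat_fragments([TAG, TAG]))  # the two tags, no separator
print(concat_fragments(["'", "'"]))  # prints '<nowiki/>'
```

For example, joining two `<templatestyles>` tags produces no separator, while joining `'` and `'` still gets `<nowiki/>`, since `''` would otherwise become italic markup.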

Change #1119183 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] DataAccess::preprocessWikitext(): fix logic around WikitextPFragment merging

https://gerrit.wikimedia.org/r/1119183

Change #1119178 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Temporarily disable <nowiki/> insertion between WikitextPFragments

https://gerrit.wikimedia.org/r/1119178

Change #1119183 merged by jenkins-bot:

[mediawiki/core@master] DataAccess::preprocessWikitext(): fix logic around WikitextPFragment merging

https://gerrit.wikimedia.org/r/1119183

Change #1093399 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[operations/mediawiki-config@master] Turn on Parsoid fragment support everywhere

https://gerrit.wikimedia.org/r/1093399

Change #1120147 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.21.0-a16

https://gerrit.wikimedia.org/r/1120147

Change #1120147 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.21.0-a16

https://gerrit.wikimedia.org/r/1120147

Change #1093399 merged by jenkins-bot:

[operations/mediawiki-config@master] Turn on Parsoid fragment support everywhere

https://gerrit.wikimedia.org/r/1093399

Mentioned in SAL (#wikimedia-operations) [2025-02-27T21:26:59Z] <ladsgroup@deploy2002> Started scap sync-world: Backport for [[gerrit:1093399|Turn on Parsoid fragment support everywhere (T374661 T386233)]]

Mentioned in SAL (#wikimedia-operations) [2025-02-27T21:34:37Z] <ladsgroup@deploy2002> cscott, ladsgroup: Backport for [[gerrit:1093399|Turn on Parsoid fragment support everywhere (T374661 T386233)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-02-27T21:50:25Z] <ladsgroup@deploy2002> Finished scap sync-world: Backport for [[gerrit:1093399|Turn on Parsoid fragment support everywhere (T374661 T386233)]] (duration: 23m 25s)