Arguments with <nowiki> lose <span typeof="mw:Nowiki"> in parsoid output
Closed, Resolved, Public

Description

Because Parsoid "fragment v2" support moves <nowiki> processing in parser function/template expansion from Parsoid to core's preprocessor, the <span typeof="mw:Nowiki"> wrappers previously used are no longer present.

These were usually stripped in transcluded content, but not always. For example:

echo '{{1x|<nowiki>foo</nowiki>}}' | php maintenance/parse.php --parsoid
echo '{{1x|<span><nowiki>foo</nowiki></span>}}' | php maintenance/parse.php --parsoid

For the first command, the previous behavior preserved the typeof="mw:Nowiki" in the output:

<section data-mw-section-id="0" id="mwAQ"><p id="mwAg"><span typeof="mw:Nowiki mw:Transclusion" about="#mwt2" id="mwAw" data-mw="{&quot;parts&quot;:[{&quot;template&quot;:{&quot;target&quot;:{&quot;wt&quot;:&quot;1x&quot;,&quot;href&quot;:&quot;./Template:1x&quot;},&quot;params&quot;:{&quot;1&quot;:{&quot;wt&quot;:&quot;<nowiki>foo</nowiki>&quot;}},&quot;i&quot;:0}}]}">foo</span><span class="edited4" about="#mwt2" id="mwBA"></span></p>

But for the second command, the <nowiki> was not at the top level of the argument, so the mw:Nowiki wrapper was stripped:

<section data-mw-section-id="0" id="mwAQ"><p id="mwAg"><span about="#mwt2" typeof="mw:Transclusion" id="mwAw" data-mw="{&quot;parts&quot;:[{&quot;template&quot;:{&quot;target&quot;:{&quot;wt&quot;:&quot;1x&quot;,&quot;href&quot;:&quot;./Template:1x&quot;},&quot;params&quot;:{&quot;1&quot;:{&quot;wt&quot;:&quot;<span><nowiki>foo</nowiki></span>&quot;}},&quot;i&quot;:0}}]}">foo</span><span class="edited4" about="#mwt2" id="mwBA"></span></p>

With fragment v2 support enabled, the typeof="mw:Nowiki" would be absent in both cases, because the nowiki is being put into the strip state and reconstituted as a DOMFragment token.
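A quick way to compare the cases above is to check the parser output for the mw:Nowiki token in a typeof attribute. The following is only a minimal sketch: the helper name has_nowiki_typeof is hypothetical (it is not part of MediaWiki or Parsoid), and it assumes the HTML arrives on stdin with double-quoted attributes, as in the outputs shown above.

```shell
#!/bin/sh
# Hypothetical helper, not part of MediaWiki: reads HTML on stdin and reports
# whether any typeof="..." attribute contains the mw:Nowiki token.
# Assumes double-quoted attributes; quotes inside data-mw are &quot;-escaped,
# so the <nowiki> text embedded there does not trigger a match.
has_nowiki_typeof() {
	if grep -Eq 'typeof="([^"]* )?mw:Nowiki( [^"]*)?"'; then
		echo "mw:Nowiki present"
	else
		echo "mw:Nowiki absent"
	fi
}

# Intended usage with the commands from this task (requires a MediaWiki checkout):
#   echo '{{1x|<nowiki>foo</nowiki>}}' | php maintenance/parse.php --parsoid | has_nowiki_typeof
#   echo '{{1x|<span><nowiki>foo</nowiki></span>}}' | php maintenance/parse.php --parsoid | has_nowiki_typeof
```

The pattern matches mw:Nowiki only as a whole space-separated token inside typeof, so it also catches combined values such as typeof="mw:Nowiki mw:Transclusion".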

See also https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/1117613 for a slightly-related issue, where I argue that we shouldn't necessarily strip nowiki in "MediaWiki DOM Spec" content, in order to allow html2wt. But that's not relevant here because the nowikis are coming from an *argument*, so even if we were to html2wt the result the relevant HTML would be the typeof="mw:Param" that corresponds to the {{{1}}} or whatever, not the actual argument HTML.

If we needed to fix this, to preserve nowiki wrappers in argument values, we'd probably need to introduce a new strip state type or new metadata for the strip state so that in DataAccess::preprocess we could introduce a <span typeof="mw:Nowiki"> wrapper at the correct place:

			// Where the result has strip state markers, tunnel this content
			// through Parsoid as a PFragment type.
			$pieces = $parser->getStripState()->split( $wikitext );
			if ( count( $pieces ) > 1 ) {
				for ( $i = 0; $i < count( $pieces ); $i++ ) {
					[ 'type' => $type, 'content' => $content ] = $pieces[$i];
					if ( !$content ) {
						$pieces[$i] = '';
					} elseif ( $type === 'string' ) {
						// wikitext (could include extension tag snippets like <tag..>...</tag>)
						$pieces[$i] = $content;
					} elseif ( $type === 'nowiki' ) {
						// Pseudocode: this check needs a new strip state indicator for the fragment's source.
						if ( this fragment comes from a <nowiki> ) {
							$content = "<span....>$content</span>";
						}
						$pieces[$i] = HtmlPFragment::newFromHtmlString( $content, null );
					} else { // @phan-suppress-current-line PhanPluginDuplicateIfStatements

Note that the 'nowiki' strip state type is not exclusive to <nowiki>: it is used for rawHTML results from parser functions in general (including for Chart, etc.), so some other indicator must be saved in the strip state to identify the source.

Event Timeline

Change #1127630 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Preserve <span typeof="mw:Nowiki"> with Parsoid Fragment mode v2

https://gerrit.wikimedia.org/r/1127630

Change #1127671 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Preserve <nowiki/> with Parsoid Fragment mode v2

https://gerrit.wikimedia.org/r/1127671

Change #1127671 abandoned by C. Scott Ananian:

[mediawiki/core@master] Preserve <nowiki/> with Parsoid Fragment mode v2

Reason:

Squashed with Iab8e861d4b7dd86680f29602856e6710f0140e1c

https://gerrit.wikimedia.org/r/1127671

Change #1127630 merged by jenkins-bot:

[mediawiki/core@master] Preserve <span typeof="mw:Nowiki"> with Parsoid Fragment mode v2

https://gerrit.wikimedia.org/r/1127630

cscott claimed this task.