
Parsoid/legacy parser {{Pre}} template rendering difference
Open, Medium · Public · Bug report

Description

Steps to replicate the issue (include links if applicable):

Check
https://en.wikipedia.org/w/index.php?title=Uniform_Resource_Identifier&useparsoid=1#Example_URIs
vs
https://en.wikipedia.org/w/index.php?title=Uniform_Resource_Identifier&useparsoid=0#Example_URIs

What happens?:

The Parsoid version shows all the <span> markup instead of colored text like in the legacy parser version.

What should have happened instead?:

It's not clear to me that this markup is actually valid, but since it does work with the legacy parser, there's probably something to be done there.

Note
Smaller reproducer:

$ echo "{{Pre|{{color|rgb(0,76,178)|userinfo}}}}" | php ./bin/parse.php
<style data-mw-deduplicate="TemplateStyles:r1057110237" typeof="mw:Extension/templatestyles mw:Transclusion" about="#mwt1" data-parsoid='{"pi":[[{"k":"1"}]],"dsr":[0,40,null,null]}' data-mw='{"parts":[{"template":{"target":{"wt":"Pre","href":"./Template:Pre"},"params":{"1":{"wt":"{{color|rgb(0,76,178)|userinfo}}"}},"i":0}}]}'>.mw-parser-output .pre-borderless{border:none}</style><pre class="pre" typeof="mw:Extension/pre" about="#mwt1" data-parsoid='{"stx":"html","src":"&lt;pre class=\"pre \" >&lt;span style=\"color:rgb(0,76,178)\">userinfo&lt;/span>&lt;/pre>"}' data-mw='{"name":"pre","attrs":{"class":"pre"},"body":{"extsrc":"&lt;span style=\"color:rgb(0,76,178)\">userinfo&lt;/span>"}}'>&lt;span style="color:rgb(0,76,178)">userinfo&lt;/span></pre>

Event Timeline

Restricted Application added a subscriber: Aklapper.

That is because template expansion returns this.

$ echo "{{Pre|{{color|rgb(0,76,178)|userinfo}}}}" | php ./bin/parse.php --dump tplsrc
[dump] ============================ template source ============================
TEMPLATE:Template:PreTRANSCLUSION:"{{Pre|{{color|rgb(0,76,178)|userinfo}}}}"
--------------------------------------------------------------------------------
<templatestyles src="Pre/styles.css"/><pre class="pre " ><span style="color:rgb(0,76,178)">userinfo</span></pre>

I think this is somehow relying on different behavior in Parser.php when HTML is being generated (vs. templates are being expanded). It might rely on arguments being processed before the template being expanded in this case? To be investigated (by whoever takes this up).

I think it's actually the following bit in Template:Pre:

<pre<includeonly></includeonly> class="pre ...

I bet the <includeonly> there (included to deliberately work around /something or other/ in the legacy parser, I'm sure) is tripping Parsoid up, just because <includeonly> is handled slightly differently from other transclusions. If this were <pre{{1x}} class= I bet it would work fine in Parsoid. (Might break legacy in that case, for whatever reason the <includeonly> was originally added.)

> I bet the <includeonly> there (included to deliberately workaround /something or other/ in the legacy parser, I'm sure)

The <includeonly> placed directly after the <pre prevents the regexp $elementsRegex in buildDomTreeArrayFromText of the legacy preprocessor from matching the element as an extension tag, since the tag name needs to be followed by a space or closing bracket. When the includeonly gets dropped, however, the <pre is left to recombine with the rest of the string class="pre ...> to form a valid HTML5 pre tag. So, as @cscott suspects, it is a workaround to avoid the semantics of the pre extension tag and allow wikitext syntax to be parsed in it. The legacy parser renders, for example, the heading in,

{{Pre|<span>hello</span>

== hi ==
}}

as

<pre><span>hello</span>

<h2><span class="mw-headline" id="hi">hi</span></h2>
</pre>
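
A minimal sketch of the matching rule described above, in Python rather than the PHP of the legacy preprocessor. The regex is a simplified stand-in for $elementsRegex, not a copy of it, and the tag list and the function names starts_extension_tag and strip_includeonly are made up for illustration.

```python
import re

# Hypothetical subset of registered extension tag names.
EXTENSION_TAGS = ("pre", "nowiki", "ref")

# Simplified stand-in for $elementsRegex: the tag name only counts as an
# extension tag when it is followed by whitespace, "/>" or ">".
ELEMENTS_RE = re.compile(
    r"<(?:%s)(?=\s|/>|>)" % "|".join(EXTENSION_TAGS), re.IGNORECASE
)

INCLUDEONLY_HACK = "<includeonly></includeonly>"


def starts_extension_tag(text: str) -> bool:
    """True if text begins with what this sketch treats as an extension tag."""
    return ELEMENTS_RE.match(text) is not None


def strip_includeonly(text: str) -> str:
    """Mimic transclusion dropping the empty <includeonly> pair."""
    return text.replace(INCLUDEONLY_HACK, "")


# What the legacy preprocessor sees in the template source.
template_source = '<pre<includeonly></includeonly> class="pre ">'

# What is left after template expansion, which is the text Parsoid receives.
expanded = strip_includeonly(template_source)

print(starts_extension_tag(template_source))
print(starts_extension_tag(expanded))
```

In this sketch the template source does not start an extension tag, because "<" follows the tag name, while the expanded text does. That mirrors why the two parsers disagree: the legacy parser makes its decision before the includeonly is dropped, and Parsoid only sees the text afterwards.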

> That is because template expansion returns this.

As @ssastry points out, Parsoid gets the post-template expansion text (after the includeonly is stripped) and interprets the pre as an extension tag.

All this is pretty well understood and by design in the documentation of the template,
https://en.wikipedia.org/wiki/Template:Pre

> HTML and wikimarkup aren't disabled as in <pre>...</pre> and are rendered as usual (thus if a parameter contains any wikimarkup, enclose it in <nowiki>...</nowiki>); however, multiple spaces are preserved.

In order to get this to work in Parsoid, we could maybe introduce an attribute on the pre extension <pre parsewikitext="1"> (choose a better name) that gives the same semantics and then update the template to remove the hack.

Doing an insource:/"<pre<"/ search on enwiki shows a few other uses of the pattern.

> In order to get this to work in Parsoid, we could maybe introduce an attribute on the pre extension <pre parsewikitext="1"> (choose a better name) that gives the same semantics and then update the template to remove the hack.

I like this proposal.

Change 992274 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] [WIP] Add attribute to pre extension to parse wikitext

https://gerrit.wikimedia.org/r/992274

Change 993051 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@master] [WIP] Add attribute to pre extension to parse wikitext

https://gerrit.wikimedia.org/r/993051

Discussed this during Tech forum today. tl;dr is I'm fine with an attribute for the <pre> extension tag to control this behavior, although I'd prefer that it's not called html because I don't want to encourage the "wikitext is a superset of HTML" confusion. The <pre> extension tag, even with this functionality enabled, still has quite a number of differences with its HTML counterpart. I'm fine with parsewikitext or just parsed or just wikitext or whatever, just leave "html" out of it. :)

A related issue is that as used on-wiki, the <pre> extension ostensibly requires access to the parent frame:

[[Template:Demonstration]]
<pre parsed=true>
This is my argument: {{{1}}}
</pre>

As presently implemented, the contents of the <pre> extension (in Parsoid land at least) are "raw text", and if we try to parse this as wikitext we can't properly expand the {{{1}}} because we don't have access to the parent frame.

*However* this distinction between "expanded wikitext" and "raw text" arguments is deeper than this, and we /already have/ a mechanism to pass the body contents of the extension tag "as expanded wikitext", to wit:

[[Template:Demonstration]]
{{#tag:pre|This is my argument: {{{1}}}|parsed=true}}

This works as intended: the {{{1}}} is expanded in the frame of [[Template:Demonstration]] before the argument is passed to the implementation of the <pre> extension tag.
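
To make the frame issue concrete, here is a toy model in Python. Nothing in it is MediaWiki or Parsoid API: expand_args, pre_handler and the dict used as a frame are all hypothetical stand-ins, and the argument substitution ignores defaults, nesting and everything else the real preprocessor handles.

```python
import re

# Matches a bare {{{name}}} argument reference (no defaults, no nesting).
ARG_RE = re.compile(r"\{\{\{(\w+)\}\}\}")


def expand_args(wikitext: str, frame: dict) -> str:
    """Replace {{{name}}} with the frame's value; unknown names are left as-is."""
    return ARG_RE.sub(lambda m: frame.get(m.group(1), m.group(0)), wikitext)


def pre_handler(body: str) -> str:
    """Hypothetical <pre> implementation: it only sees the text it is handed."""
    return "<pre>" + body + "</pre>"


# Arguments passed to [[Template:Demonstration]] by its caller.
frame = {"1": "hello"}
body = "This is my argument: {{{1}}}"

# <pre>...</pre> syntax: the body reaches the handler as raw text,
# and the handler has no frame with which to resolve {{{1}}}.
via_tag_syntax = pre_handler(body)

# {{#tag:pre|...}} syntax: the body is expanded in the template's frame
# before the handler is called.
via_tag_function = pre_handler(expand_args(body, frame))

print(via_tag_syntax)
print(via_tag_function)
```

The only difference between the two calls is who does the expansion and when, which is the lazy versus eager distinction drawn in the next comment.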

Veering a little bit off track, I'll point to T268144#7704327 and the general extension tag/parser function uniformity issues (T204370 will stand-in for that discussion). Part of the idea is that *any* argument ought to be able to be passed/fetched either "as raw text" or "as expanded wikitext", which roughly corresponds to "lazy" or "eager" evaluation of the arguments in the traditional programming languages sense. We showed above that the "body" argument for an extension tag can be passed in either form, depending on whether the html-ish <tag> or template-ish {{#tag:...}} syntax is used. It would be desirable to be able to do the same for *any* argument to a transclusion, and perhaps this can be part of the semantics of {T114432: [RFC] Heredoc arguments for templates (aka "hygienic" or "long" arguments)}. That is, we already can pass an "expanded wikitext" argument like:

{{Foo|arg={{{1}}}|bar}}

but if we wanted to pass the argument instead "as raw text" you might write it as

{{Foo|arg=<<<
some | raw | text | ignore | markup
>>>|bar}}

This is a little bit at odds with one of the motivating examples for heredocs (from T114432):

{{cite|id=“32412”|<<<
First person plural pronouns in Isthmus-Mecayapan Nahuat:

:''nejamēn'' ({{IPA|[nehameːn]}}) "We, but not you" (= me & them)
:''tejamēn'' ({{IPA|[tehameːn]}}) "We along with you" (= me & you & them)
>>>}}

In this example we very much wanted the "raw text" interpretation of the = characters, but we *did* want to eventually evaluate the wikitext and expand it.

The basic idea for a solution is that the parameters are passed in as a variant type from which the implementation can "demand" the appropriate type using an asFoo() method. In this case, the Cite extension would take the body argument and call body.asParsedWikitext() on it. When provided with a variant type containing raw text, the asParsedWikitext() method would parse it. When provided with a variant type containing "expanded wikitext", it would skip the initial preprocessor/expand-templates stage (to avoid double expansion) and parse it from there.

Similarly, .asExpandedWikitext() on the argument provides the usual value for compatibility with existing implementations of parser functions and the like, regardless of whether it was passed as raw text or already as expanded wikitext. .asHtml() is appropriate if the output is going to be spliced into HTML output, and works regardless of whether the argument was provided as wikitext, as raw text, or as HTML (from the strip state; see T257606#9216471).
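
A rough sketch of what such a variant type could look like, written in Python purely for illustration. The class Argument, its field names, and the toy expander and parser are all assumptions; this is not the interface of any existing MediaWiki or Parsoid class, and it leaves out the HTML/strip-state case entirely.

```python
from dataclasses import dataclass
from typing import Callable, List

Expander = Callable[[str], str]  # stands in for the preprocessor/expand-templates stage
Parser = Callable[[str], str]    # stands in for "expanded wikitext -> HTML"


@dataclass
class Argument:
    text: str
    expanded: bool  # True: already expanded wikitext; False: raw text

    def as_expanded_wikitext(self, expand: Expander) -> str:
        # Raw text is expanded once; expanded wikitext is returned untouched
        # so that it is never expanded a second time.
        return self.text if self.expanded else expand(self.text)

    def as_parsed_wikitext(self, expand: Expander, parse: Parser) -> str:
        # Both variants end up parsed; only the raw one goes through expansion.
        return parse(self.as_expanded_wikitext(expand))


calls: List[str] = []  # records every text handed to the toy expander


def toy_expand(text: str) -> str:
    calls.append(text)
    return text.replace("{{{1}}}", "hello")


def toy_parse(text: str) -> str:
    return "<p>" + text + "</p>"


raw = Argument("arg: {{{1}}}", expanded=False)   # lazy: passed as raw text
eager = Argument("arg: hello", expanded=True)    # eager: expanded by the caller

raw_html = raw.as_parsed_wikitext(toy_expand, toy_parse)
eager_html = eager.as_parsed_wikitext(toy_expand, toy_parse)
print(raw_html, eager_html, calls)
```

The implementation asks for the form it needs and does not care how the argument was written; the calls list shows that only the raw variant ever reaches the expander.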

ABreault-WMF lowered the priority of this task from High to Medium. · Jun 3 2025, 6:55 PM

@tstarling has proposed elsewhere that <ref> should not be parsed as an extension tag, but rather in the same way we parse <span>, <div> and other "HTMLish" tags; that is, the contents are parsed in the same document context as the rest of the wikitext, so that <ref>{{{1}}}</ref> works. This could be expanded to a number of other tag types, like <indicator>, and here with <pre> it seems folks are using <pre<!-- --> as a hacky way to get exactly this behavior from the <pre> "extension" as well.

I wonder if there's a principled way to support these semantics.

I have two concerns:

  1. How does an editor know which interpretation of <...> syntax to expect? Currently the answer is "if <...> names a valid HTML5 tag, it is parsed 'transparently', otherwise it is parsed as 'an extension tag'". But that simple model isn't 100% accurate: <pre> being one exception (HTML5 tag parsed "as extension tag"), and <ref>/<includeonly>/<noinclude> being exceptions in the other direction (non-HTML5 tag which is parsed "transparently"). Further, this simple model doesn't actually match the logic currently implemented by the preprocessor, which is "if there is whitespace or /> immediately after the tag name and the tag name is registered as an extension tag, then parse it 'as an extension tag', otherwise parse 'transparently'". Is it possible to come up with a naming or syntax convention so that editors can form a simple mental model which is actually /correct/?
  2. Is it possible to "opt-in" to the *other* syntax form? We have {{#tag:....}} which (to some degree) lets you specify any tag (extension or otherwise) with "transparent" contents. Perhaps we need <tag:....> (bikeshed the name) to do the opposite: guaranteed to be processed as an extension tag? That way even if <pre>/<ref>/etc are unpredictable to some degree, if you *want* a "transparent" <pre> you can use {{#tag:pre|<<<...>>>}} and if you want an "as an extension tag" <div> you could use <tag:div>...</tag:div>, etc? (Note the heredoc to {{#tag}} to change how the argument is delimited; there's an interaction here I'm not 100% confident in stating.)

(Another way of stating this: if you can't remember whether <pre> is "extension tag" or "transparent", can you use <tag:pre>...</tag:pre> to guarantee the "extension tag" semantics, and {{#tag:pre}} to guarantee the "transparent" semantics? Note that outside of the syntactic forms, there are different expectations in eager/lazy expansion of the wikitext provided as an argument here.)

Change #1193271 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/services/parsoid@master] WIP: use PFragmentHandler for <pre>

https://gerrit.wikimedia.org/r/1193271

cscott removed cscott as the assignee of this task. · Nov 3 2025, 4:47 PM

Change #1202885 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] Add tests for template pre hack

https://gerrit.wikimedia.org/r/1202885

Change #1202885 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Add tests for template pre hack

https://gerrit.wikimedia.org/r/1202885

Change #1203504 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.23.0-a3

https://gerrit.wikimedia.org/r/1203504

Change #1203504 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.23.0-a3

https://gerrit.wikimedia.org/r/1203504

Change #1204685 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] Add more tests for template pre hack

https://gerrit.wikimedia.org/r/1204685

Change #1205247 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] [WIP] Attr to disable extension tag

https://gerrit.wikimedia.org/r/1205247

Change #1204685 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Add more tests for template pre hack

https://gerrit.wikimedia.org/r/1204685

Change #1210725 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.23.0-a6

https://gerrit.wikimedia.org/r/1210725

Change #1210725 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.23.0-a6

https://gerrit.wikimedia.org/r/1210725

Change #992274 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Add attr to pre ext to parse content as wikitext

https://gerrit.wikimedia.org/r/992274

Change #1205247 abandoned by Arlolra:

[mediawiki/services/parsoid@master] [WIP] Attr to disable extension tag

https://gerrit.wikimedia.org/r/1205247

Change #1213536 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.23.0-a8

https://gerrit.wikimedia.org/r/1213536

Change #1213536 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.23.0-a8

https://gerrit.wikimedia.org/r/1213536

Change #993051 merged by jenkins-bot:

[mediawiki/core@master] Add attribute to pre extension to parse wikitext

https://gerrit.wikimedia.org/r/993051

fatwiki has two entries

fat.wikipedia	Nhwɛdo:Pre	<templatestyles src="Pre/styles.css"/><pre<includeonly></includeonly> class="pre {{#ifeq:{{{border|}}}|no|pre-borderless}}" {{#if:{{{space|}}}{{{widt
fat.wikipedia	Nhwɛdo:Testcase	| {{{title|{{{id|{{{1|}}}}}}}}} |- | colspan="2" | <pre<includeonly></includeonly> style="background:none; border:none; padding:0; margin:0;

I made the following edits,
https://fat.wikipedia.org/w/index.php?title=Nhw%C9%9Bdo%3APre&diff=51418&oldid=15035
https://fat.wikipedia.org/w/index.php?title=Nhw%C9%9Bdo%3ATestcase&diff=51417&oldid=15205

and the rendering diff was resolved on,
https://fat.wikipedia.org/wiki/Template%3AHidden_begin%2Ftestcases?useparsoid=0
https://fat.wikipedia.org/wiki/Template%3AHidden_begin%2Ftestcases?useparsoid=1

I left the hack in place so that the legacy parser continues to render as it did before, even though https://gerrit.wikimedia.org/r/c/mediawiki/core/+/993051 has been merged, and so that we can rerun visualdiff. I'll make a similar edit on a few more blocked wikis.

Note that tumwiki is classified as being blocked on T398969, and it's possible that all those wikis will be accounted for by this task.

At least one wasn't but it should be fixed in https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/1217714

> we can rerun visualdiff. I'll make a similar edit on a few more blocked wikis.

Visual diff runs were promising. The results are in https://docs.google.com/spreadsheets/d/1NsqZizThK3DqOBqA9qFwXbqgzpqi9gAY51Kb2Mxxhzk/edit?gid=108135004#gid=108135004

On that basis, I started updating all the templates in T353697#11450970

In the process, I'm noticing a few things. One, there is a hitch with template parameters. For example,

<pre<includeonly></includeonly> style="{{#ifeq:{{{1}}}|scroll|

If I were to remove the includeonly hack, we'd have the same problem as in T348722. We'd want to rewrite this as,

{{#tag:pre|...|format="wikitext"}}

as in the test case in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/993051/5/tests/parser/preTags.txt

A second note is that there's a bunch of documentation describing the includeonly hack that should be updated to suggest using the new format="wikitext" attribute instead. For example,

> As <pre> is a parser tag, it escapes wikitext and HTML tags. This can be prevented with the use of <includeonly> ...

Update documentation on
https://diq.wikipedia.org/wiki/Pe%C5%9Fti:Wikitext
https://azb.wikipedia.org/wiki/%DA%A9%D8%A4%D9%85%DA%A9:%D9%88%DB%8C%DA%A9%DB%8C%E2%80%8C%D9%84%D8%B4%D8%AF%DB%8C%D8%B1%D9%85%D9%87
https://dz.wikipedia.org/wiki/Wikipedia:Line-break_handling

Please see T412577, which may have been caused by changes related to this bug report at en.WP.