
Parsoid tests in REL1_XX branches need updating for T407131
Closed, Resolved, Public

Description

The fix for T407131 (CVE-2025-67479: Magic word replacement in legacy parser allows using reserved data attributes through wikitext) seems to have caused test failures in the REL1_XX branches of Parsoid.

Seen while doing backports for T412194 (Deprecated: The predefined locally scoped $http_response_header variable is deprecated, call http_get_last_response_headers() instead in …/vendor/justinrainbow/json-schema/src/JsonSchema/Uri/Retrievers/FileGetContents.php on line 55) on https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/1219224. CI console output: https://integration.wikimedia.org/ci/job/quibble-composer-mysql-php81/20780/console

21:59:17 1) ParserIntegrationTest::testParse with data set "headings.txt: Fuzz testing: Parser14-table" ('legacy')
21:59:17 Failed asserting that two strings are equal.
21:59:17 --- Expected
21:59:17 +++ Actual
21:59:17 @@ @@
21:59:17  '<div class="mw-heading mw-heading2"><h2 id="a">a</h2><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="/index.php?title=Parser_test&amp;action=edit&amp;section=1" title="Edit section: a">edit</a><span class="mw-editsection-bracket">]</span></span></div>\n
21:59:17 -<table style="&#95;_TOC&#95;_">\n
21:59:17 +<table style="&#95;&#95;TOC&#95;&#95;">\n
21:59:17  <tbody><tr><td></td></tr>\n
21:59:17  </tbody></table>'
21:59:17 
21:59:17 /workspace/src/tests/phpunit/suites/ParserIntegrationTest.php:72
21:59:17 /workspace/src/tests/phpunit/suites/SuiteEventsTrait.php:67
21:59:17 /workspace/src/tests/phpunit/suites/SuiteEventsTrait.php:67
21:59:17 
21:59:17 2) ParserIntegrationTest::testParse with data set "legacyHeadings.txt: Fuzz testing: Parser14-table" ('legacy')
21:59:17 Failed asserting that two strings are equal.
21:59:17 --- Expected
21:59:17 +++ Actual
21:59:17 @@ @@
21:59:17  '<h2><span class="mw-headline" id="a">a</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="/index.php?title=Parser_test&amp;action=edit&amp;section=1" title="Edit section: a">edit</a><span class="mw-editsection-bracket">]</span></span></h2>\n
21:59:17 -<table style="&#95;_TOC&#95;_">\n
21:59:17 +<table style="&#95;&#95;TOC&#95;&#95;">\n
21:59:17  <tbody><tr><td></td></tr>\n
21:59:17  </tbody></table>'
21:59:17 
21:59:17 /workspace/src/tests/phpunit/suites/ParserIntegrationTest.php:72
21:59:17 /workspace/src/tests/phpunit/suites/SuiteEventsTrait.php:67
21:59:17 /workspace/src/tests/phpunit/suites/SuiteEventsTrait.php:67
21:59:17 
21:59:17 3) ParserIntegrationTest::testParse with data set "parserTests.txt: Sanitizer: Escaping of spaces, multibyte characters, colons & other stuff in id=""" ('legacy')
21:59:17 Failed asserting that two strings are equal.
21:59:17 --- Expected
21:59:17 +++ Actual
21:59:17 @@ @@
21:59:17 -'<p><span id="æ:_v">byte</span><a href="#æ:_v">backlink</a>\n
21:59:17 +'<p><span id="æ:&#95;v">byte</span><a href="#æ:_v">backlink</a>\n
21:59:17  </p>'
21:59:17 
21:59:17 /workspace/src/tests/phpunit/suites/ParserIntegrationTest.php:72
21:59:17 /workspace/src/tests/phpunit/suites/SuiteEventsTrait.php:67
21:59:17 /workspace/src/tests/phpunit/suites/SuiteEventsTrait.php:67
21:59:17 
21:59:17 4) ParserIntegrationTest::testParse with data set "parserTests.txt: Sanitizer: Escaping of spaces, multibyte characters, colons & other stuff in id="" (legacy)" ('legacy')
21:59:17 Failed asserting that two strings are equal.
21:59:17 --- Expected
21:59:17 +++ Actual
21:59:17 @@ @@
21:59:17 -'<p><span id=".C3.A6:_v">byte</span><a href="#.C3.A6:_v">backlink</a>\n
21:59:17 +'<p><span id=".C3.A6:&#95;v">byte</span><a href="#.C3.A6:_v">backlink</a>\n
21:59:17  </p>'
21:59:17 
21:59:17 /workspace/src/tests/phpunit/suites/ParserIntegrationTest.php:72
21:59:17 /workspace/src/tests/phpunit/suites/SuiteEventsTrait.php:67
21:59:17 /workspace/src/tests/phpunit/suites/SuiteEventsTrait.php:67
21:59:17 
21:59:17 5) ParserIntegrationTest::testParse with data set "parserTests.txt: Sanitizer: Validating the contents of the id attribute (T6515)" ('legacy')
21:59:17 Failed asserting that two strings are equal.
21:59:17 --- Expected
21:59:17 +++ Actual
21:59:17 @@ @@
21:59:17 -'<p><br /><br id="a_space" />\n
21:59:17 +'<p><br /><br id="a&#95;space" />\n
21:59:17  </p>'
21:59:17 
21:59:17 /workspace/src/tests/phpunit/suites/ParserIntegrationTest.php:72
21:59:17 /workspace/src/tests/phpunit/suites/SuiteEventsTrait.php:67
21:59:17 /workspace/src/tests/phpunit/suites/SuiteEventsTrait.php:67
21:59:17 
21:59:17 6) ParserIntegrationTest::testParse with data set "parserTests.txt: Edit comment with section link that has a link in it" ('legacy')
21:59:17 Failed asserting that two strings are equal.
21:59:17 --- Expected
21:59:17 +++ Actual
21:59:17 @@ @@
21:59:17 -'<span class="autocomment"><a href="#A_link">→<bdi dir="ltr">&#91;[A link]]</bdi></a></span>'
21:59:17 +'<span class="autocomment"><a href="#A_link">→<bdi dir="ltr">&#91;&#91;A link&#93;&#93;</bdi></a></span>'
21:59:17 
21:59:17 /workspace/src/tests/phpunit/suites/ParserIntegrationTest.php:72
21:59:17 /workspace/src/tests/phpunit/suites/SuiteEventsTrait.php:67
21:59:17 /workspace/src/tests/phpunit/suites/SuiteEventsTrait.php:67
21:59:17 
21:59:17 7) ParserIntegrationTest::testParse with data set "parserTests.txt: Id starting with underscore" ('legacy')
21:59:17 Failed asserting that two strings are equal.
21:59:17 --- Expected
21:59:17 +++ Actual
21:59:17 @@ @@
21:59:17 -'<div id="_ref"></div>'
21:59:17 +'<div id="&#95;ref"></div>'
21:59:17 
21:59:17 /workspace/src/tests/phpunit/suites/ParserIntegrationTest.php:72
21:59:17 /workspace/src/tests/phpunit/suites/SuiteEventsTrait.php:67
21:59:17 /workspace/src/tests/phpunit/suites/SuiteEventsTrait.php:67
21:59:17 
21:59:17 8) ParserIntegrationTest::testParse with data set "parserTests.txt: HTML5 data attributes" ('legacy')
21:59:17 Failed asserting that two strings are equal.
21:59:17 --- Expected
21:59:17 +++ Actual
21:59:17 @@ @@
21:59:17  '<p><span data-foo="bar">Baz</span>\n
21:59:17  </p>\n
21:59:17 -<p data-abc-def_hij="">Quuz</p>'
21:59:17 +<p>Quuz</p>'
21:59:17 
21:59:17 /workspace/src/tests/phpunit/suites/ParserIntegrationTest.php:72
21:59:17 /workspace/src/tests/phpunit/suites/SuiteEventsTrait.php:67
21:59:17 /workspace/src/tests/phpunit/suites/SuiteEventsTrait.php:67

TL;DR: it may be that rGPARae6b65472b12 (Parser test sync) just needs backporting.
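Going by the diffs quoted above, failures 1 to 7 differ only in whether `_`, `[` and `]` are written literally or as numeric character references, while failure 8 actually loses an attribute. A minimal sketch of that triage, assuming a plain string comparison after decoding character references is good enough for a first look (the `classify` helper is hypothetical and not part of MediaWiki or Parsoid, and decoding whole markup strings is only a rough heuristic, not a real HTML comparison):

```python
import html


def classify(expected: str, actual: str) -> str:
    """Rough heuristic: do two HTML snippets differ only in entity encoding?"""
    if expected == actual:
        return "identical"
    # html.unescape turns references such as &#95; or &#91; into literal characters.
    if html.unescape(expected) == html.unescape(actual):
        return "entity-only"
    return "substantive"


# Expected/actual pairs copied from the CI output in this task.
pairs = {
    "Fuzz testing: Parser14-table": (
        '<table style="&#95;_TOC&#95;_">',
        '<table style="&#95;&#95;TOC&#95;&#95;">',
    ),
    "Edit comment with section link": (
        '<bdi dir="ltr">&#91;[A link]]</bdi>',
        '<bdi dir="ltr">&#91;&#91;A link&#93;&#93;</bdi>',
    ),
    "Id starting with underscore": (
        '<div id="_ref"></div>',
        '<div id="&#95;ref"></div>',
    ),
    "HTML5 data attributes": (
        '<p data-abc-def_hij="">Quuz</p>',
        '<p>Quuz</p>',
    ),
}

for name, (expected, actual) in pairs.items():
    print(f"{name}: {classify(expected, actual)}")
```

Anything reported as "substantive" is a real change in output and needs the expected test result updated deliberately rather than normalised away.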

Event Timeline

Reedy triaged this task as High priority. (Dec 17 2025, 10:05 PM)
Reedy updated the task description.

Can tools/sync-parserTests.js just be used for this?

Change #1221114 had a related patch set uploaded (by Zabe; author: Zabe):

[mediawiki/services/parsoid@REL1_44] Parser test sync

https://gerrit.wikimedia.org/r/1221114

Change #1221114 merged by jenkins-bot:

[mediawiki/services/parsoid@REL1_44] Parser test sync

https://gerrit.wikimedia.org/r/1221114

Change #1221205 had a related patch set uploaded (by Zabe; author: Zabe):

[mediawiki/services/parsoid@REL1_45] Parser test sync

https://gerrit.wikimedia.org/r/1221205

Change #1221206 had a related patch set uploaded (by Zabe; author: Zabe):

[mediawiki/services/parsoid@REL1_43] Parser test sync

https://gerrit.wikimedia.org/r/1221206

Change #1221207 had a related patch set uploaded (by Zabe; author: Zabe):

[mediawiki/services/parsoid@REL1_39] Parser test sync

https://gerrit.wikimedia.org/r/1221207

Change #1221205 merged by jenkins-bot:

[mediawiki/services/parsoid@REL1_45] Parser test sync

https://gerrit.wikimedia.org/r/1221205

In REL1_39 and REL1_43, Parsoid is not happy with some of the changes to the parser tests that happened in core. I am not fully sure why. Maybe someone with more expertise could take a look?

The REL1_43 failure looks to be a one-character difference in a test you didn't touch:

https://integration.wikimedia.org/ci/job/quibble-composer-mysql-php81/20999/console

-<figure class="mw-default-size" typeof="mw:File/Thumb"><a href="/wiki/File:Foobar.jpg" class="mw-file-description"><img src="http://example.com/images/thumb/3/3a/Foobar.jpg/180px-Foobar.jpg" decoding="async" width="180" height="20" class="mw-file-element" srcset="http://example.com/images/thumb/3/3a/Foobar.jpg/270px-Foobar.jpg 1.5x, http://example.com/images/thumb/3/3a/Foobar.jpg/360px-Foobar.jpg 2x" /></a><figcaption> <h2><span class="mw-headline" id="This_is_section_9.2C_even_though_it.27s_in_a_caption">This is section 9, even though it's in a caption</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="/index.php?title=Parser_test&amp;action=edit&amp;section=9" title="Edit section: This is section 9, even though it&#039;s in a caption">edit</a><span class="mw-editsection-bracket">]</span></span></h2> </figcaption></figure>\n
+<figure class="mw-default-size" typeof="mw:File/Thumb"><a href="/wiki/File:Foobar.jpg" class="mw-file-description"><img src="http://example.com/images/thumb/3/3a/Foobar.jpg/180px-Foobar.jpg" decoding="async" width="180" height="20" class="mw-file-element" srcset="http://example.com/images/thumb/3/3a/Foobar.jpg/270px-Foobar.jpg 1.5x, http://example.com/images/thumb/3/3a/Foobar.jpg/360px-Foobar.jpg 2x" /></a><figcaption> <h2><span class="mw-headline" id="This_is_section_9.2C_even_though_it.27s_in_a_caption">This is section 9, even though it's in a caption</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="/index.php?title=Parser_test&amp;action=edit&amp;section=9" title="Edit section: This is section 9, even though it&#39;s in a caption">edit</a><span class="mw-editsection-bracket">]</span></span></h2> </figcaption></figure>\n

The difference is &#039; -> &#39; in a test relating to T213468: Parsoid section IDs don't correspond to PHP section IDs when headings are transcluded. I suspect the two are functionally equivalent; what I can't explain is why the test has suddenly started failing...

23:41:47 1) ParserIntegrationTest::testParse with data set "sectionWrappingParserTests.txt: T213468: Corner cases in edit section ID assignment in tokenizer" ('legacy')
23:41:47 Failed asserting that two strings are equal.
23:41:47 --- Expected
23:41:47 +++ Actual
23:41:47 @@ @@
23:41:47  <p>Not a ==heading==\n
23:41:47  </p>\n
23:41:47  <h2><span class="mw-headline" id="PHP_section.3D8">PHP section=8</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="/index.php?title=Parser_test&amp;action=edit&amp;section=8" title="Edit section: PHP section=8">edit</a><span class="mw-editsection-bracket">]</span></span></h2>\n
23:41:47 -<figure class="mw-default-size" typeof="mw:File/Thumb"><a href="/wiki/File:Foobar.jpg" class="mw-file-description"><img src="http://example.com/images/thumb/3/3a/Foobar.jpg/180px-Foobar.jpg" decoding="async" width="180" height="20" class="mw-file-element" srcset="http://example.com/images/thumb/3/3a/Foobar.jpg/270px-Foobar.jpg 1.5x, http://example.com/images/thumb/3/3a/Foobar.jpg/360px-Foobar.jpg 2x" /></a><figcaption> <h2><span class="mw-headline" id="This_is_section_9.2C_even_though_it.27s_in_a_caption">This is section 9, even though it's in a caption</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="/index.php?title=Parser_test&amp;action=edit&amp;section=9" title="Edit section: This is section 9, even though it&#039;s in a caption">edit</a><span class="mw-editsection-bracket">]</span></span></h2> </figcaption></figure>\n
23:41:47 +<figure class="mw-default-size" typeof="mw:File/Thumb"><a href="/wiki/File:Foobar.jpg" class="mw-file-description"><img src="http://example.com/images/thumb/3/3a/Foobar.jpg/180px-Foobar.jpg" decoding="async" width="180" height="20" class="mw-file-element" srcset="http://example.com/images/thumb/3/3a/Foobar.jpg/270px-Foobar.jpg 1.5x, http://example.com/images/thumb/3/3a/Foobar.jpg/360px-Foobar.jpg 2x" /></a><figcaption> <h2><span class="mw-headline" id="This_is_section_9.2C_even_though_it.27s_in_a_caption">This is section 9, even though it's in a caption</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="/index.php?title=Parser_test&amp;action=edit&amp;section=9" title="Edit section: This is section 9, even though it&#39;s in a caption">edit</a><span class="mw-editsection-bracket">]</span></span></h2> </figcaption></figure>\n
23:41:47  <h2><span class="mw-headline" id="PHP_section.3D10">PHP section=10</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="/index.php?title=Parser_test&amp;action=edit&amp;section=10" title="Edit section: PHP section=10">edit</a><span class="mw-editsection-bracket">]</span></span></h2>'
23:41:47 
23:41:47 /workspace/src/tests/phpunit/suites/ParserIntegrationTest.php:72
23:41:47 /workspace/src/tests/phpunit/suites/SuiteEventsTrait.php:67
23:41:47 /workspace/src/tests/phpunit/suites/SuiteEventsTrait.php:67
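To make the one-character difference in that long line easier to see, here is a small sketch using Python's standard `difflib` and `html` modules. The `differing_spans` helper is hypothetical, and the two strings are just the `title` attribute cut out of the expected and actual lines above:

```python
import difflib
import html


def differing_spans(expected: str, actual: str):
    """Return (operation, expected_part, actual_part) for every non-matching span."""
    matcher = difflib.SequenceMatcher(None, expected, actual, autojunk=False)
    return [
        (tag, expected[i1:i2], actual[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


expected = 'title="Edit section: This is section 9, even though it&#039;s in a caption"'
actual = 'title="Edit section: This is section 9, even though it&#39;s in a caption"'

# Shows which characters differ between the two strings.
print(differing_spans(expected, actual))

# Both numeric character references decode to the same apostrophe (U+0027).
print(html.unescape(expected) == html.unescape(actual))
```

The test still fails because the comparison in the log above is an exact string match ("Failed asserting that two strings are equal"), so an equivalent but differently written reference is enough to break it.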

REL1_39...

https://integration.wikimedia.org/ci/job/quibble-composer-mysql-php81/20998/console

T2041: Template parameters shown as broken links

23:34:27 Loaded known failures from /workspace/src/services/parsoid/tests/parser/parserTests-standalone-knownFailures.json
23:35:04 =====================================================
23:35:04 UNEXPECTED FAIL: T2041: Template parameters shown as broken links (html2wt)
23:35:04 OPTIONS:
23:35:04 
23:35:04 INPUT:
23:35:04 <p>{{{parameter}}}
23:35:04 </p>
23:35:04 RAW EXPECTED:
23:35:04 {{{parameter}}}
23:35:04 RAW RENDERED:
23:35:04 <nowiki>{{{parameter}}}</nowiki>
23:35:04 
23:35:04 NORMALIZED EXPECTED:
23:35:04 {{{parameter}}}
23:35:04 NORMALIZED RENDERED:
23:35:04 <nowiki>{{{parameter}}}</nowiki>
23:35:04 DIFF:
23:35:04 --- Original
23:35:04 +++ New
23:35:04 @@ @@
23:35:04 -{{{parameter}}}
23:35:04 +<nowiki>{{{parameter}}}</nowiki>

^ I'm less worried about REL1_39, because T403199: Formally EOL MW 1.39 is imminent.

OK, I missed that the failing REL1_43 test is in a separate file, since it looked like the test case I touched in headings.txt. Because sectionWrappingParserTests.txt is not synced with core, we need to modify it manually to make it pass. The corresponding change in master was made in https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/1115137.

Change #1221206 merged by jenkins-bot:

[mediawiki/services/parsoid@REL1_43] Parser test sync

https://gerrit.wikimedia.org/r/1221206

Reedy changed the task status from Open to In Progress. (Dec 28 2025, 2:33 AM)
Reedy assigned this task to Zabe.

For REL1_39, it appears that the core parser tests weren't updated to reflect https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/1081290.

Change #1221301 had a related patch set uploaded (by Zabe; author: Zabe):

[mediawiki/core@REL1_39] Update T2041 parser test

https://gerrit.wikimedia.org/r/1221301

Change #1221301 merged by jenkins-bot:

[mediawiki/core@REL1_39] Update T2041 parser test

https://gerrit.wikimedia.org/r/1221301

Change #1221207 merged by jenkins-bot:

[mediawiki/services/parsoid@REL1_39] Parser test sync

https://gerrit.wikimedia.org/r/1221207