Page MenuHomePhabricator

Fix attribute order round-tripping for sub-references (dirty diff)
Closed, ResolvedPublicBUG REPORT

Description

When running round-trip parser tests on a ref with a "name" attribute appearing before the "details", we find that the attribute order will be switched to alphabetical during Parsoid html2wt:

<ref name="n" details="abc">def</ref>

is round-tripped to:

<ref details="abc" name="n">def</ref>

This causes dirty diffs and needs to be fixed. The issue is probably caused by the wt2html transformation removing the "name" attribute from the subref data-mw attrs when splitting into two refs, and when the data-mw is reconstructed during html2wt the attributes end up with the default order.

To test, remove the following lines from subReferencing.txt case "Subreferencing round trips in reverse alphabetical attribute order"

!! options
parsoid=wt2html

and then run the html2wt test for this case (example uses docker-dev):

./modules/mediawiki/bin/mwscript tests/parser/parserTests.php --wiki=dev --file=/srv/docker-dev/mediawiki/extensions/Cite/tests/parser/subReferencing.txt  --html2wt

Expected failure will look like this:

@@ -1 +1 @@
-<ref name="a" details="0">def</ref>
+<ref details="0" name="a">def</ref>

There are four possible approaches so far:

  • T404966: Fix selective serialization for subreferences
  • Ignore the issue and hope that selective serialization prevents dirty diffs unless a ref is edited.
  • Preserve the name attribute order of appearance, eg. with a new attribute data-parsoid.nameIndex, and use this to manually reinsert the attribute into the right place during html2wt.
  • Allow the name attribute to come through and always interpret it in a special way as the main ref name, if and only if details are present.

The first option fixes the biggest issue of dirty diffs, but leaves a small inelegant behavior when the editor actually changes subrefs. The second option of doing nothing is the simplest, the last two are brittle.

Event Timeline

awight changed the subtype of this task from "Spike" to "Task".

Change #1129311 had a related patch set uploaded (by Awight; author: Awight):

[mediawiki/extensions/Cite@master] Alphabetize attributes for easier round-trip testing

https://gerrit.wikimedia.org/r/1129311

Deprioritizing as a minor detail. Selective serialization prevents dirty diffs unless the ref was edited.

Change #1129311 merged by jenkins-bot:

[mediawiki/extensions/Cite@master] Alphabetize attributes for easier round-trip testing

https://gerrit.wikimedia.org/r/1129311

WMDE-Fisch changed the subtype of this task from "Task" to "Bug Report".
Lina_Farid_WMDE renamed this task from Fix attribute order round-tripping for main+details references to Fix attribute order round-tripping for main+details references (unintended diff).Sep 3 2025, 1:06 PM

Triage:

  • no content loss
  • Less problematic than other unintended diffs, as the order is not fixed. We assume personal preference might cause inconvenience for experienced editors.
awight renamed this task from Fix attribute order round-tripping for main+details references (unintended diff) to Fix attribute order round-tripping for sub-references (dirty diff).Sep 18 2025, 12:29 PM

Change #1189481 had a related patch set uploaded (by Awight; author: Awight):

[mediawiki/extensions/Cite@master] Fix subref attribute order

https://gerrit.wikimedia.org/r/1189481

Change #1189481 merged by jenkins-bot:

[mediawiki/extensions/Cite@master] Fix subref attribute order

https://gerrit.wikimedia.org/r/1189481

Change #1191341 had a related patch set uploaded (by WMDE-Fisch; author: Awight):

[mediawiki/extensions/Cite@wmf/1.45.0-wmf.20] Fix subref attribute order

https://gerrit.wikimedia.org/r/1191341

Change #1191341 merged by jenkins-bot:

[mediawiki/extensions/Cite@wmf/1.45.0-wmf.20] Fix subref attribute order

https://gerrit.wikimedia.org/r/1191341

Heads up, this is *not* fixed in production and we're not sure why not.

It seems that the deployment was incomplete, we expect the feature fix to be deployed correctly with next week's train...

Okay, now it seems that Parsoid is doing some surprisingly long-term caching and our issue is indeed resolved for updated or new pages.

To check this, open a page such as https://de.wikipedia.org/w/index.php?title=Achat&veaction=edit which has sub-references and will be damaged by the attribute order bug. To see the issue, just make a small change to the page and review changes as wikitext. To see the issue fixed, open the page in a wikitext editor, make a small change, then open the page again in VE and make another change. Review changes and you should see no attribute order movement.

Noting here that a simple action=purge successfully purges Parsoid cache and makes the bug go away forever. We might want to purge all pages using sub-referencing.

Generating a list of pages with details:

curl 'https://de.wikipedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Wikipedia:Seite_mit_ref-Parameter_details&cmlimit=500&format=json' -o - | \
  jq -r ".query.categorymembers.[].title" > T389363-pages.txt

Mentioned in SAL (#wikimedia-operations) [2025-09-29T13:58:08Z] <awight@deploy2002> mwscript-k8s job started: purgePage.php --wiki=dewiki # T389363