
Cite error discrepancy between legacy parser and parsoid on ukwikipedia (wikidata-related?)
Closed, Resolved | Public | Bug report

Description

Steps to replicate the issue (include links if applicable):

Compare
https://uk.wikipedia.org/wiki/%D0%9A%D0%B0%D1%80%D0%BB_%D0%91%D1%8E%D1%85%D0%B5%D1%80?useparsoid=1
and
https://uk.wikipedia.org/w/index.php?title=%D0%9A%D0%B0%D1%80%D0%BB_%D0%91%D1%8E%D1%85%D0%B5%D1%80&useparsoid=0

What happens?:

The Parsoid version has errors in the References block; the Legacy parser does not.

What should have happened instead?:

Both parsers should agree on the existence of errors.

Other information (browser name/version, screenshots, etc.):

The error is particularly egregious because of https://phabricator.wikimedia.org/T380045, which makes it look even worse; that should be fixed with this week's train (Nov 19th).

This issue has been reported on https://www.mediawiki.org/wiki/Talk:Parsoid/Parser_Unification/Known_Issues#c-%D0%90%D1%82%D0%B0-20241117172800-References_from_Wikidata_not_shown_correctly, which points at an issue with Wikidata handling as a possible root cause.

Event Timeline

Reproducible with the following wikitext:

<ref name="plop"><span class="wikidata_cite citetype_Q36524 citetype_Q17152639 citetype_Q1172284" data-entity-id="Q36578"><i class="wef_low_priority_links">[[:Німецька національна бібліотека|Deutsche Nationalbibliothek]]</i> [http://d-nb.info/gnd/118516884/ Record #118516884] // Gemeinsame Normdatei<span class="wef_low_priority_links"> — 2012—2016.</span></span><div style="display:none">[[d:Track:Q27302]][[d:Track:Q36578]]</div></ref>
<ref name="plop"><span class="wikidata_cite citetype_Q36524 citetype_Q17152639 citetype_Q1172284" data-entity-id="Q36578"><i class="wef_low_priority_links">[[:Німецька національна бібліотека|Deutsche Nationalbibliothek]]</i> [http://d-nb.info/gnd/118516884/ Record #118516884] // Gemeinsame Normdatei<span class="wef_low_priority_links"> — 2012—2016.</span></span><div style="display:none">[[d:Track:Q27302]][[d:Track:Q36578]]</div></ref>

which triggers an error with Parsoid but not with the legacy parser.
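The following minimal Python sketch shows how identical wikitext can still be reported as a conflict. It is an illustration only. It assumes that one implementation compares the raw ref content while the other compares rendered HTML that carries a per-instance marker, modelled here as a hypothetical `about="#mwtN"` attribute. `make_renderer` and `conflicting_names` are hypothetical helpers, not Cite or Parsoid code.

```python
import itertools
from typing import Callable, Iterable


def make_renderer() -> Callable[[str], str]:
    """Hypothetical renderer that tags every rendered instance with a fresh id."""
    counter = itertools.count(1)

    def render(content: str) -> str:
        # The about="#mwtN" marker stands in for per-instance markup; it is an
        # illustrative assumption, not the confirmed cause of this bug.
        return f'<span about="#mwt{next(counter)}">{content}</span>'

    return render


def conflicting_names(refs: Iterable[tuple[str, str]],
                      key: Callable[[str], str]) -> list[str]:
    """Return names whose later definitions differ from the first one under `key`."""
    first: dict[str, str] = {}
    conflicts: list[str] = []
    for name, content in refs:
        value = key(content)
        if name not in first:
            first[name] = value
        elif first[name] != value and name not in conflicts:
            conflicts.append(name)
    return conflicts


refs = [("plop", "Deutsche Nationalbibliothek"), ("plop", "Deutsche Nationalbibliothek")]
print(conflicting_names(refs, key=lambda content: content))  # []
print(conflicting_names(refs, key=make_renderer()))          # ['plop']
```

Both definitions of `plop` are byte-identical, so the raw comparison reports nothing. The rendered comparison flags `plop` only because the two renderings carry different counters.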

Change #1092272 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):

[mediawiki/extensions/Cite@master] Normalize ref html before comparison

https://gerrit.wikimedia.org/r/1092272

Change #1092272 merged by jenkins-bot:

[mediawiki/extensions/Cite@master] Normalize ref html before comparison

https://gerrit.wikimedia.org/r/1092272
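Normalizing before comparison can be sketched as follows. This hypothetical Python illustration is not the actual Cite patch. It parses each fragment with the standard library `html.parser`, drops attributes assumed to vary per instance, sorts the remaining attributes, and compares the resulting token sequences. The contents of `VOLATILE_ATTRS` are an assumption.

```python
from html.parser import HTMLParser

# Assumption: attributes treated as per-instance noise; not taken from Cite.
VOLATILE_ATTRS = frozenset({"about", "id"})


class _Canonicalizer(HTMLParser):
    """Collects a token list with volatile attributes dropped and the rest sorted."""

    def __init__(self, ignore: frozenset[str]) -> None:
        super().__init__(convert_charrefs=True)
        self.ignore = ignore
        self.tokens: list[tuple] = []

    def handle_starttag(self, tag, attrs):
        kept = tuple(sorted((k, v or "") for k, v in attrs if k not in self.ignore))
        self.tokens.append(("start", tag, kept))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        self.tokens.append(("text", data))


def normalize_html(html: str, ignore: frozenset[str] = VOLATILE_ATTRS) -> tuple:
    """Return a comparable token tuple for an HTML fragment."""
    parser = _Canonicalizer(ignore)
    parser.feed(html)
    parser.close()
    return tuple(parser.tokens)


a = '<span about="#mwt1" class="wikidata_cite">Record</span>'
b = '<span class="wikidata_cite" about="#mwt2">Record</span>'
print(a == b)                                  # False
print(normalize_html(a) == normalize_html(b))  # True
```

The two fragments differ as strings in both attribute order and the `about` value, but they normalize to the same tokens. Text content is kept verbatim, so genuinely different citations still compare as different.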

Change #1097464 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/extensions/Cite@master] Add a test for multiple definition with the same complex content

https://gerrit.wikimedia.org/r/1097464

Change #1109710 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):

[mediawiki/extensions/Cite@master] Normalize ref html before comparison, take 2

https://gerrit.wikimedia.org/r/1109710

Change #1097464 abandoned by Arlolra:

[mediawiki/extensions/Cite@master] Add a test for multiple definition with the same complex content

Reason:

Squashed in Iae68a8eab46d1e033c9575c1467eff8e24422d9e

https://gerrit.wikimedia.org/r/1097464

Change #1109710 merged by jenkins-bot:

[mediawiki/extensions/Cite@master] Normalize ref html before comparison, take 2

https://gerrit.wikimedia.org/r/1109710

There are a couple of other cases of this on https://de.wiktionary.org/wiki/Katze and https://nl.wiktionary.org/wiki/gaan; the template information seems to cause issues there. Some more tweaking is required.

Change #1114395 had a related patch set uploaded (by Isabelle Hurbain-Palatin; author: Isabelle Hurbain-Palatin):

[mediawiki/extensions/Cite@master] Do not display errors on named references differing only by data-mw

https://gerrit.wikimedia.org/r/1114395

Change #1114395 merged by jenkins-bot:

[mediawiki/extensions/Cite@master] Do not display errors on named references differing only by data-mw

https://gerrit.wikimedia.org/r/1114395
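A comparison that ignores only `data-mw` could look like the sketch below. It is a hypothetical Python illustration, not the merged change. It re-serializes each fragment with every `data-mw` attribute removed and compares the results, so two references whose markup differs only in template metadata are treated as the same definition. The sample markup is invented for illustration.

```python
from html import escape
from html.parser import HTMLParser


class _DataMwStripper(HTMLParser):
    """Re-serializes an HTML fragment with every data-mw attribute removed."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []

    def _attrs(self, attrs) -> str:
        return "".join(
            f' {k}="{escape(v)}"' if v is not None else f" {k}"
            for k, v in attrs
            if k != "data-mw"
        )

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(attrs)}/>")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def strip_data_mw(html: str) -> str:
    parser = _DataMwStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


def same_ignoring_data_mw(a: str, b: str) -> bool:
    return strip_data_mw(a) == strip_data_mw(b)


x = """<span typeof="mw:Transclusion" data-mw='{"parts":["a"]}'>Katze</span>"""
y = """<span typeof="mw:Transclusion" data-mw='{"parts":["b"]}'>Katze</span>"""
print(same_ignoring_data_mw(x, y))                    # True
print(same_ignoring_data_mw(x, "<span>Hund</span>"))  # False
```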

Need to double-check what happens on https://nl.wiktionary.org/wiki/gaan (it should be fixed, but it is not); moving this back to the appropriate column.

What happens on https://nl.wiktionary.org/wiki/gaan is that a template generates citations to Typisch Vlaams. That template names the references using page and column numbers, which has apparently been considered enough to distinguish two different citations, except that on this page it is not (for at least two citations). So the editors add spaces in the page number to get two different ref names. This works with the legacy parser but not with Parsoid, which normalizes ref names more aggressively.
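The name collision can be illustrated with two key functions. Neither reproduces the actual legacy or Parsoid rules; both are assumptions. `legacy_key` only trims the ends of a name, while `collapsed_key` also collapses internal runs of whitespace. The sample names are hypothetical and assume the editors' workaround inserted a second consecutive space.

```python
import re


def legacy_key(name: str) -> str:
    # Assumption: only leading/trailing whitespace is trimmed.
    return name.strip()


def collapsed_key(name: str) -> str:
    # Assumption: internal runs of whitespace are also collapsed to one space.
    return re.sub(r"\s+", " ", name.strip())


def distinct_refs(names: list[str], key) -> int:
    """Count how many separate references the given names produce under `key`."""
    return len({key(n) for n in names})


# Hypothetical names: the second has an extra space before the page number.
names = ["TV p. 123 k. 2", "TV p.  123 k. 2"]
print(distinct_refs(names, legacy_key))     # 2
print(distinct_refs(names, collapsed_key))  # 1
```

Under the second rule, the workaround no longer yields two names, so the two citations fold into one named reference.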

There is probably a way to work around the name collision by fetching the name from the tplsrc rather than from the arguments. Technically, though, that is a different issue from this one.

T386713 is open to handle the ref name whitespace normalization, so let's close this one.