
CX2: References by name disappear and produce missing reference errors when published
Closed, Resolved, Public

Description

When translating Hugo Kołłątaj from English to Japanese, many of the references are not transferred when a paragraph is added to the translation. These references use the reused-reference style: <ref name="name" /> is used in the text, and the content of the reference is provided in the reference list under the same name.

For example, the first paragraph contains two references in the original:

'''Hugo Stumberg Kołłątaj''', alt. ''Kołłątay'', (1 April 1750 – 28 February 1812) was a prominent Polish constitutional reformer and educationalist, and one of the most prominent figures of the [[Enlightenment in Poland|Polish Enlightenment]].<ref name=WIEM/> <ref>{{Cite web | last = | first = | title = The Year of Hugo Kołłątaj | url = http://www.uj.edu.pl/documents/10172/30ba9fc6-3eca-49ff-8ae0-cc084c2ab226 | publisher = [[Jagiellonian University]] |pages= 12–14 | date = | accessdate = 14 May 2014 }}</ref>

However, when the paragraph is added to the translation, only the second reference (the one with its details in place) is transferred; the one named "WIEM" is not added.

In the original article, the details for the "WIEM" reference (as for many other references in the article) are provided inside the "reflist" element:

==References==
{{reflist|30em|refs=
...
<ref name="WIEM">{{pl icon}} [http://portalwiedzy.onet.pl/37510,,,,kollataj_hugo,haslo.html Kołłątaj Hugo], [[WIEM Encyklopedia]]</ref>
}}
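To make the split between use and definition concrete, here is a minimal sketch that collects the named uses in a paragraph and the named definitions in a `{{reflist|refs=...}}` block. It is plain regex-based JavaScript, not how CX or Parsoid parse wikitext; the function and constant names are hypothetical, and the patterns only cover the simple forms shown above.

```javascript
// Hypothetical helpers; regex-based, so they only cover the simple forms
// shown in this task (no comments, <nowiki>, or nested </ref> in bodies).
const NAMED_DEF = /<ref\s+name\s*=\s*"?([^">\/]+?)"?\s*>([\s\S]*?)<\/ref>/g;
const NAMED_USE = /<ref\s+name\s*=\s*"?([^">\/]+?)"?\s*\/>/g;

// Named references that carry their content: name -> body wikitext.
function collectNamedDefinitions(wikitext) {
  const defs = new Map();
  for (const m of wikitext.matchAll(NAMED_DEF)) {
    defs.set(m[1].trim(), m[2]);
  }
  return defs;
}

// Names used by self-closing references such as <ref name=WIEM/>.
function collectNamedUses(wikitext) {
  return [...wikitext.matchAll(NAMED_USE)].map((m) => m[1].trim());
}

const lead = 'Polish Enlightenment]].<ref name=WIEM/> <ref>{{Cite web | title = The Year of Hugo Kołłątaj }}</ref>';
const reflist = '{{reflist|30em|refs=\n<ref name="WIEM">{{pl icon}} [[WIEM Encyklopedia]]</ref>\n}}';

console.log(collectNamedUses(lead));                       // [ 'WIEM' ]
console.log(collectNamedDefinitions(lead).size);           // 0: the body is not in the paragraph
console.log(collectNamedDefinitions(reflist).get('WIEM')); // {{pl icon}} [[WIEM Encyklopedia]]
```

The paragraph alone names "WIEM" but holds no body for it, which is exactly the information a section-by-section copy loses.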

As a user reported and illustrated with examples, this results in a published article that is missing references and has many errors in its list of references.

Details

Related Gerrit Patches:
mediawiki/extensions/ContentTranslation : master: Give higher rank for ve.dm.MWReferencesListNode than cxTransclusion

Event Timeline

Restricted Application added a subscriber: Aklapper. Oct 11 2018, 11:38 AM
Arrbee moved this task from Needs Triage to CX2 on the ContentTranslation board. Oct 15 2018, 1:04 PM

This seems to be the cause of the issue this user had when translating Steve Krug from English to French.

Another example that we may want to test is "Acute bronchitis" from English to Japanese.
This was reported as T207265, which should be reopened if we find that this ticket was not the real cause of the issue.

Another case we may want to check is when the reference is first defined inside a template, such as an infobox, and later used by name. This was reported for the translation of Gastroschisis from English to Japanese.

In the Gastroschisis case, the edit disappears in the tool itself.

Pginer-WMF raised the priority of this task from Medium to High. Oct 26 2018, 7:06 AM
santhosh claimed this task. Oct 31 2018, 7:24 AM

There are multiple issues here. Let us analyse each one.

  1. The en:Hugo Kołłątaj article has its reference content inside the reflist at the end of the article. In every section, references point to that content by a unique name.
===Early life===
Hugo Kołłątaj was born 1 April 1750 in Dederkały Wielkie (now in Western Ukraine) in [[Volhynian Voivodeship (1569–1795)|Volhynia]] into a family of minor [[Polish nobility]]. Soon after, his family moved to [[Nieciesławice]], near [[Sandomierz]], where he spent his childhood.<ref name="min"/><ref name=Bauer40/><ref name="tucz"/><ref name="LERSKIEducation1996"/> He attended school in [[Pińczów]].<ref name="nd"/> He began his studies at the [[Jagiellonian University|Kraków Academy]], subsequently, [[Jagiellonian University]], where he studied law and gained a doctorate.<ref name=WIEM/><ref name="Bauer40"/> Afterwards, around 1775 he took [[holy orders]].<ref name="kai"/> He studied in [[Vienna]] and Italy ([[Naples]] and [[Rome]]), where he would have encountered [[Age of Enlightenment|Enlightenment]] [[philosophy]].<ref name=WIEM/><ref name=Bauer40/><ref name="LERSKIEducation1996"/><ref name="aj"/> He is thought to have gained two further doctorates abroad in philosophy and [[theology]].<ref name=kai/>

==References==
{{reflist|30em|refs=

<ref name="aj">{{pl icon}}Halina Zwolska, [http://www3.uj.edu.pl/alma/04/64.html TOWARZYSZE SZKOŁY GŁÓWNEJ KORONNEJ] {{webarchive|url=https://web.archive.org/web/20120415032311/http://www3.uj.edu.pl/alma/04/64.html |date=2012-04-15 }}, Alma Mater, wiosna 1997, nr 4</ref>

<ref name="Bauer40">{{cite book|author=Krzysztof Bauer|title=Uchwalenie i obrona Konstytucji 3 Maja|url=https://books.google.com/books?id=WLNGAAAAIAAJ|accessdate=2 January 2012|year=1991|publisher=Wydawnictwa Szkolne i Pedagogiczne|isbn=978-83-02-04615-5|page=40}}</ref>

<ref name="Bauer41">{{cite book|author=Krzysztof Bauer|title=Uchwalenie i obrona Konstytucji 3 Maja|url=https://books.google.com/books?id=WLNGAAAAIAAJ|accessdate=2 January 2012|year=1991|publisher=Wydawnictwa Szkolne i Pedagogiczne|isbn=978-83-02-04615-5|page=41}}</ref>

... and so on
}}

VE does not support editing these references in visual mode; you need to use source mode. It shows the message: "This reference is defined in a template or other generated block, and for now can only be edited in source mode."

CX inherits this limitation, and in addition, when a section with such a reference is translated and added to the target document, VE silently removes the reference. This is because the new content from the HTML fragment is cleaned up and treated as content from the clipboard, and a reference element with no meaningful data (or whose data, hidden in the reflist, cannot be found) is skipped (see T110479).
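A simplified model of this behaviour, assuming a named reference survives the clean-up only if its body is among the definitions available to the pasted fragment. This is not VE's actual sanitizer (T110479 describes the real behaviour); the data shapes and the `sanitizePastedRefs` name are hypothetical.

```javascript
// Hypothetical model of the paste clean-up described above; this is not
// VE's sanitizer. Each inline item is either text or a reference; a
// reference carries either a body (defined in place) or only a name.
function sanitizePastedRefs(items, availableBodies) {
  return items.filter((item) => {
    if (item.type !== 'ref') {
      return true; // plain text is always kept
    }
    if (item.body) {
      return true; // <ref>...</ref> with its content in place
    }
    // <ref name="..."/>: kept only if its body can be resolved; otherwise
    // it is dropped without any warning to the translator.
    return availableBodies.has(item.name);
  });
}

const paragraph = [
  { type: 'text', value: '...figures of the Polish Enlightenment.' },
  { type: 'ref', name: 'WIEM' },
  { type: 'ref', body: '{{Cite web | title = The Year of Hugo Kołłątaj }}' },
];

// The reflist is not part of the pasted fragment, so WIEM cannot be resolved.
console.log(sanitizePastedRefs(paragraph, new Map()).length); // 2
// If the reflist definitions were available, both references would survive.
console.log(sanitizePastedRefs(paragraph, new Map([['WIEM', '{{pl icon}} ...']])).length); // 3
```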

So this is the root cause behind the issue:

However, when it is added to the translation only the second one (the one with the info in-place) is added to the translation while the one with the "WIEM" name is not added:

  2. The many errors in the reflist after publishing to ja.wikipedia.org are a different problem. Content like:
'''Hugo StumbergKołłątaj''' 、alt。  ''Kołłątay'' (1750年4月1日 -  1812年2月28日)は、著名なポーランドの憲法5改革者で教育者であり、 ポーランド啓蒙主義の最も顕著な人物の1人であった。 
<ref>{{Cite web|url=http://www.uj.edu.pl/documents/10172/30ba9fc6-3eca-49ff-8ae0-cc084c2ab226|title=The Year of Hugo Kołłątaj|author=|first=|date=|publisher=[[Jagiellonian University]]|pages=12–14|accessdate=14 May 2014}}</ref>

{{Reflist|30em|refs=<ref name="aj">{{pl icon}}Halina Zwolska, [http://www3.uj.edu.pl/alma/04/64.html TOWARZYSZE SZKOŁY GŁÓWNEJ KORONNEJ] {{webarchive|url=https://web.archive.org/web/20120415032311/http://www3.uj.edu.pl/alma/04/64.html |date=2012-04-15 }}, Alma Mater, wiosna 1997, nr 4</ref>

<ref name="Bauer40">{{cite book|author=Krzysztof Bauer|title=Uchwalenie i obrona Konstytucji 3 Maja|url=https://books.google.com/books?id=WLNGAAAAIAAJ|accessdate=2 January 2012|year=1991|publisher=Wydawnictwa Szkolne i Pedagogiczne|isbn=978-83-02-04615-5|page=40}}</ref>
}}

On ja.wikipedia.org, this renders with errors in the reference list.

The same content renders differently on en.wikipedia.org, ml.wikipedia.org, and many other wikis.

Note that there, the references in the reflist are listed without errors, even though no content in the article refers to them.

So this is basically a difference in Template:Reflist behavior between wikis, and I don't think it is an issue caused by CX.
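To see which reflist entries are affected, one can compare the names defined in the reflist with the names used in the text: every name that is defined but never used is a candidate for a cite error such as "<ref> tag with name ... defined in <references> is not used in prior text". A minimal regex-based sketch; `unusedListDefinedRefs` is a hypothetical name and only the simple `<ref name=...>` forms from this task are handled.

```javascript
// Hypothetical helper: names defined in the reflist's refs= parameter that
// are never used in the article body. Regex-based sketch; it only covers
// the simple <ref name=...> forms seen in this task.
function unusedListDefinedRefs(bodyWikitext, reflistWikitext) {
  // Any named <ref ...> or <ref .../> in the body counts as a use.
  const used = new Set(
    [...bodyWikitext.matchAll(/<ref\s+name\s*=\s*"?([^">\/]+?)"?\s*\/?>/g)]
      .map((m) => m[1].trim())
  );
  const defined = [...reflistWikitext.matchAll(/<ref\s+name\s*=\s*"?([^">\/]+?)"?\s*>/g)]
    .map((m) => m[1].trim());
  return defined.filter((name) => !used.has(name));
}

// Shape of the published ja.wikipedia content quoted above: one unnamed
// reference in the text, two named definitions in the reflist.
const body = '... <ref>{{Cite web|title=The Year of Hugo Kołłątaj}}</ref>';
const reflist = '{{Reflist|30em|refs=<ref name="aj">...</ref>\n<ref name="Bauer40">...</ref>\n}}';
console.log(unusedListDefinedRefs(body, reflist)); // [ 'aj', 'Bauer40' ]
```

Whether such unused entries are shown as errors depends on the wiki, which is the difference described above.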

Since CX builds a translation by adding one section at a time, in any order, supporting named references is challenging. The reference content can be in:

  1. Another section that has already been added to the translation.
  2. The same section, in a different sentence for example.
  3. Another section that has not been added to the translation.
  4. The reflist, as explained above in this ticket.
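The four cases above can be expressed as a small classification function. This is a hypothetical sketch: it assumes the source article has already been split into sections plus the reflist wikitext, and that we know which section IDs are already in the translation. None of these names come from CX, and the "demo" reference in the usage example is invented for illustration.

```javascript
// Hypothetical classifier for the four cases listed above.
// sections:   [{ id, wikitext }] of the source article
// reflist:    wikitext of the {{reflist|refs=...}} call
// translated: Set of section ids already added to the translation
function locateRefBody(name, currentId, sections, reflist, translated) {
  // Escape the name so characters like "." are matched literally.
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Matches an opening <ref name=...> tag (not the self-closing form).
  const defRe = new RegExp(`<ref\\s+name\\s*=\\s*"?${escaped}"?\\s*>`);

  const current = sections.find((s) => s.id === currentId);
  if (current && defRe.test(current.wikitext)) {
    return 'same section';                                // case 2
  }
  const other = sections.find((s) => s.id !== currentId && defRe.test(s.wikitext));
  if (other) {
    return translated.has(other.id)
      ? 'other section, already translated'               // case 1
      : 'other section, not yet translated';              // case 3
  }
  if (defRe.test(reflist)) {
    return 'reflist';                                     // case 4
  }
  return 'not found';
}

const sections = [
  { id: 'lead', wikitext: 'Polish Enlightenment]].<ref name=WIEM/>' },
  // Hypothetical in-section definition, for illustration only.
  { id: 'early-life', wikitext: 'took [[holy orders]].<ref name="demo">...</ref>' },
];
const reflist = '{{reflist|30em|refs=<ref name="WIEM">...</ref>}}';
console.log(locateRefBody('WIEM', 'lead', sections, reflist, new Set())); // reflist
console.log(locateRefBody('demo', 'lead', sections, reflist, new Set())); // other section, not yet translated
```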

To resolve the names, a translator would need to know where each reference is actually defined and add that to the translation. That is non-obvious and impossible with the current UI. Chasing down the content definition should not be left to translators.

An idea I have, but have not tried out yet, is as follows:

In the source content, before it is added to the translation, resolve all references in it using the source document. So, if the source section is:

'''Hugo Stumberg Kołłątaj''', alt. ''Kołłątay'', (1 April 1750 – 28 February 1812) was a prominent Polish constitutional reformer and educationalist, and one of the most prominent figures of the [[Enlightenment in Poland|Polish Enlightenment]].<ref name=WIEM/>

Look up the definition of WIEM in the source document and resolve it in the content to get:

'''Hugo Stumberg Kołłątaj''', alt. ''Kołłątay'', (1 April 1750 – 28 February 1812) was a prominent Polish constitutional reformer and educationalist, and one of the most prominent figures of the [[Enlightenment in Poland|Polish Enlightenment]].<ref name="WIEM">{{pl icon}} [http://portalwiedzy.onet.pl/37510,,,,kollataj_hugo,haslo.html Kołłątaj Hugo], [[WIEM Encyklopedia]] </ref>

This will resolve cases 1, 2, 3 listed above. I am not sure if we can resolve the content for case 4.
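The proposed resolution step can be sketched as a string transformation on wikitext. This only illustrates the idea, assuming the definitions have already been collected into a `Map` from the whole source document; CX itself works on the DOM and data model, and `resolveNamedRefs` is a hypothetical name.

```javascript
// Hypothetical sketch of the proposed resolution step, on wikitext strings.
// definitions: Map of name -> body, collected from the whole source document.
function resolveNamedRefs(wikitext, definitions) {
  return wikitext.replace(
    /<ref\s+name\s*=\s*"?([^">\/]+?)"?\s*\/>/g,
    (match, rawName) => {
      const name = rawName.trim();
      // Leave unknown names untouched rather than dropping them.
      if (!definitions.has(name)) {
        return match;
      }
      return `<ref name="${name}">${definitions.get(name)}</ref>`;
    }
  );
}

const definitions = new Map([
  ['WIEM', '{{pl icon}} [http://portalwiedzy.onet.pl/37510,,,,kollataj_hugo,haslo.html Kołłątaj Hugo], [[WIEM Encyklopedia]]'],
]);
console.log(resolveNamedRefs('Polish Enlightenment]].<ref name=WIEM/>', definitions));
// Polish Enlightenment]].<ref name="WIEM">{{pl icon}} [http://portalwiedzy.onet.pl/37510,,,,kollataj_hugo,haslo.html Kołłątaj Hugo], [[WIEM Encyklopedia]]</ref>
```

Leaving unresolved names in place, instead of removing them, keeps the failure visible to the translator.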

Resolving references like this will cause duplication issues in the translation. Thankfully, @Catrope has written deduplication code so that each reference definition appears only once. But again, this won't work for case 4.
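The deduplication idea can be illustrated in the same simplified wikitext terms: keep the first full definition of each name and turn later copies back into self-closing uses. This is a hypothetical sketch, not the deduplication code mentioned above, which is not shown in this task.

```javascript
// Hypothetical sketch: after resolution, the same named reference may be
// defined several times. Keep the first definition and replace the rest
// with <ref name="..."/> so the body appears only once.
function dedupeNamedRefs(wikitext) {
  const seen = new Set();
  return wikitext.replace(
    /<ref\s+name\s*=\s*"?([^">\/]+?)"?\s*>[\s\S]*?<\/ref>/g,
    (match, rawName) => {
      const name = rawName.trim();
      if (seen.has(name)) {
        return `<ref name="${name}"/>`;
      }
      seen.add(name);
      return match;
    }
  );
}

console.log(dedupeNamedRefs('<ref name="WIEM">B</ref> a <ref name=WIEM>B</ref>'));
// <ref name="WIEM">B</ref> a <ref name="WIEM"/>
```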

@Catrope Do you think this is a sensible approach? Any idea, how to handle case 4?

// Convert node to DOM, preserving full internal list
// Use clipboard mode to ensure reference body is outputted
sourceNode = ve.dm.converter.getDomFromNode( sourceNodeModel, true ).body.children[ 0 ];

Well, we are already doing that reference resolution. What is missing is case 4, where the reference body is inside the reflist.

santhosh added a comment. Edited Nov 5 2018, 10:55 AM

It seems there is a problem with the registration of ve.dm.CXTransclusionNode. It inherits from ve.dm.MWTransclusionNode and prevents ve.dm.MWReferencesListNode from being used for the reflist.

<div about="#mwt221" data-mw="{}" typeof="mw:Transclusion">
<ol about="#mwt326" class="mw-references references" data-mw="{}" typeof="mw:Extension/references">

For this reflist, CX assigns the cxTransclusion data model, while the expected result is mwReferencesListNode. The node passes ve.dm.MWReferencesListNode.static.matchFunction, but

// HACK: This prevents any rules with higher specificity from matching,
// e.g. LanguageAnnotation which uses a match function
ve.dm.MWTransclusionNode.static.matchFunction = function ( ) {
	return true;
};

prevents any other dm class from being chosen. As an experiment, I changed it to:

ve.dm.MWTransclusionNode.static.matchFunction = function ( el ) {
	return !ve.dm.MWReferencesListNode.static.matchFunction( el );
};

With this change, the reflist is identified as mwReferencesListNode, and it even solves case 4.

So the remaining question is what the correct fix is for choosing mwReferencesListNode over cxTransclusion for reflist nodes.

Change 471699 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/extensions/ContentTranslation@master] Give higher rank for ve.dm.MWReferencesListNode than cxTransclusion

https://gerrit.wikimedia.org/r/471699

Change 471699 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Give higher rank for ve.dm.MWReferencesListNode than cxTransclusion

https://gerrit.wikimedia.org/r/471699
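The conflict, and the two ways out discussed above (narrowing the catch-all match function, or ranking the references list higher, as the patch title describes), can be illustrated with a toy registry. This is a deliberately simplified model assuming that candidates are tried in rank order and the first match wins; VE's actual ve.dm.ModelRegistry has its own matching and specificity rules that are not reproduced here, and all names below are hypothetical.

```javascript
// Toy model, not VE's ve.dm.ModelRegistry: candidates are tried in rank
// order and the first whose matchFunction accepts the element wins.
function pickModel(element, candidatesByRank) {
  const winner = candidatesByRank.find((c) => c.matchFunction(element));
  return winner ? winner.name : null;
}

// Simplified element: only its typeof attribute matters in this sketch.
const hasType = (el, type) => (el.typeof || '').split(/\s+/).includes(type);

const referencesList = {
  name: 'mwReferencesList',
  matchFunction: (el) => hasType(el, 'mw:Extension/references'),
};

// Mirrors the HACK quoted above: accept every element.
const transclusionCatchAll = {
  name: 'cxTransclusion',
  matchFunction: () => true,
};

// Mirrors the experiment above: refuse what the references list accepts.
const transclusionNarrowed = {
  name: 'cxTransclusion',
  matchFunction: (el) => !referencesList.matchFunction(el),
};

const reflistEl = { typeof: 'mw:Extension/references' };

console.log(pickModel(reflistEl, [transclusionCatchAll, referencesList])); // cxTransclusion (the bug)
console.log(pickModel(reflistEl, [transclusionNarrowed, referencesList])); // mwReferencesList
console.log(pickModel(reflistEl, [referencesList, transclusionCatchAll])); // mwReferencesList
```

In this model, ranking the references list first fixes the reflist case without changing the catch-all function that the HACK comment says other rules rely on.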

According to this comment, the originally reported case seems to be solved.
However, the case described in T206756#4694153 where the references are defined in an infobox seems to still be failing. Depending on the complexity of this case we may need to create a specific separate ticket.


I created a separate ticket for this case: T209266: CX2: Support for references added by name when the details are inside a template

Etonkovidova closed this task as Resolved. Nov 17 2018, 2:19 AM
Etonkovidova added a subscriber: Etonkovidova.

The fix seems to be working - both in the context of ContentTranslation and after the publishing:
(1) When translating paragraphs, all references are present, e.g. using the example from this task.

(2) The reference list seems to contain only errors of the form "Cite error: <ref> tag with name "Bauer41" defined in <references> is not used in prior text."
While testing the fix, I translated text with references [1] to [9]; they are displayed correctly after publishing, and the other references ([10] through [14]) are published with an error, which is reasonable behavior.