Missing attribution of messages in pywikibot/i18n JSON files.
Closed, Resolved (Public)

Description

JSON files have been added to pywikibot/i18n automatically (https://gerrit.wikimedia.org/r/#/c/164947/) after the translatewiki config change (https://gerrit.wikimedia.org/r/#/c/154796/4), which now has python and JSON files with the same messages. The JSON files are not used yet, as the code changes to enable JSON have exposed packaging problems that are the subject of RFC https://www.mediawiki.org/wiki/Requests_for_comment/pywikibot_2.0_packaging

The message files for the English messages do not include the attribution; it is missing from the python message files and is now also missing from the JSON message files.

$ head -50 pywikibot.py 
# -*- coding: utf-8 -*-
msg = {
    'en': {
        'pywikibot-enter-category-name': u'Please enter the category name:',
        'pywikibot-enter-file-links-processing': u'Links to which file page should be processed?',
        'pywikibot-enter-finished-browser': u'Press Enter when finished in browser.',
        'pywikibot-enter-namespace-number': u'Please enter a namespace by its number:',
        'pywikibot-enter-new-text': u'Please enter the new text:',
        'pywikibot-enter-page-processing': u'Which page should be processed?',
        'pywikibot-enter-xml-filename': u'Please enter the XML dump\'s filename:',
    },
    # Author: Als-Holder
    # Author: TTMTT
    'qqq': {
        'pywikibot-enter-xml-filename': u'Message displayed to the bot owner to enter the XML dump\'s filename.',
        'pywikibot-enter-page-processing': u'Question displayed to the bot owner which page should be processed.',
        'pywikibot-enter-file-links-processing': u'Question displayed to the bot owner processing links to a given file page.',
        'pywikibot-enter-namespace-number': u'Message displayed to the bot owner to enter a namespace by its number.',
        'pywikibot-enter-new-text': u'Message displayed to the bot owner to enter the new text.',
        'pywikibot-enter-category-name': u'Message displayed to the bot owner to enter the category name.',
        'pywikibot-enter-finished-browser': u'Message displayed to the bot owner to press Enter button when browser edits are finished.',
    },
    # Author: Als-Holder
    'als': {
        'pywikibot-enter-xml-filename': u'Bitte gib dr Datename vum XML-Dump yy:',
        'pywikibot-enter-page-processing': u'Weli Syte soll bearbeitet wäre?',
        'pywikibot-enter-file-links-processing': u'Vu wellere Dateisyte solle d Link bearbeitet wäre?',
        'pywikibot-enter-namespace-number': u'Bitte gib d Nummere vum Namensruum yy:',
        'pywikibot-enter-new-text': u'Bitte gib dr nei Text yy:',
        'pywikibot-enter-category-name': u'Bitte gib dr Name vu dr Kategori yy:',
        'pywikibot-enter-finished-browser': u'Druck noch eme Zuemache vum Browsewr uf «Enter».',
    },
    ...

The metadata for als is correct

$ cat pywikibot/als.json 
{
	"@metadata": {
		"authors": [
			"Als-Holder"
		]
	},
	"pywikibot-enter-xml-filename": "Bitte gib dr Datename vum XML-Dump yy:",
	"pywikibot-enter-page-processing": "Weli Syte soll bearbeitet wäre?",
	"pywikibot-enter-file-links-processing": "Vu wellere Dateisyte solle d Link bearbeitet wäre?",
	"pywikibot-enter-namespace-number": "Bitte gib d Nummere vum Namensruum yy:",
	"pywikibot-enter-new-text": "Bitte gib dr nei Text yy:",
	"pywikibot-enter-category-name": "Bitte gib dr Name vu dr Kategori yy:",
	"pywikibot-enter-finished-browser": "Druck noch eme Zuemache vum Browsewr uf «Enter»."
}

However the metadata for 'qqq' is omitted

$ cat pywikibot/qqq.json 
{
	"@metadata": [],
	"pywikibot-enter-xml-filename": "Message displayed to the bot owner to enter the XML dump's filename.",
	"pywikibot-enter-page-processing": "Question displayed to the bot owner which page should be processed.",
	"pywikibot-enter-file-links-processing": "Question displayed to the bot owner processing links to a given file page.",
	"pywikibot-enter-namespace-number": "Message displayed to the bot owner to enter a namespace by its number.",
	"pywikibot-enter-new-text": "Message displayed to the bot owner to enter the new text.",
	"pywikibot-enter-category-name": "Message displayed to the bot owner to enter the category name.",
	"pywikibot-enter-finished-browser": "Message displayed to the bot owner to press Enter button when browser edits are finished."
}

And the 'en' files do not include a metadata block, which means they fail the i18n JSON syntax tests (T85335).

$ cat pywikibot/en.json 
{
	"pywikibot-enter-xml-filename": "Please enter the XML dump's filename:",
	"pywikibot-enter-page-processing": "Which page should be processed?",
	"pywikibot-enter-file-links-processing": "Links to which file page should be processed?",
	"pywikibot-enter-namespace-number": "Please enter a namespace by its number:",
	"pywikibot-enter-new-text": "Please enter the new text:",
	"pywikibot-enter-category-name": "Please enter the category name:",
	"pywikibot-enter-finished-browser": "Press Enter when finished in browser."
}
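The three cases above (a populated "@metadata" for als, an empty list for qqq, and no block at all for en) can be told apart with a small check. This is only a minimal sketch: the function name `check_metadata` and the problem strings are hypothetical, and it is not the grunt test referenced in T85335.

```python
import json


def check_metadata(text):
    """Return a list of attribution problems found in one i18n JSON document."""
    data = json.loads(text)
    problems = []
    metadata = data.get("@metadata")
    if metadata is None:
        # Case seen in en.json: no metadata block at all.
        problems.append("missing @metadata block")
    elif not isinstance(metadata, dict) or not metadata.get("authors"):
        # Case seen in qqq.json: the block exists but carries no authors.
        problems.append("@metadata has no authors")
    return problems


# Shortened copies of the files quoted above (one message each).
en_json = '{"pywikibot-enter-new-text": "Please enter the new text:"}'
qqq_json = ('{"@metadata": [], "pywikibot-enter-new-text": '
            '"Message displayed to the bot owner to enter the new text."}')
als_json = ('{"@metadata": {"authors": ["Als-Holder"]}, '
            '"pywikibot-enter-new-text": "Bitte gib dr nei Text yy:"}')

for name, text in [("en", en_json), ("qqq", qqq_json), ("als", als_json)]:
    print(name, check_metadata(text))
```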

Event Timeline

jayvdb raised the priority of this task to High.
jayvdb updated the task description. (Show Details)
jayvdb added projects: I18n, Pywikibot-i18n.
jayvdb added subscribers: Unknown Object (MLST), valhallasw, Nikerabbit and 3 others.

Change 181370 had a related patch set uploaded (by John Vandenberg):
Add grunt test to validate i18n JSON

https://gerrit.wikimedia.org/r/181370

Patch-For-Review

jayvdb set Security to None.

We could create a Google-Code-in-2014 task to fix this, if the #translatewiki team doesn't have an easy/quick fix for this. I would need a translatewiki team member to help draft the task and ideally also co-mentor the task.

Change 181370 merged by jenkins-bot:
Add grunt test to validate i18n JSON

https://gerrit.wikimedia.org/r/181370

Xqt claimed this task.

And as we can see on https://gerrit.wikimedia.org/r/#/c/182691/ , there are many cases of translated message bundles where the JSON files do not have the same attribution as the Python.

I don't see what generated the JSON files or the Python excerpts in question, but one thing to consider is that maybe the older files are wrong: the "missing" attribution might also be attribution of translations which no longer exist, e.g. translations of messages deleted in the meantime. Worth doing some sampled checking.

so just going through the first file

https://gerrit.wikimedia.org/r/#/c/182691/2/add_text.py,cm

  1. removes Csisc from 'qqq'

twn only has one revision, importing it

https://translatewiki.net/w/i.php?title=Pywikibot:Add_text-adding/qqq&action=history

Its position was changed by @siebrand in 1a89684f . http://git.wikimedia.org/blobdiff/pywikibot%2Fi18n.git/1a89684f/add_text.py

It was moved by @siebrand in 448c4dee , from 'aeb' to 'qqq', in what looks like a case of mis-attribution.

http://git.wikimedia.org/blobdiff/pywikibot%2Fi18n.git/448c4dee/add_text.py

  2. removes 'Anskar' from 'ca'

'Anskar' did improve the translation, on twn.

https://translatewiki.net/w/i.php?title=Pywikibot:Add_text-adding/ca&action=history

Also, the original twn translation ("Robot afegeix %(adding)s") isn't attributed. @Xqt added it in 9af00aaa . http://git.wikimedia.org/commit/pywikibot%2Fi18n.git/9af00aaa
It wasn't in the list that was removed https://www.mediawiki.org/wiki/Special:Code/pywikipedia/9378 , so it had to come from somewhere else.

Ah ..., it looks like @Xqt copied the translations from the interwiki-adding message, which were added by @valhallasw , also without attribution.

http://git.wikimedia.org/blob/pywikibot%2Fi18n.git/71623c02d160849155a26ad922252ad11ff90cd7/interwiki.py

Looking through the 'compat' history of the interwiki.py script, the 'ca' translation was provided by "Arkaitz Zubiaga" in 2006.
https://www.mediawiki.org/wiki/Special:Code/pywikipedia/2864

  3. This is the 'ckb' entry for add_text and is similar to (2) above. 'Asoxor' has been removed, but they definitely contributed to the translation over time, and part of their translation still exists in the current translation.

https://translatewiki.net/w/i.php?title=Pywikibot:Add_text-adding/ckb&action=history

The ckb original translation also has the same problem as above. The commit only mentions a SF #:

https://mediawiki.org/wiki/Special:Code/pywikipedia/8146

  4. 'da' - same problem, removing Christian List. There is little change since the original imported translation, so maybe neither Christian nor Kaare should be attributed.

https://translatewiki.net/w/i.php?title=Pywikibot%3AAdd_text-adding%2Fda&diff=4281764&oldid=3179616

(da was also taken from interwiki.py; same problem)

  5. 'es', removes attribution of both TheBITLINK and Xqt. Xqt rightly removed attribution for a mechanical change, and TheBITLINK's change has been removed from the current translation, so I guess they don't need to be attributed any more???

https://translatewiki.net/w/i.php?title=Pywikibot%3AAdd_text-adding%2Fes&diff=5572307&oldid=4466954

  6. 'ja', removed attribution of Fryed-peach, despite them being a contributor to the current translation text, and doesn't include attribution of the original translation, provided in https://mediawiki.org/wiki/Special:Code/pywikipedia/4868 with a SF #.

(there are another 7 cases of attribution removal in add_text.py alone, which I have only glanced at.)

From only a very quick look at the cases above, it appears that only the most recent twn editor is being credited.

Thanks for your effort investigating this. It does match the expected behavior: translatewiki.net only credits the most recent translators of each translation. But usually this is offset by the fact that we retain existing credits. Due to the way this is implemented (trying to parse the source file), this does not work when exporting in a non-native file format, which is what I think was done in this case to do the conversion.

TWN tools can't export all message contributors into a new JSON message file? (i.e. editors of the page on TWN)

Or, more bluntly, do we need to migrate the old translation metadata from the python files to the new JSON files? It isn't a hard job, so no problems. I just want to be clear who is doing what to fix this situation so we can complete the migration to the JSON file format.

TWN tools can't export all message contributors into a new JSON message file? (i.e. editors of the page on TWN)

The full editors list is not a list of "all message contributors", but of all *potential* message contributors. Including, say, a vandal who got reverted. If T4994 was fixed, then Translate could be smarter (there was some research progress, so it might happen).

Or, more bluntly, do we need to migrate the old translation metadata from the python files to the new JSON files?

Yes. However out of 6 cases, you found 1 (or perhaps 2) where the older editor contributed some *words* of the current sentence... it's not a concrete attribution issue except for the pre-TWN translators.

TWN tools can't export all message contributors into a new JSON message file? (i.e. editors of the page on TWN)

The full editors list is not a list of "all message contributors", but of all *potential* message contributors. Including, say, a vandal who got reverted. If T4994 was fixed, then Translate could be smarter (there was some research progress, so it might happen).

It is _always_ better to over-attribute than under-attribute :/

Proper reverts can be eliminated with very minimal fuss; that isn't a research problem. And on a small wiki like TWN, and since it works on discrete units of information, that will eliminate 90+% of problematic attributions.

Add a default enabled block button checkbox "exclude user from translation attribution", and another 8% is solved. ;-) And nobody cares about the last 2% of false positives.
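To illustrate the revert point, here is a rough sketch of how exact reverts could be dropped from a page history before crediting editors. It assumes each revision is available as a (user, content hash) pair in chronological order; the function name `surviving_authors` and the sample users are made up for illustration, and this is not how the Translate extension works today.

```python
def surviving_authors(revisions):
    """Return editors whose revisions were not undone by an exact revert.

    `revisions` is a chronological list of (user, content_hash) pairs.
    A revision whose hash equals that of an earlier kept revision is
    treated as a revert: everything after the restored revision is
    discarded and the reverting user gets no credit for that edit.
    """
    kept = []  # revisions still contributing to the current text
    for user, content_hash in revisions:
        hashes = [h for _, h in kept]
        if content_hash in hashes:
            # Exact revert: drop all revisions made after the restored one.
            del kept[hashes.index(content_hash) + 1:]
        else:
            kept.append((user, content_hash))
    authors = []
    for user, _ in kept:
        if user not in authors:
            authors.append(user)
    return authors


# Hypothetical history: Bob reverts Vandal back to Alice's text, then Carol edits.
history = [("Alice", "h1"), ("Vandal", "h2"), ("Bob", "h1"), ("Carol", "h3")]
print(surviving_authors(history))
```

This only catches reverts that restore an earlier text exactly; partial rewrites would still need the kind of analysis discussed in T4994.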

Or, more bluntly, do we need to migrate the old translation metadata from the python files to the new JSON files?

Yes. However out of 6 cases, you found 1 (or perhaps 2) where the older editor contributed some *words* of the current sentence... it's not a concrete attribution issue except for the pre-TWN translators.

My count is 4 of 6 if I am being generous, or 5 of 6 if I am not. :/ I am only really excluding case (1), as that problem occurred before the start of the TWN page history. That TheBITLINK doesn't need to be attributed for the es translation doesn't feel right given their translation was used by pywikibot for a year and a quarter.

Note that the "current sentence" for this message is two words long in English and of similar length in most other languages. If one word of a translator's work remains in the current message, that is 50% of the translation.

Anyway, it sounds like the pywikibot team needs to manually add the list of contributors from TWN to the JSON in order for the attribution to not regress further. i.e. the attribution removals in Xqt's patch to the python need to be reversed into additions to the JSON. Or, it might be more efficient to obtain the TWN contributor list from the TWN wiki for each message page. Luckily we have a tool which can do that .. ;-) I can't help but think we'd be re-inventing the TWN tools.

In the absence of other replies, I would appreciate it if you could copy over the missing attributions to the JSON files. That also takes care of any non-twn authors in the list.
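Copying the attributions over could look roughly like the sketch below. It assumes the .py files keep the layout quoted in the description, with "# Author:" comments placed directly above each language key; the helper names `authors_by_language` and `merge_authors` are hypothetical and are not taken from the patch that was eventually merged.

```python
import re

AUTHOR_RE = re.compile(r"^\s*#\s*Author:\s*(.+?)\s*$")
LANG_RE = re.compile(r"^\s*'([^']+)':\s*\{\s*$")


def authors_by_language(py_source):
    """Map each language code to the '# Author:' names listed above it."""
    authors = {}
    pending = []
    for line in py_source.splitlines():
        match = AUTHOR_RE.match(line)
        if match:
            pending.append(match.group(1))
            continue
        match = LANG_RE.match(line)
        if match:
            if pending:
                authors[match.group(1)] = pending
            pending = []
    return authors


def merge_authors(json_data, extra_authors):
    """Return a copy of json_data with extra_authors added to @metadata."""
    metadata = json_data.get("@metadata")
    if not isinstance(metadata, dict):
        metadata = {}  # covers a missing block and the empty-list case
    merged = list(metadata.get("authors", []))
    for author in extra_authors:
        if author not in merged:
            merged.append(author)
    result = {"@metadata": dict(metadata, authors=merged)}
    result.update((k, v) for k, v in json_data.items() if k != "@metadata")
    return result


# Shortened excerpt in the same layout as pywikibot.py above.
sample = """msg = {
    'en': {
        'pywikibot-enter-new-text': u'Please enter the new text:',
    },
    # Author: Als-Holder
    # Author: TTMTT
    'qqq': {
        'pywikibot-enter-new-text': u'Message displayed to the bot owner to enter the new text.',
    },
}
"""
found = authors_by_language(sample)
qqq = {"@metadata": [], "pywikibot-enter-new-text": "Message displayed to the bot owner to enter the new text."}
print(merge_authors(qqq, found.get("qqq", [])))
```

Existing JSON authors are kept first and names from the .py file are appended only if missing, so running it twice does not duplicate anyone.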

Change 212885 had a related patch set uploaded (by Ladsgroup):
Add proper attribution in json files based on .py files

https://gerrit.wikimedia.org/r/212885

Change 212885 merged by jenkins-bot:
Add proper attribution in json files based on .py files

https://gerrit.wikimedia.org/r/212885

Ladsgroup claimed this task.
Ladsgroup removed a project: Patch-For-Review.