
plural-gettext.txt and plural.py have not been updated since August 2016
Open, Lowest, Public

Description

plural.py in Pywikibot-compat was derived by @Xqt from plural-gettext.txt in commit a3078ff34a, and was copied to Pywikibot in bfb857199 (with a minor fix minutes later in 5664c76), with no substantive changes since August 2011.

One source of this file is MediaWiki-extensions-Translate: https://github.com/wikimedia/mediawiki-extensions-Translate/blob/master/data/plural-gettext.txt

That was last updated in August 2011 from http://translate.sourceforge.net/wiki/l10n/pluralforms (rETRA56bf77b78d)
That project is now at http://localization-guide.readthedocs.org/en/latest/l10n/pluralforms.html , with its source hosted at https://github.com/translate/l10n-guide/blob/master/docs/l10n/pluralforms.rst and the raw data in https://github.com/translate/translate/blob/master/translate/lang/data.py#L25 . The history shows many plural fixes since 2011.

The 'proper' source now is CLDR

http://cldr.unicode.org/index/cldr-spec/plural-rules
http://unicode.org/repos/cldr/trunk/common/supplemental/pluralRanges.xml
http://unicode.org/repos/cldr/trunk/common/supplemental/plurals.xml

With regard to MediaWiki-extensions-Translate, only GettextFFS uses the plural-gettext.txt data file. MediaWiki itself now uses languages/data/plurals.xml and languages/data/plurals-mediawiki.xml, which are CLDR based, and I guess the majority of the Translate extension uses these as well. (scripts/plural-comparison.php also uses the plural-gettext.txt data file, but only to compare the various plural data.)

However, I expect that the plural-gettext.txt distributed by MediaWiki-extensions-Translate should still be updated.

For Pywikibot, a dev dependency on https://pypi.python.org/pypi/translate-toolkit would allow a maintenance script to load the plural data from the translate project, and generate plural.py.

>>> from translate.lang import data
>>> data.languages['gu']
('Gujarati', 2, '(n != 1)')
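
A rough sketch of what such a maintenance script might look like. It assumes the data keeps the (name, nplurals, expression) shape shown in the session above; SAMPLE_LANGUAGES, render_plural_module and the layout of the generated module are hypothetical and only for illustration, and the sample dict stands in for translate.lang.data.languages so the sketch runs without the package installed.

```python
import pprint

# Stand-in for translate.lang.data.languages, using only the entry
# shown in the session above: code -> (name, nplurals, expression).
SAMPLE_LANGUAGES = {
    'gu': ('Gujarati', 2, '(n != 1)'),
}


def render_plural_module(languages):
    """Return Python source text defining a plural_rules dict.

    The output layout (nplurals plus the raw C-style expression string)
    is an assumption, not the existing plural.py format.
    """
    rules = {
        code: {'nplurals': nplurals, 'plural': expression}
        for code, (_name, nplurals, expression) in sorted(languages.items())
    }
    return ('# Generated file, do not edit by hand.\n'
            'plural_rules = ' + pprint.pformat(rules) + '\n')


if __name__ == '__main__':
    print(render_plural_module(SAMPLE_LANGUAGES))
```

With translate-toolkit installed, the same function could presumably be fed data.languages directly. The expressions stay as C-style strings here, so turning them into callables would be a separate step.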

Event Timeline

jayvdb raised the priority of this task to High.
jayvdb updated the task description.
jayvdb added subscribers: jayvdb, Xqt.
XZise subscribed.

I think it should be possible to either have a script deriving the plural rules using Unicode's CLDR definition or to introduce/use that XML directly. For the second option the main problem is probably the copyright (although it seems you are able to publish and distribute the data files, which would include plurals.xml).

Anyway, another advantage of the plurals.xml file is that, with a proper parser, we could actually test its samples against our rules.
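
As a rough sketch of what such a check could look like: the XML fragment below is hand-written in the layout of plurals.xml (abbreviated, not copied from CLDR), and english_rule and CATEGORY_INDEX are hypothetical stand-ins for one of our rules and for the mapping from CLDR categories to gettext indexes. Only the @integer samples are handled.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated fragment in the layout of plurals.xml.
# \u2026 is the ellipsis character used to mark open-ended sample lists.
SAMPLE_XML = """
<plurals>
  <pluralRules locales="en">
    <pluralRule count="one">i = 1 and v = 0 @integer 1</pluralRule>
    <pluralRule count="other"> @integer 0, 2~16, 100, 1000, \u2026 @decimal 0.0~1.5, 10.0, \u2026</pluralRule>
  </pluralRules>
</plurals>
"""

# Assumption: how CLDR categories map to gettext plural indexes for English.
CATEGORY_INDEX = {'one': 0, 'other': 1}


def english_rule(n):
    """Hypothetical gettext-style rule: 0 for singular, 1 for plural."""
    return int(n != 1)


def integer_samples(rule_text):
    """Return the explicit @integer samples of one rule as a list of ints."""
    if '@integer' not in rule_text:
        return []
    part = rule_text.split('@integer', 1)[1].split('@decimal', 1)[0]
    samples = []
    for item in part.split(','):
        item = item.strip()
        if not item or item in ('\u2026', '...'):
            continue  # skip the open-ended marker
        if '~' in item:
            start, end = item.split('~')
            samples.extend(range(int(start), int(end) + 1))
        else:
            samples.append(int(item))
    return samples


def mismatches(xml_text, locale, rule, category_index):
    """List (n, expected, actual) for every sample the rule gets wrong."""
    bad = []
    for rules in ET.fromstring(xml_text).iter('pluralRules'):
        if locale not in rules.get('locales', '').split():
            continue
        for rule_el in rules.iter('pluralRule'):
            expected = category_index[rule_el.get('count')]
            for n in integer_samples(rule_el.text or ''):
                if rule(n) != expected:
                    bad.append((n, expected, rule(n)))
    return bad


if __name__ == '__main__':
    print(mismatches(SAMPLE_XML, 'en', english_rule, CATEGORY_INDEX))
```

An empty list means the rule agrees with every listed integer sample. Decimal samples and the rule conditions themselves are ignored in this sketch.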

Of course, if @Xqt already has something, it would be better to evaluate whether we can use that instead (before I waste time on a parser for Unicode's file).

I'd prefer that translate-toolkit is improved to use Unicode's CLDR (if it isn't already?), with relevant tests added, and we simply consume that. Then all Python projects benefit, and we benefit from having rules that are used and maintained by other projects.

XZise removed XZise as the assignee of this task. Edited Oct 11 2015, 11:17 PM

Oh sorry, it seems I overlooked the last part of your opening post. For the moment I'm a bit worried that the package is overkill when we need just one dict, and that each rule is a string and not a function. And that string is not even valid Python code:

'(n==1 || n==11) ? 0 : (n==2 || n==12) ? 1 : (n > 2 && n < 20) ? 2 : 3'

So this would either need a parser from us or another third party (unless I overlooked the parser in the project).
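
For what it's worth, CPython's own gettext module carries a helper, gettext.c2py, that compiles this kind of C-style expression into a Python callable. It is not listed in the module documentation, so relying on it is an assumption rather than a supported API. A minimal sketch using the expression quoted above:

```python
import gettext

# C-style plural expression quoted above.
EXPRESSION = ('(n==1 || n==11) ? 0 : (n==2 || n==12) ? 1 : '
              '(n > 2 && n < 20) ? 2 : 3')

# gettext.c2py is an undocumented standard library helper (assumption:
# it stays available); it returns a function mapping n to a plural index.
plural_index = gettext.c2py(EXPRESSION)

if __name__ == '__main__':
    for n in (0, 1, 2, 5, 11, 12, 19, 20):
        print(n, plural_index(n))
```

Whether depending on an undocumented helper is acceptable for Pywikibot is a separate question. A small parser of our own would avoid that at the cost of more code.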

I also don't know what it is using as the source material. It is definitely not using the CLDR material directly as that is formatted differently.

Or, for a completely different strategy, Pywikibot might be able to ask the server to expand the message, which would also help with other issues such as T115297, gender and grammar rules.

In Translate, plural-gettext.txt is used when producing Gettext files. Ideally this would match the Unicode CLDR data, only in a different format. The compare-plurals.py is supposed to be able to find differences. I welcome any work towards bringing this file closer to the CLDR data, as long as we take care of existing translations that might be affected. I do not have time in the near future to work on this myself.

Do you want to own this bug in pywikibot? Otherwise I will re-prioritize and re-classify this on Translate board.

Change 307478 had a related patch set uploaded (by Xqt):
update plural rules

https://gerrit.wikimedia.org/r/307478

Change 307478 merged by jenkins-bot:
update plural rules

https://gerrit.wikimedia.org/r/307478

I think it should be possible to either have a script deriving the plural rules using Unicode's CLDR definition

Yes, and if not it would be nice to ensure they match, so that translators aren't confused.

From a quick check I see http://userguide.icu-project.org/formatparse/messages and http://site.icu-project.org/design/formatting/select (plus https://ssl.icu-project.org/apiref/icu4j/com/ibm/icu/text/PluralFormat.html ); I'm not sure PyICU implements everything but https://github.com/ovalhub/pyicu/blob/9543d3856e5b6cb4986b107cdd0105400937f92f/format.cpp seems to be shipped in the package.

Xqt renamed this task from plural-gettext.txt and plural.py have not been updated since August 2011 to plural-gettext.txt and plural.py have not been updated since August 2016. Sep 23 2018, 8:30 AM
Xqt lowered the priority of this task from High to Lowest.