plural-gettext.txt and plural.py have not been updated since August 2016
Open, Lowest, Public

Assigned To

None

Authored By

	jayvdb
	Oct 8 2015, 1:59 AM

Description

plural.py in Pywikibot-compat was derived by @Xqt from plural-gettext.txt in commit a3078ff34a; it was copied to Pywikibot in bfb857199 (with a minor fix minutes later in 5664c76), and has had no substantive changes since August 2011.

One source of this file is MediaWiki-extensions-Translate: https://github.com/wikimedia/mediawiki-extensions-Translate/blob/master/data/plural-gettext.txt

That copy was last updated in August 2011 from http://translate.sourceforge.net/wiki/l10n/pluralforms (rETRA56bf77b78d).
That page is now published at http://localization-guide.readthedocs.org/en/latest/l10n/pluralforms.html , its source is hosted at https://github.com/translate/l10n-guide/blob/master/docs/l10n/pluralforms.rst , and the raw data is in https://github.com/translate/translate/blob/master/translate/lang/data.py#L25 . The history shows many plural fixes since 2011.

The 'proper' source now is CLDR:

http://cldr.unicode.org/index/cldr-spec/plural-rules
http://unicode.org/repos/cldr/trunk/common/supplemental/pluralRanges.xml
http://unicode.org/repos/cldr/trunk/common/supplemental/plurals.xml

With regard to MediaWiki-extensions-Translate, only GettextFFS uses the plural-gettext.txt data file. MediaWiki itself now uses languages/data/plurals.xml and languages/data/plurals-mediawiki.xml, which are CLDR based, and I guess the majority of the Translate extension uses these as well. (scripts/plural-comparison.php also uses the plural-gettext.txt data file, but only to compare the various sets of plural data.)

However, I expect that the plural-gettext.txt distributed by MediaWiki-extensions-Translate should be updated.

For Pywikibot, a dev dependency on https://pypi.python.org/pypi/translate-toolkit would allow a maintenance script to load the plural data from the translate project, and generate plural.py.

>>> from translate.lang import data
>>> data.languages['gu']
('Gujarati', 2, '(n != 1)')
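A minimal sketch of what such a maintenance script might look like. To stay self-contained it does not import translate-toolkit; instead it uses a hypothetical SAMPLE_LANGUAGES dict shaped like the tuples shown in the session above, and only the 'gu' entry is taken from that session. The output variable name plural_rules and the 'nplurals'/'plural' keys are assumptions about the layout wanted for the generated file, not a description of the existing plural.py.

```python
import pprint

# Hypothetical sample shaped like translate.lang.data.languages:
# code -> (name, nplurals, plural expression). Only the 'gu' entry
# comes from the session above; a real script would import the full dict.
SAMPLE_LANGUAGES = {
    'gu': ('Gujarati', 2, '(n != 1)'),
}


def build_plural_rules(languages):
    """Keep only the fields needed for plural handling."""
    rules = {}
    for code, (_name, nplurals, plural) in sorted(languages.items()):
        rules[code] = {'nplurals': nplurals, 'plural': plural}
    return rules


def render_module(rules):
    """Return Python source text for a generated plural module."""
    header = '# Generated file; do not edit by hand.\n'
    return header + 'plural_rules = ' + pprint.pformat(rules) + '\n'


source = render_module(build_plural_rules(SAMPLE_LANGUAGES))
print(source)
```

Note that this keeps the plural expression as a string; turning it into something callable is a separate step.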

Details

	Subject	Repo	Branch	Lines +/-
	update plural rules	pywikibot/core	master	+23 -21


Related Objects

Mentioned In: T317527: plural-gettext.txt has not been updated since August 2016
T87135: Phabricator should only notify changes to the "Security" field if it is indeed changed
Mentioned Here: T115297: Pywikibot does not support explicit plural forms
rETRA56bf77b78d65: Update from http://translate.sourceforge.net/wiki/l10n/pluralforms.
rPWBCbfb857199bfd: update from trunk r9463
rPWBC5664c763978c: fix from trunk r9466
rPWBOa3078ff34a43: Plural rules for pywikipedia based on mw r95194 of plural-gettext.txt

Event Timeline

jayvdb created this task. Oct 8 2015, 1:59 AM

jayvdb raised the priority of this task from to High.

jayvdb updated the task description.

jayvdb added projects: Pywikibot, Pywikibot-i18n, MediaWiki-extensions-Translate.

jayvdb added subscribers: jayvdb, Xqt.

Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. Oct 8 2015, 1:59 AM

I think it should be possible either to have a script that derives the plural rules from Unicode's CLDR definition, or to introduce and use that XML directly. For the second option the main problem is probably the copyright, although it seems you are able to publish and distribute the data files (which would include plurals.xml).

Anyway, another advantage of the plurals.xml file is that, with a proper parser, we could actually test its samples against our rules.
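A rough sketch of that sample check, under several assumptions: FRAGMENT is a hand-written snippet in the style of plurals.xml (not a verbatim copy of the CLDR file), only plain integers and `a~b` ranges after `@integer` are handled, and the gettext form index is assumed to follow the order in which the CLDR categories appear. The helper names integer_samples and check_locale are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in the style of CLDR's plurals.xml
# (assumption: illustrative only, not copied from the real file).
FRAGMENT = """
<supplementalData>
  <plurals type="cardinal">
    <pluralRules locales="en">
      <pluralRule count="one">i = 1 and v = 0 @integer 1</pluralRule>
      <pluralRule count="other"> @integer 0, 2~16, 100, 1000, …</pluralRule>
    </pluralRules>
  </plurals>
</supplementalData>
"""


def integer_samples(rule_text):
    """Expand the '@integer' sample list of one rule into a list of ints."""
    if '@integer' not in rule_text:
        return []
    part = rule_text.split('@integer', 1)[1].split('@decimal', 1)[0]
    samples = []
    for item in part.split(','):
        item = item.strip()
        if '~' in item:
            low, high = item.split('~')
            samples.extend(range(int(low), int(high) + 1))
        elif item.isdigit():
            samples.append(int(item))
        # Anything else (such as the trailing ellipsis) is skipped.
    return samples


def check_locale(xml_text, locale, plural_func):
    """Return (n, expected_index, actual_index) for each mismatching sample."""
    root = ET.fromstring(xml_text)
    mismatches = []
    for rules in root.iter('pluralRules'):
        if locale not in rules.get('locales', '').split():
            continue
        # Assumption: form indexes follow the order of the CLDR categories.
        for index, rule in enumerate(rules.findall('pluralRule')):
            for n in integer_samples(rule.text or ''):
                actual = plural_func(n)
                if actual != index:
                    mismatches.append((n, index, actual))
    return mismatches


def english(n):
    """Python form of the gettext rule '(n != 1)'."""
    return int(n != 1)


print(check_locale(FRAGMENT, 'en', english))
```

An empty list means every integer sample in the fragment landed in the expected form; a real check would also need the decimal samples and the full rule file.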

Of course, if @Xqt already has something, it would be better to evaluate whether we can use that instead (before I waste time on a parser for Unicode's file).

I'd prefer that translate-toolkit is improved to use Unicode's CLDR (if it isn't already?), with relevant tests added, and we simply consume that. Then all Python projects benefit, and we benefit from having rules that are used and maintained by other projects.

Oh sorry, it seems I overlooked the last part of your opening post. For the moment I'm a bit worried that the package is overkill when we need just one dict, and that each rule is a string and not a function. And that string is not even valid Python code:

'(n==1 || n==11) ? 0 : (n==2 || n==12) ? 1 : (n > 2 && n < 20) ? 2 : 3'

So this would either need a parser from us or another third party (unless I overlooked the parser in the project).
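For reference, one possible way to evaluate such a C-style expression without a third-party parser is the helper gettext.c2py from Python's standard library. It is not listed in the official gettext module documentation, so relying on it is an assumption about an internal helper; the sketch below only shows that the expression quoted above can be turned into a callable that way.

```python
import gettext

# C-style plural expression quoted in the comment above.
EXPR = '(n==1 || n==11) ? 0 : (n==2 || n==12) ? 1 : (n > 2 && n < 20) ? 2 : 3'

# gettext.c2py is an undocumented standard-library helper (assumption:
# it stays available); it converts a C plural expression into a Python
# function that takes n and returns the plural form index.
plural_index = gettext.c2py(EXPR)

for n in (1, 11, 2, 12, 3, 19, 20, 0):
    print(n, plural_index(n))
```

A generator script could call this once per language when building the rules, or the conversion could happen lazily at lookup time.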

I also don't know what it is using as the source material. It is definitely not using the CLDR material directly as that is formatted differently.

Or, for a completely different strategy, Pywikibot might be able to ask the server to expand the message, which would help with other issues such as T115297 and with gender and grammar rules.

In Translate, plural-gettext.txt is used when producing Gettext files. Ideally this would match the Unicode CLDR data, only in a different format. The compare-plurals.py is supposed to be able to find differences. I welcome any work towards bringing this file closer to the CLDR data, as long as we take care of existing translations that might be affected. I do not have time in the near future to work on this myself.

Liuxinyu970226 set Security to None. Nov 15 2015, 3:47 AM

jayvdb mentioned this in T87135: Phabricator should only notify changes to the "Security" field if it is indeed changed. Nov 15 2015, 4:09 AM

Do you want to own this bug in pywikibot? Otherwise I will re-prioritize and re-classify this on Translate board.

Change 307478 had a related patch set uploaded (by Xqt):
update plural rules

https://gerrit.wikimedia.org/r/307478

gerritbot added a project: Patch-For-Review. Aug 30 2016, 8:14 AM

Change 307478 merged by jenkins-bot:
update plural rules

https://gerrit.wikimedia.org/r/307478

I think it should be possible to either have a script deriving the plural rules using Unicode's CLDR definition

Yes, and if not it would be nice to ensure they match, so that translators aren't confused.

From a quick check I see http://userguide.icu-project.org/formatparse/messages and http://site.icu-project.org/design/formatting/select (plus https://ssl.icu-project.org/apiref/icu4j/com/ibm/icu/text/PluralFormat.html ); I'm not sure PyICU implements everything but https://github.com/ovalhub/pyicu/blob/9543d3856e5b6cb4986b107cdd0105400937f92f/format.cpp seems to be shipped in the package.

Dvorapa removed a project: Patch-For-Review. May 27 2018, 11:44 AM

Xqt renamed this task from plural-gettext.txt and plural.py have not been updated since August 2011 to plural-gettext.txt and plural.py have not been updated since August 2016. Sep 23 2018, 8:30 AM

Xqt lowered the priority of this task from High to Lowest.

I created a separate task for Translate.
