
Extract Python library for Wikimedia tool i18n from Wikidata Lexeme Forms tool
Open, Needs Triage, Public

Description

Over the past few years, the Wikidata Lexeme Forms tool has grown quite a useful set of i18n functionality, most recently adding translatewiki.net integration (T272243). I think we’re reaching the point where it would be useful to extract it into a separate library so that other Python tools can use it as well.

Benefits over the Jinja i18n extension (see also template designer documentation), which I assume is the default i18n solution for Flask projects:

  • Integration with translatewiki.net, including conversion between MediaWiki and Python message syntax.
  • Support for the {{GENDER:}} magic word, formatting of comma-separated lists, and hyperlinks in messages. (Also {{PLURAL:}}, but Jinja i18n has that too, see below.)
  • Jinja i18n seems to inherit several limitations from the underlying gettext library – though it also supports Babel as a backend, the gettext interface still seems to shape the library. (Disclaimer: I have not used Jinja i18n myself; this is based only on reading the documentation and some guides to the library.)
    • It encourages using the English message text as the identifier of the message, which is generally a bad idea. In MediaWiki / translatewiki.net, each message has an identifier, and messages that happen to be identical in English can still be distinguished.
    • While I believe it supports full CLDR plural rules, the message specification syntax appears to encourage English-style pluralization with exactly two variants (“one” and “other” in CLDR speak). The Jinja {% pluralize %} syntax also seems to require all the non-plural-dependent text to be duplicated in a message (“you have one message {% pluralize %} you have {{ count }} messages”), whereas in MediaWiki syntax {{PLURAL:}} can be used anywhere within a message (“you have {{PLURAL:$1|one message|$1 messages}}”).
    • As far as I can tell, there is no provision for message documentation, or at least no strong tradition of providing documentation for messages. In MediaWiki / translatewiki.net, developers are expected to document every message in qqq.json, for the benefit of translators.
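
To illustrate the inline {{PLURAL:}} point above, here is a minimal sketch of expanding a MediaWiki-style message in Python. It is not the Wikidata Lexeme Forms implementation: the names `expand_message` and `english_plural_index` are hypothetical, nested templates are not handled, and only the English “one” / “other” rule is provided as a default.

```python
import re

# Matches {{PLURAL:$N|form|form|...}} without nested braces (a simplification).
_PLURAL_RE = re.compile(r"\{\{PLURAL:\$(\d+)\|([^{}]*)\}\}")
# Matches positional parameters such as $1 or $10.
_PARAM_RE = re.compile(r"\$(\d+)")


def english_plural_index(n):
    """Return 0 for the English "one" category, 1 for "other" (hypothetical helper)."""
    return 0 if n == 1 else 1


def expand_message(message, *args, plural_index=english_plural_index):
    """Expand {{PLURAL:}} inline, then substitute $1, $2, ... (hypothetical helper)."""

    def replace_plural(match):
        count = args[int(match.group(1)) - 1]
        forms = match.group(2).split("|")
        # Fall back to the last form if fewer forms than categories are given.
        index = min(plural_index(count), len(forms) - 1)
        return forms[index]

    def replace_param(match):
        return str(args[int(match.group(1)) - 1])

    expanded = _PLURAL_RE.sub(replace_plural, message)
    return _PARAM_RE.sub(replace_param, expanded)


message = "you have {{PLURAL:$1|one message|$1 messages}}"
print(expand_message(message, 1))
print(expand_message(message, 5))
```

The shared text (“you have”) appears only once in the message, and a language with more plural categories could pass its own `plural_index` function and supply more forms.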

I don’t have any specific plans for when to do this, but if people are interested, let me know (ideally via a comment here) and that’ll bump it up in my priority list :)

CCing some people from T272243 who might be interested in this: @Nikerabbit @abi_ @Amire80

Event Timeline

It would be great if we (translatewiki) were able to suggest high quality i18n libraries for different programming languages.

I echo that having unique message keys and "inline" syntax for plural & co is desirable, as are message documentation and safe escaping by default to avoid HTML (and other) injection vectors. It's a plus if the library has a nice syntax for links or a generic escaping-aware embedding feature.
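
As a rough sketch of what "safe escaping by default" could look like in Python, using only the standard library: the message text and every parameter are HTML-escaped unless a value is explicitly marked as trusted. `TrustedHtml` and `format_message` are hypothetical names for illustration, not part of any existing library, and Python `{name}` placeholders are assumed rather than MediaWiki `$1` syntax.

```python
from html import escape


class TrustedHtml(str):
    """Marker type (hypothetical) for strings that are already safe HTML."""


def _escape_value(value):
    # Trusted values pass through unchanged; everything else is escaped.
    if isinstance(value, TrustedHtml):
        return value
    return escape(str(value))


def format_message(template, **params):
    """Format a translated message, escaping the template and all parameters."""
    # html.escape leaves the {name} placeholders intact.
    safe_template = escape(template)
    safe_params = {name: _escape_value(value) for name, value in params.items()}
    return TrustedHtml(safe_template.format(**safe_params))


print(format_message("Hello, {name}!", name="<script>alert(1)</script>"))
link = TrustedHtml('<a href="/help">help page</a>')
print(format_message("See the {link}.", link=link))
```

Returning `TrustedHtml` lets the result be embedded in another message without being escaped twice, which is one way to support links inside messages.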