
Migrate i18n file format to JSON
Closed, ResolvedPublic

Description

We've recently started using JSON for i18n files in MediaWiki and for some other products. We think it's a good non-executable format for i18n/L10n.

It would be nice if Pywikibot would also start using this format. It would also allow us to remove the less than excellent Python file format support in the Translate extension.


Version: unspecified
Severity: critical
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=66897

Details

Reference
bz63327
Related Gerrit Patches:
pywikibot/core : master: Use JSON for i18n files

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 3:01 AM
bzimport added projects: Pywikibot-i18n, I18n.
bzimport set Reference to bz63327.
bzimport added a subscriber: Unknown Object (????).
siebrand created this task.Mar 31 2014, 9:12 PM

One of the reasons we are using the current file format is the structure: in Pywikibot, the typical use case is 'getting a few strings from a lot of languages', while the general use case (e.g. for MediaWiki) is 'getting a lot of strings from a single language'.

On tools-login, parsing all of MediaWiki's JSON files takes 3-5 seconds. For many bot users this might take a lot longer (slower computer, slower hard drive), which (I think) is not reasonable.

Switching to a JSON-based format with the same file structure would be OK with me.
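To make the two layouts concrete, here is a minimal sketch of the Pywikibot-style use case (one key, many per-language files). The function name and directory layout are illustrative, not the actual repository structure:

```python
import json
import os

def load_message_all_languages(component_dir, key):
    """Collect one message key from every per-language JSON file
    in a component directory (Pywikibot-style layout: <component>/<lang>.json)."""
    messages = {}
    for name in os.listdir(component_dir):
        if not name.endswith(".json"):
            continue
        lang = name[:-len(".json")]
        with open(os.path.join(component_dir, name), encoding="utf-8") as f:
            data = json.load(f)
        if key in data:
            messages[lang] = data[key]
    return messages
```

The MediaWiki-style layout is the transpose: one `i18n/<lang>.json` file holding all of a language's strings, so the common operation there is loading many keys from a single file.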

It shouldn't matter much whether N messages are spread over 50 or 1000 files (made-up numbers) when it comes to how long they take to parse.

Actually, it would: each new file adds disk read time and file open/close time. Then, depending on the language, it may have to create a new JSON parser, parse the file, and destroy the parser again. With 1-2 files it's not a big deal, but as it scales up the issue becomes more and more of a bottleneck.
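The per-file overhead claim is easy to check with a rough micro-benchmark (a sketch, not a rigorous measurement; results will vary with OS caching and hardware): parse the same total number of strings split into one file versus many.

```python
import json
import os
import tempfile
import time

def make_files(directory, n_files, strings_per_file):
    """Write `n_files` JSON files, each holding `strings_per_file` dummy strings."""
    paths = []
    for i in range(n_files):
        path = os.path.join(directory, f"part{i}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump({f"msg{j}": "text" for j in range(strings_per_file)}, f)
        paths.append(path)
    return paths

def time_parse(paths):
    """Seconds spent opening and parsing every file in `paths`."""
    start = time.perf_counter()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            json.load(f)
    return time.perf_counter() - start

# Same total number of strings, split two different ways:
with tempfile.TemporaryDirectory() as big, tempfile.TemporaryDirectory() as small:
    print(f"1 file x 1000 strings: {time_parse(make_files(big, 1, 1000)):.4f} s")
    print(f"1000 files x 1 string: {time_parse(make_files(small, 1000, 1)):.4f} s")
```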

I can understand that, but I'm not convinced it is a bottleneck currently.

I am weighing this against the development effort needed in Translate to support it.

(In reply to Niklas Laxström from comment #4)

I can understand that, but I'm not convinced it is a bottleneck currently.

As I mentioned, I *measured* the time needed to parse JSON files. For MediaWiki core, this takes *multiple seconds*. That's *multiple seconds* during *startup* of *each* script.

Currently, each script needs to read only a single file, which is also in a format that loads quickly (the Python bytecode is cached).

(In reply to Niklas Laxström from comment #4)

I am weighing this against the development effort needed in Translate to
support it.

I assume you are aware this switch won't exactly be free in terms of *our* time either, right? I'd rather work on site-based configuration or Python 3 support than change a file format for the sake of changing it.

Are you saying that you parse *all* of *MediaWiki's* i18n files on *startup* for all pywikibot scripts?

No, I'm saying we would need to parse *all* of *Pywikibot's* i18n files on *startup* of every script if we used the JSON format currently used by MediaWiki.

MediaWiki core has about 500,000 strings (all languages together). Pywikibot has 11,000 strings. Assuming parsing time is linear in the number of strings, your example would take about 0.1 seconds for Pywikibot. That leaves some room for growth, non-linear parsing time, and slower machines.
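The arithmetic behind that estimate, using the numbers from the comment above (a back-of-the-envelope calculation, assuming parsing time scales linearly with string count):

```python
# Linear scaling estimate from the measured MediaWiki numbers:
mw_strings = 500_000    # MediaWiki core, all languages together
mw_seconds = 5.0        # upper end of the measured 3-5 s range
pwb_strings = 11_000    # Pywikibot, all languages together

estimate = mw_seconds * pwb_strings / mw_strings
print(f"estimated parse time for Pywikibot: {estimate:.2f} s")  # 0.11 s
```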

I would also imagine that, if necessary, you could dump all the messages into an efficient serialized format provided by Python on first startup and use that on subsequent loads. I might be able to help with that.
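One way such a cache could look, as a minimal sketch (the function name and cache-invalidation strategy are assumptions, not anything Pywikibot actually implements): parse the JSON once, pickle the result, and reuse the pickle as long as it is newer than the JSON file.

```python
import json
import os
import pickle

def load_messages(json_path, cache_path):
    """Load messages from `json_path`, keeping a pickle cache that is
    regenerated whenever the JSON file is newer than the cache."""
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(json_path)):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    with open(json_path, encoding="utf-8") as f:
        messages = json.load(f)
    with open(cache_path, "wb") as f:
        pickle.dump(messages, f)
    return messages
```

Loading a pickle avoids re-running the JSON parser on every script startup, at the cost of a one-time conversion and a staleness check.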

You're right, MW is much bigger. Could you provide the JSON-formatted output for Pywikibot somewhere?

(In reply to Merlijn van Deen from comment #9)

You're right, MW is much bigger. Could you provide the JSON-formatted output
for Pywikibot somewhere?

There isn't such a thing. There's a file format, but the conversion script does not yet exist. There are currently 41 distinct i18n components with 129 strings for Pywikibot.

Let's double that: 100 components and 300 strings in, say, 70 languages. That's 5000 files and 21,000 strings.

MediaWiki core has 3000 strings for English alone in a single file. It's hard to compare. I'd suggest creating some dummy data based on my assumptions above.

Sample files are abundant in any MediaWiki extension in i18n/en.json.

Or you can use the following Python script to create JSON files for testing:

import glob
import importlib
import json
import os

# For each Python i18n file in the current directory, write one JSON
# file per language into a directory named after the module.
for filename in glob.glob("*.py"):
    module = filename[:-len(".py")]
    try:
        messages = importlib.import_module(module).msg
    except AttributeError:
        # The module has no `msg` dictionary; skip it.
        continue
    os.mkdir(module)
    for lang in messages:
        with open(os.path.join(module, lang + ".json"), "w", encoding="utf-8") as f:
            json.dump(messages[lang], f, ensure_ascii=False)

Change 151113 had a related patch set uploaded by Ladsgroup:
[BREAKING] [Bug 63327] Use JSON for i18n files

https://gerrit.wikimedia.org/r/151113

Change 151114 had a related patch set uploaded by Ladsgroup:
[BREAKING] [Bug 63327] Use JSON for i18n files

https://gerrit.wikimedia.org/r/151114

jayvdb added a comment.Nov 9 2014, 8:03 PM

This needs to be done before the next version is pushed to PyPI, which would be another beta or a release candidate.

Change 151114 had a related patch set uploaded by John Vandenberg:
Use JSON for i18n files

https://gerrit.wikimedia.org/r/151114

jayvdb set Security to None.
jayvdb moved this task from Backlog to Ready to go on the Pywikibot-compat board.
jayvdb moved this task from Backlog to Next release on the Pywikibot board.
jayvdb removed a subscriber: Unknown Object (????).

Change 151114 had a related patch set uploaded (by Ladsgroup):
Use JSON for i18n files

https://gerrit.wikimedia.org/r/151114

Change 151114 merged by jenkins-bot:
Use JSON for i18n files

https://gerrit.wikimedia.org/r/151114

Ladsgroup closed this task as Resolved.May 24 2015, 2:46 PM
Ladsgroup edited projects, added Wikimedia-Hackathon-2015; removed Patch-For-Review.