
Tools for mass migration of legacy translated wiki content
Open, Lowest, Public

Description

Proposal by the Language team at http://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects#Tool_for_mass_i18n_of_wiki_content

The MediaWiki Translate extension has a page translation feature to make the life of translators easier. But often wikis have a lot of legacy content that requires tedious manual conversion to make it translatable. It would be useful to have a tool to facilitate the conversion.


Version: unspecified
Severity: enhancement

Details

Reference
bz46645


Event Timeline

bzimport raised the priority of this task to Lowest. (Nov 22 2014, 1:25 AM)
bzimport set Reference to bz46645.
bzimport added a subscriber: Unknown Object (MLST).
Qgil created this task. (Mar 28 2013, 5:56 PM)

I wrote a quick and dirty Python script last year to convert translated pages on Meta from the old system to the new system. See it in action e.g. here:
https://meta.wikimedia.org/w/index.php?title=Special:Contributions/HaeBot&offset=20120820000000&limit=500&target=HaeBot (If gettext import is enabled for the Translate extension per https://bugzilla.wikimedia.org/show_bug.cgi?id=40341 , this will eliminate the need to use a bot for writing the converted translations into the new format unit by unit.)

The conversion script also supports partial import and rearranging, in case only parts of the old translation are to be reused. It worked for me and ultimately saved a lot of time compared to manual conversion. That said, the regexes that split the old translations into translation units need some manual preparation, and in my case they then had to be tweaked separately for quite a few languages because the translators had not quite preserved the format of the English original.
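For illustration only (this is not the HaeBot code, which is unpublished), the splitting and matching described above could be sketched roughly as follows. The blank-line splitting rule, the function names, and the sample page title are all assumptions made for the example; a real page would likely need page-specific and language-specific regexes, as noted. The `Translations:<page>/<unit>/<language>` title pattern is the one the Translate extension uses for unit pages; the unit numbering here is assumed to match the order of the source units.

```python
import re

# Hypothetical splitting rule: units are separated by one or more blank lines.
# Real legacy pages usually need hand-tuned, per-language patterns instead.
UNIT_BOUNDARY = re.compile(r"\n\s*\n")


def split_units(wikitext):
    """Split page text into candidate translation units on blank lines."""
    return [chunk.strip() for chunk in UNIT_BOUNDARY.split(wikitext) if chunk.strip()]


def align_units(source_text, translated_text):
    """Pair source and translated units by position.

    Returns (pairs, ok). ok is False when the unit counts differ, which
    suggests the translator did not preserve the layout of the original
    and the page needs manual attention or a different splitting rule.
    """
    source_units = split_units(source_text)
    translated_units = split_units(translated_text)
    ok = len(source_units) == len(translated_units)
    return list(zip(source_units, translated_units)), ok


def select_units(translated_units, mapping):
    """Partial import and rearranging.

    mapping is {target unit number: index into translated_units}; units
    that are not listed are simply not imported.
    """
    return {unit_id: translated_units[index] for unit_id, index in mapping.items()}


def unit_title(page, unit_id, lang):
    """Title of the page holding one translation unit in the new system."""
    return f"Translations:{page}/{unit_id}/{lang}"


source = "== Intro ==\n\nHello world.\n\nSecond paragraph."
translated = "== Einleitung ==\n\nHallo Welt.\n\n\nZweiter Absatz."

pairs, ok = align_units(source, translated)
if ok:
    for unit_id, (_, text) in enumerate(pairs, start=1):
        print(unit_title("Example page", unit_id, "de"), "<-", text)
```

Checking the unit counts before writing anything is a cheap way to flag the languages where the splitting rule needs to be adjusted by hand.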

Sadly, I missed the GSoC application deadline ;) and I haven't gotten around to publishing the code yet, as it still needs some cleanup. Until then, I'm offering to send the unpolished code to anyone who might find it useful. See also https://meta.wikimedia.org/wiki/User_talk:HaeBot#Code

Qgil added a comment. (Mar 13 2014, 2:34 PM)

According to https://www.mediawiki.org/wiki/Extension:Translate/Mass_migration_tools and https://www.mediawiki.org/wiki/Google_Summer_of_Code_2014 , Pratik Lahoti is working on a proposal.

Pratik, your proposal is still missing from Google Melange. Please submit it there as a draft linking to your wiki page. In any case, we will evaluate your proposal on mediawiki.org. Thank you!

pr4tiklahoti wrote:

Yes, I was going to comment on this bug. I am working on the proposal as mentioned by Quim above.

@Quim: I have now submitted the proposal in Google Melange as well. Thanks.

A usable version of Special:PageMigration is already available; please test it on http://pagemigration.wmflabs.org/ . For more information:
https://www.mediawiki.org/wiki/Extension:Translate/Mass_migration_tools#Bug_on_Bugzilla

Copying here from https://www.mediawiki.org/wiki/Extension_talk:Translate/Mass_migration_tools#Approaching_conclusion.2C_feast_now :


It will be a tough ride to finish the project (if it is possible at all), but I highly recommend using Special:PageMigration now: just today, with its help, I managed to make almost a thousand edits and migrate several pages, including some huge ones. Even on big pages it often works surprisingly well (almost nothing to do manually), while on others it fails spectacularly (but we have some ideas on how to fix that).

With only ten days of coding left, it is more useful than ever if you try the tool on at least one page and comment on the bugs to help us prioritise or, even better, report new problems or ideas we have not identified yet.


https://bugzilla.wikimedia.org/buglist.cgi?component=Translate&f1=blocked&o1=substring&order=priority%2Cbug_severity%2Cvotes%20DESC&query_format=advanced&resolution=---&v1=65740

al added a subscriber: al. (Dec 20 2015, 2:40 AM)
Restricted Application added a subscriber: Aklapper. (Dec 20 2015, 2:40 AM)