
Extensive and robust localization file format coverage for Translate extension
Closed, Duplicate (Public)

Description

Title: Extensive and robust localization file format coverage for Translate extension

User Information:

Name: Ayushi Mrigen
Email id: ayushimrigen11@gmail.com
IRC handle: ayushi
Mediawiki/Phabricator username: ayushimrigen
Location: Kharagpur, India, UTC+5:30
Typical working hours: 10 A.M. to 8 P.M. IST (until July 11); 3 P.M. to 12 midnight IST (after July 11)

Abstract:

MediaWiki's Extension:Translate is a powerful tool for translating all kinds of text. It is most widely used to translate software and to manage multilingual wikis. One of its major aims is to make back-end integration with the actual source code as easy as possible. The Translate extension currently supports a number of file formats, but many of them were developed only as and when they were needed. As a result, some file formats are not supported at all and support for others is incomplete. Parts of the code are not secure, and many reported bugs are still open.

Synopsis:

The primary aim of this project would be to make the existing file formats more robust in order to satisfy the following properties:
• The code does not crash on unexpected input,
• There is a validator for the file format,
• The code can handle the full file format specification,
• The code is secure (does not execute any code in the files nor have known exploits).
In addition, new file formats are to be added. For example, a basic setup for Apache Cocoon already exists, but a lot of work remains to be done.
The final aim of this project is to find loopholes in the existing file formats and fix them. The latter half of the project will be devoted to adding complete support for at least one more file format.
Possible mentors:

Niklas Laxström
Federico Leva
Siebrand Mazeland

Deliverables:
Improving existing file formats:
JavaScriptFFS: This file format uses a number of workarounds and hacks, which make the code prone to crashing. For example, when concatenating strings it does not take single quotes into account, and it splits (explodes) the string only if it contains the pattern ‘+ “’ or ‘+”’ (around line 100 in the code). The task here is to replace these fragile hacks with proper JSON handling. With my mentor’s support, I will go through the entire code and replace most of the hand-written functions with JSON functions, to make the code more robust and secure.
Task to be performed: T33331: JavascriptFFS shouldn't be building JSON manually
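The difference between building the output by hand and using a JSON encoder can be sketched as follows. This is only an illustration in Python, not the extension's PHP code; the function names and the sample messages are hypothetical.

```python
import json


def export_messages_manually(messages):
    """Fragile approach: build the object literal by string concatenation."""
    lines = ['\t"%s": "%s"' % (key, value) for key, value in messages.items()]
    return "{\n" + ",\n".join(lines) + "\n}"


def export_messages_json(messages):
    """Robust approach: let the JSON encoder handle quoting and escaping."""
    return json.dumps(messages, ensure_ascii=False, indent="\t", sort_keys=True)


# Sample messages (made up) containing quotes, a plus sign and a backslash.
messages = {"greeting": 'Say "hello" + \'bye\'', "path": "C:\\temp"}

manual = export_messages_manually(messages)   # not valid JSON for this input
encoded = export_messages_json(messages)      # parses back to the same mapping
```

The hand-built output breaks as soon as a message contains a double quote or a backslash, while the encoder output round-trips through `json.loads` unchanged.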

AndroidFFS: Some of the problems with this file format, and the ways I plan to tackle them, are:
• The Android exporter causes build failures due to the presence of ‘%’ in the generated XML. This happens because some translated characters contain % once they are encoded. The current workaround converts the codes containing % back to their original characters. The proper fix would be to ensure that those parts of the string are left unformatted, or to add the formatted="false" attribute to the <string> entry.
• AndroidFFS doesn't support string-arrays. When Android XML resources contain a series of translatable strings, they are simply skipped and not translated. A patch already exists as a provisional fix. I plan to retrieve that patch and add string-array translation support to AndroidFFS. Several file formats already support translating a series of strings; building on that existing code should make it straightforward to add the same support here, which will resolve open issues such as the one in Project MAXS.
• For a few tags, the HTML content is simply omitted if it is contained in a message.
For example:
Actual English content: ‘Open Source software released under the <a href="https://github.com/wikimedia/android-commons/blob/master/COPYING">GNU GPLv2</a>’
Translated content: ‘Open Source software released under the’
Instead of relying on fragile hacks, I will try to fix this by adding support for tags such as <b>, <i>, <u> and <a>.
Tasks to be performed:
T58276: Android FFS doesn't support string-array
T51412: Android exporter causes build failure due to presence of ‘%’ in the generated xml
T49310: Translate should not eat some tags inside message contents for AndroidFFS
T67137: Add author attribution to AndroidXmlFFS
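A minimal sketch of how string-array entries and the ‘%’ problem might be handled is given below. It is written in Python with the standard `xml.etree.ElementTree` module purely for illustration and does not reflect the actual AndroidFFS code. The `name[index]` key scheme for array items and the function names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Made-up Android resource file with one plain string, one string
# containing a percent sign, and one string-array.
SAMPLE = """<resources>
    <string name="app_name">Example</string>
    <string name="progress">50% done</string>
    <string-array name="sizes">
        <item>Small</item>
        <item>Large</item>
    </string-array>
</resources>"""


def read_messages(xml_text):
    """Flatten <string> and <string-array> entries into a key -> text mapping."""
    messages = {}
    root = ET.fromstring(xml_text)
    for node in root:
        name = node.get("name")
        if node.tag == "string":
            messages[name] = node.text or ""
        elif node.tag == "string-array":
            # Hypothetical key scheme: one message per item, keyed name[index].
            for index, item in enumerate(node.findall("item")):
                messages["%s[%d]" % (name, index)] = item.text or ""
    return messages


def write_messages(messages):
    """Rebuild the resource XML, regrouping array items under their parent."""
    root = ET.Element("resources")
    arrays = {}
    for key, text in messages.items():
        if "[" in key:
            name = key.split("[", 1)[0]
            parent = arrays.get(name)
            if parent is None:
                parent = ET.SubElement(root, "string-array", name=name)
                arrays[name] = parent
            ET.SubElement(parent, "item").text = text
        else:
            node = ET.SubElement(root, "string", name=key)
            node.text = text
            if "%" in text:
                # Mark the string as not being a format string.
                node.set("formatted", "false")
    return ET.tostring(root, encoding="unicode")


parsed = read_messages(SAMPLE)
exported = write_messages(parsed)
```

The sketch does not cover inline markup: `node.text` stops at the first child element, so a message containing an `<a>` tag would be cut off at that point and needs separate treatment.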

GettextFFS:
One of the visible bugs here is that the files are rewritten completely every time. Contents including the timestamp change on each write, which makes the change set very large. I will go through the existing code and reduce the change set, to minimize errors and make the code more secure. There are also quite a few other small reported bugs.
Tasks to be performed:
T67194: Gettext files should not be written if only meta data has changed
T40479: GettextFFS plurals break when translation contains |
T59964: Importing an invalid gettext file causes "Fatal exception of type MWException"
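One possible way to avoid writing a file when only metadata has changed is to compare the old and new contents with the volatile header fields removed. The Python sketch below illustrates the idea; the list of volatile headers, the function names and the sample file are assumptions for this example and are not taken from GettextFFS.

```python
import re

# Header fields assumed to change on every export (illustrative list).
VOLATILE_HEADERS = ("PO-Revision-Date", "POT-Creation-Date", "X-Generator")

_VOLATILE_LINE = re.compile(
    r'^"(?:%s):.*\n?' % "|".join(re.escape(name) for name in VOLATILE_HEADERS),
    re.MULTILINE,
)


def strip_volatile_headers(po_text):
    """Drop header lines whose values change even when no message changed."""
    return _VOLATILE_LINE.sub("", po_text)


def needs_write(old_text, new_text):
    """Return True only if something other than volatile metadata differs."""
    return strip_volatile_headers(old_text) != strip_volatile_headers(new_text)


# Made-up PO file; "\\n" keeps the literal backslash-n found in PO headers.
OLD = '''msgid ""
msgstr ""
"PO-Revision-Date: 2015-03-01 10:00+0000\\n"
"Language: fi\\n"

msgid "Hello"
msgstr "Hei"
'''
ONLY_DATE_CHANGED = OLD.replace("2015-03-01", "2015-03-02")
MESSAGE_CHANGED = ONLY_DATE_CHANGED.replace("Hei", "Terve")
```

With this check, an export that differs from the file on disk only in its timestamp would be skipped, which keeps the change set limited to real translation changes.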

Other existing file formats:
Similarly, most other file formats rely on workarounds that may not work in all conditions. Bugs reported against one file format may also exist in others without having been reported yet. Our aim in this project is to test each file format extensively and fix these errors using proper functions, avoiding hacks wherever possible.

Additional tasks to be performed:
T42712: JavaFFS doesn't parse fuzzy tags from source files
T39168: Translate extension should not reparse definition files on each message edit
T33300: SimpleFFS should handle escaped delimiters in message keys
T69636: Add plural (.stringsdict) support to AppleFFS for iOS/Mac OS X translation files
End result to the user: Users will be able to use the existing file formats with a much lower chance of the code crashing or producing unexpected output. I will go through each existing FFS file and improve the code, making it robust and secure for the user.
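As an illustration of the kind of fix T33300 asks for, the sketch below splits a line at the first unescaped delimiter. It is written in Python and assumes a `key=value` line layout with a backslash as the escape character; both are assumptions for this example and may not match the real SimpleFFS format.

```python
def split_key_value(line, delimiter="=", escape="\\"):
    """Split at the first unescaped delimiter and unescape the key."""
    key_chars = []
    i = 0
    while i < len(line):
        char = line[i]
        if char == escape and i + 1 < len(line):
            # Escaped character: keep it literally and skip the escape.
            key_chars.append(line[i + 1])
            i += 2
        elif char == delimiter:
            # First unescaped delimiter: everything after it is the value.
            return "".join(key_chars), line[i + 1:]
        else:
            key_chars.append(char)
            i += 1
    raise ValueError("no unescaped delimiter in line: %r" % line)


# The key "a=b" is written with its delimiter escaped.
escaped_key = split_key_value("a\\=b=value")
plain_key = split_key_value("plain=x=y")
```

Raising an error for a line without a delimiter, rather than guessing, fits the goal of not crashing on unexpected input: the caller can catch the error and report the invalid line.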

Adding new file format supports:
After gaining a basic idea of how file formats should be designed to keep the code robust, I will focus on implementing new file formats in the latter half of the project. At least one file format, Apache Cocoon, will be added as an FFS before the end of the project. Apart from that, there are a number of file formats that other platforms support, such as OpenOffice.org SDF/GSI, Desktop, Joomla INI, Magento CSV, Maker Interchange Format (MIF), .plist, Qt Linguist (TS), subtitle formats, Windows (.rc), Windows resource (.resx), HTML/XHTML, Mac OS X strings, WordFast TXT, and iCal. Depending on how the project progresses, I will also try to implement a few more of these in the given time.

Future Prospects:
Although this project mainly focuses on fixing bugs that users have identified over a long period of using the extension, it can play an important role in the future of File Format Support in the Translate extension. The key future aims in this direction are to detect the file format automatically, to increase the number of supported software projects, and to let normal users add files for translation via a web interface. The major hurdle to all of this is the way most of the current code is written. Once the loopholes are fixed and the FFSs are made consistent with each other, it will be easier for a novice to add a new file format by following the same pattern. It will also become easier to introduce automatic file format detection.

Detailed Timeline:

Since I have completed the setup of the environment for development and for testing, I will start immediately with getting my hands dirty with the code.

Before the project begins: Get to know the mentors. Read the code of each file format and understand it thoroughly.
May 25 – June 7: Fix all the bugs related to AndroidFFS.
June 8 - June 15: Change the code of JavaScriptFFS wherever required, in consultation with the mentor. Test the code for all possible failures.
June 16 - June 27: Create a priority list of the other bugs with the mentor and fix all such related bugs.
June 28 - July 3: Thorough testing to find bugs that haven’t been reported yet. Consult the mentor and choose the bugs that need to be fixed immediately.
July 4 - July 12: Fix all the newly found bugs and clean up the code in general.
July 13 - July 31: Code for the addition of a new file format, Apache Cocoon.
August 1 - August 7: Testing all the features and gathering potential bugs for the new file format.
August 10 - August 16: Improving the documentation and general cleanup of the code.
August 17: Pencils down.

Participation

Having interacted with the Wikimedia community for a while now, and being active on the IRC channels, I have clearly understood the importance of community participation here. Hence, I plan to do the following to ensure help from the entire community during the project:
• I will try to hold most meetings with my mentor on the IRC channel (MediaWiki-Internationalization), so that other senior contributors can also add their insights.
• I will upload all patches to Gerrit as soon as I am done with them, so that I get constant feedback from the community before they are merged.
• Based on the feedback received, I will revise and resubmit patches until we get the desired result.
Apart from this, I will maintain a blog, updated at the end of each week, where I will write about the progress made and my experience that week.
Also, I am already subscribed to the mailing lists. Although I prefer IRC, I will also use the mailing lists to communicate with the community whenever the need arises.

About Me

Education: I am a second-year undergraduate student in the Department of Mathematics, IIT Kharagpur, pursuing its integrated M.Sc. course in Mathematics and Computing.
How did you hear about this program?
I have been an avid Wikipedia user for a long time now. When I decided to get into the world of open source development, MediaWiki was my first choice. I got to know about this project on the Possible Tech projects board on Phabricator.
Other Commitments: No, I do not have any other commitments during the project period. My college will be on vacation for a large part of it, so I will be completely free during that time and able to contribute about 50 hours every week. Even when classes resume in the first week of July, I will have only about 20 hours of classes including labs, which will still allow me to contribute the same amount of time to the project.
We advise all candidates eligible to Google Summer of Code and FOSS Outreach Program for Women to apply for both programs. Are you planning to apply to both programs and, if so, with what organization(s)?
Yes, I am applying for both Google Summer of Code and Outreachy, since I am eligible for both. Apart from Wikimedia, I have also written an Android application based project proposal for Buildmlearn for GSoC. However, I am much more inclined towards working on this project with Wikimedia, since it offers far greater scope for learning and contribution.
We don't just care about your project -- you are a person, and that matters to us! What drives you? What makes you want to make this the most awesomest wiki enhancement ever?
The major driving factor for me to work with Wikimedia is the way the entire community works together to make something that is used by such a huge population worldwide. From the beginning of my association with MediaWiki, the fact that so many people have been ready to help me at every step has impressed me a lot. The sheer number of people using MediaWiki, and the impact that every improvement will have, is also amazing. The prospect of creating such a huge impact is what drew me to the project. In fact, I plan to continue contributing to the Wikimedia community even after the project period is over.

Past Experience
• I have been an active Android developer. Two of my major Android applications, which have been very well received on the app store, are Math Bomb and Debate Assistant. Here, I have worked extensively with the Android XML format.
• I have been a part of the software team of the Project Autonomous Ground Vehicle (AGV). Here, I learnt a lot about image processing with OpenCV using ROS. I was also a part of the team that participated in IGVC’14.
• In the FOSS community, I have been using Ubuntu, Firefox etc. for a long time now. With so many open source projects around, I have been interested in contributing to them for a very long time.
• Also, I have set up MediaWiki on the Vagrant machine on my system, and my contributions to the project can be found here.
Microtasks

- [[ https://gerrit.wikimedia.org/r/#/c/199590/ | Improving concatenation of strings in JavaScriptFFS ]]
- [[ https://gerrit.wikimedia.org/r/#/c/198730/ | Add styling for previous/next in SearchTranslations ]]

Event Timeline

Ayushimrigen raised the priority of this task from to Medium.
Ayushimrigen updated the task description.
Ayushimrigen subscribed.

@Ayushimrigen, it'd be great if you could work on some more microtasks for the project.

@NiharikaKohli , Also, I have submitted this proposal for Outreachy too. So it would be great if it was moved to Outreachy: Proposals submitted workboard to be reviewed there too.

Thanks for sending your second patch to Gerrit. You said you also had a third one ready; did you manage to upload that one too?

Please edit the description to fix the typos in your proposal.

@Nemo_bis: Sorry, I was without an internet connection over the long weekend. There were a few issues with my third patch, which I'll fix and upload soon.