
Multilingual SemanticMediaWiki
Closed, Duplicate (Public)

Description

Profile Information

Name: Abhishek Mittal
Email: abhishekmittaliiit@gmail.com
IM networks/handle(s): darkdragon09
Web Page: https://web.iiit.ac.in/~abhishek.mittal/
Portfolio: http://54.148.146.38/
Resume: https://web.iiit.ac.in/~abhishek.mittal/abhi_resume.pdf
Location: Hyderabad (Telangana), India
Typical working hours: 40 hours/week

Possible Mentors

@Yaron_Koren, @Nemo_bis, @Nikerabbit

Introduction

Hi, I am Abhishek Mittal. I am currently in the 4th year of a B.Tech in CSE at the International Institute of Information Technology, Hyderabad. I would like to spend this summer working with the MediaWiki development team and contributing to the development of the project.

Synopsis

The aim of the project is to add multilingual capabilities to Semantic MediaWiki (SMW) so that it supports multiple languages out of the box, removing the configuration overhead currently required to make MediaWiki multilingual. The result would be a multilingual Semantic MediaWiki, as demonstrated by the following sites :-

  1. http://tieteentermipankki.fi/
  2. http://sanat.csc.fi/

This would make creating new multilingual semantic wikis faster and easier, and would make wikis more accessible to people of different language groups.

Deliverables

This project involves a number of intricacies, and completing all of it in the given time may be difficult. Therefore, my main focus would be to develop and deliver modules that integrate easily with the SMW core. The goal of the project is to develop a technique by which properties can be made multilingual. The idea suggested by @mwjames is to introduce a monolingual text type that allows a property to be annotated with different labels in different languages, such as [[Has property label::en:Foo]] [[Has property label::ja:テスト]], entirely within the property namespace. Here "Has property label" is the property, of the monolingual text type, that is used for the annotations.

A query can then look up the correct label for a property using PropertyLabelLanguageFallbackLookup, which is injected into the PropertyLabelFinder.
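A minimal sketch of the label lookup idea, written in Python rather than SMW's PHP. The regular expression, the function names and the fallback order are assumptions made for illustration; they are not the actual PropertyLabelLanguageFallbackLookup implementation.

```python
import re

# Hypothetical pattern for annotations such as [[Has property label::en:Foo]].
ANNOTATION = re.compile(r"\[\[Has property label::([A-Za-z-]+):([^\]]+)\]\]")


def parse_labels(wikitext):
    """Collect {language code: label} from monolingual text annotations."""
    return {lang: label.strip() for lang, label in ANNOTATION.findall(wikitext)}


def lookup_label(labels, property_key, lang, fallbacks=("en",)):
    """Return the label for lang, then for each fallback, else the property key."""
    for code in (lang, *fallbacks):
        if code in labels:
            return labels[code]
    # No label in any candidate language: the key remains the identifier.
    return property_key


page = "[[Has property label::en:Foo]] [[Has property label::ja:テスト]]"
labels = parse_labels(page)
print(lookup_label(labels, "Foo_key", "ja"))
```

The property key is always the last resort, so a property without any translated label is still displayed under its unique identifier.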

Since all phases are independent, my main focus would be Phase 3 of the project, and I would spend considerable time finishing it.

Key Deliverables

  • Code for a monolingual text type that allows a property to be annotated with different labels
  • Documentation of the technique used
  • Demonstration video of the multilingual capabilities of the project

Detailed Project Description and Plan of Action

In order to make SMW multilingual, we need to take the following steps :-

Phase 1. Enhance Features

Enhance Special:CreateForm and friends (all the Special:Create* special pages provided by Semantic Forms) to create forms that are already internationalised (i18n-ed), with placeholders and a message group for the Translate extension.

The main components of Semantic Forms functionality are form definition pages, which exist in a new namespace, 'Form:'. These are pages consisting of markup code which gets parsed when a user goes to add or edit data. Since forms are defined strictly through these definition pages, users can themselves create and edit forms, without the need for any actual programming.

The Semantic Forms extension enforces the use of templates in creating semantic data. It does not support direct semantic markup in data pages; instead, all the semantic markup is meant to be stored indirectly through templates. A form allows a user to populate a pre-defined set of templates and sections for a page (behind the scenes, the template data is turned into semantic properties once the page is saved).

Subtask 1. Extend Semantic Forms, by extending the templates, so that users creating a form are given the option to translate property names. A separate module could provide the interface, perhaps a button placed next to each property name, to make the translation process easier. This button would work similarly to the translate link at the top of a page.

Subtask 2. Send the required information to the Translate extension, which provides the interface for translation.

Phase 2. Make it possible to send strings for translation to Translate message groups

Make it possible to define translations for properties and create a message group for the Translate extension, similar to what CentralNotice does (sending strings for translation to Translate message groups).

In order to achieve the goals of this phase, we need to do the following :-

  • Explicitly tie a language-dependent label to a property key (a user property created in a specific language), where the label is fed as an alias to the property key (the property key stays the unique identifier).
  • Introduce a monolingual text type that allows a property to be annotated with different labels in different languages, such as [[Has property label::en:Foo]] [[Has property label::ja:テスト]]; this happens entirely within the property namespace.
  • Create a query to look up the correct label for a property using PropertyLabelLanguageFallbackLookup, which is injected into the PropertyLabelFinder.
  • Note that replacing the text (property key) within a query based on a language key (or {{int}}) will not work, because the identifier (property key/label) under which the value was stored needs to be used. The text presentation of a property can only be replaced after a query result has been generated.
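The last point can be sketched as follows, in Python rather than SMW's PHP: the query runs against the stored property keys, and only the finished result is relabelled for display. The row layout, the `labels_by_key` structure and the Finnish label are hypothetical and chosen only for illustration.

```python
def relabel_result(rows, labels_by_key, lang, fallback="en"):
    """Rename the column keys of already-fetched rows to display labels."""

    def display(key):
        labels = labels_by_key.get(key, {})
        # Requested language first, then the fallback, then the raw key.
        return labels.get(lang) or labels.get(fallback) or key

    return [{display(key): value for key, value in row.items()} for row in rows]


# Rows as they might come back from a query, keyed by the stored property key.
rows = [{"Has_population": 10000}]
labels_by_key = {"Has_population": {"en": "Population", "fi": "Väkiluku"}}

print(relabel_result(rows, labels_by_key, "fi"))
```

Because the stored keys are never rewritten, the same result can be shown in any interface language without running the query again.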

The Translate extension would provide the translation interface. SMW would register one or more message groups with Translate. One of these groups would contain the names of all the properties known to exist in the wiki.

Phase 3. Handle translations of properties wherever they are used

There are a lot of places where properties are displayed: many special pages, queries, property pages. We need to find a sensible way to handle translations in all these places. In most wikis, property names are supposed to be hidden from the user; e.g. query results are usually shown in infobox-like templates (whose labels could in theory be localised like any template).

This part would be difficult to implement, but by tackling one place at a time we can make considerable progress.

Translate would be fed with the strings in need of translation. Localised strings/messages would be displayed based on the interface language, which in core every user can set on Special:Preferences and which ULS makes much easier to pick for everyone, including unregistered users.

Phase 4. Fix Issues that prevent full Localisation

Fixing the issues that prevent full localisation of Semantic Forms (T49736: Unable to use {{int}} inside {{for template}}).

This is important because localisation is the first step towards multilingual translations. All translations of MediaWiki user interface messages go through translatewiki.net and are not committed directly to the code. Only the English messages and their initial documentation are written in the source code. This process is localisation. Once the messages and documentation are available in English, they are translated.
Therefore, fixing the existing issues that prevent full localisation of Semantic Forms is a must for adding multilingual capabilities to SMW.

Major Hurdles

The hardest part of the project is to use the translations for properties in various places. Most cases are probably easy to change, but there can be a lot of them and some of them might be difficult because of caching or other issues.

Project Timeline

  • Before 25th May :- Design a comprehensive implementation strategy for the project and get a better understanding of the project.
  • 25th May - 5th June :- Internship Begins, Start Phase One - Subtask 1 of the project. Integrate interface for translation of properties in semantic forms.
  • 5th June - 15th June :- Phase One - Subtask 2. Link the interface with the translation extension.
  • 15th June - 25th June :- Phase Three. Start this subtask and understand the design strategy
  • 25th June - 2nd July :- - 3rd July :- Midterm Evaluations
  • 4th July - 20th July :- Continue Phase 3 of the project.
  • 20th July - 5th Aug :- Phase Three of the project. Handle translations of properties wherever they are used
  • 6th Aug - 10th Aug :- Phase Four of the project.
  • 11th Aug - 15th Aug :- Testing and Bug Fixing
  • 15th Aug - 22nd Aug :- Documentation Related Stuff

Further Questions and more about myself

Q. Why you’d like to execute on this particular project and the reason you’re the best individual to do so.

I think this project has great scope for learning. It has been proposed for GSoC twice before but has yet to be attempted. The project appears to be quite challenging and unique, and if deployed it can be integrated easily with existing semantic wikis that need translation for properties. It makes providing multilingual capabilities to a semantic wiki easy, and it appears to be a great opportunity to learn and enhance my development skills.

I think that I am the best individual to do so because of my past experience in web development and a never-ending thirst to explore and try new things. Moreover, this project needs triage and requires a good amount of exploration to complete, and I don’t easily give up on projects once started.

Hits and Trials since the last month

To get a better understanding of this project, I tried to solve the following bugs :-

  1. T49510: SF & translate extension : "Save page" button does nothing (saves nothing) :- This bug appears to be resolved in newer versions of MediaWiki; everything worked fine when I tried to reproduce it.
  2. T85924: SVG upload should have more specific error (warning) message when blocking :- I submitted a patch to resolve the bug; the patch is under review.
  3. T48995: Username validation message does not describe failure reason, only "You have not specified a valid username" :- I am currently working on resolving this issue.

Public Contributions/Development Experience

  1. CodeMirror (online text editor for programming languages, open source project) :- Reported a bug and submitted a patch to fix it.

    Pull Request :- https://github.com/codemirror/CodeMirror/pull/2663
    Bug Report :- https://github.com/codemirror/CodeMirror/issues/2662
  2. OmniSharp (autocomplete daemon for C#, open source project) :- Reported a bug and submitted a patch to fix it.

    Commit :- https://github.com/OmniSharp/omnisharp-server/commits?author=abhishekmittal09
  3. MediaWiki (the engine that powers Wikipedia, open source project) :- Submitted a patch to fix a bug in MediaWiki.

    Link for bug fix :- https://gerrit.wikimedia.org/r/194466
    Bug :- https://phabricator.wikimedia.org/T85924

Q. How my progress can be tracked?

I plan to write weekly blog posts to report my status and the progress made each week. The blog will describe the hurdles I encountered and the methods used to tackle them.

Q. Do you have other obligations from late May to early August (school, work, etc.)?

No, I do not have any other obligations besides this internship during the summer. I’ll try my best to finish this project on time.

Event Timeline

Phase 3. Make possible to send strings for translation to Translate message groups
Subtask 1. The translated text should go into a separate blob table with a property tracking id, language key, and its translation.
Subtask 2. Implement some setter/getters functions to catch the representation based on the language key returning a Message object.

I'm not sure about this proposal as it would require SMW's datamodel to recognize text information from an outside source (arbitrary translate extension/blob table) and it would create an issue when exporting labels/keys to SPARQLStore/RDF.

We will probably need a different approach which explicitly ties a language dependent label to its property key (a user-property created in a specific language) where the label is being fed [0] as alias to a property key (the property-key stays as unique identifier).

We have to introduce a monolingual text type which would allow a property to be annotated with different labels in different languages, such as [[Has property label::en:Foo]] [[Has property label::ja:テスト]] etc., but this happens entirely within the property namespace (so a query can be created to look up the correct label for a property using PropertyLabelLanguageFallbackLookup, which is injected into the PropertyLabelFinder).

Replacing a text (property key) within a query based on a language key (or {{int}}) will not work, as the identifier (property key/label) under which the value was stored needs to be used. The text presentation of a property can only be replaced after a query result has been generated, where [1] provides the correct textual context, and the same goes for a property label text outside of a query context (e.g. Special:Browse etc.).

[0] https://github.com/SemanticMediaWiki/SemanticMediaWiki/blob/master/src/PropertyLabelFinder.php

[1] https://github.com/SemanticMediaWiki/SemanticMediaWiki/blob/master/includes/datavalues/SMW_DV_Property.php#L196

I understand your concerns. I went through the current datamodel details of SMW and the method suggested by you for implementation appears to be feasible. I'll look into the implementation strategy for the same and update the proposal.

For BFT (tieteentermipankki.fi), we are quite successfully using the approach of multiple fields like name-fi, name-en, name-sv.

Let's say there is a field for language with allowed values fi, en, sv. Now, when editing a page with such a field, it would be nice if the dropdown displayed Finnish, English, Swedish in the user's language while storing one of fi, en, sv. This should be relatively easy to do.

When displaying such a page, it is possible to convert the values to the display names with something like {{int:prefix-languagename-{{{language}}}}}. Perhaps this could be automated somehow?
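A small sketch of that behaviour, in Python for brevity: the stored value stays one of fi, en, sv, while the option text is resolved through a message key of the form prefix-languagename-<code>. The `MESSAGES` store, the function names and the English fallback are assumptions for illustration; in a wiki the texts would come from interface messages.

```python
# Hypothetical message store: message key -> {interface language: text}.
MESSAGES = {
    "prefix-languagename-fi": {"en": "Finnish", "fi": "suomi"},
    "prefix-languagename-en": {"en": "English", "fi": "englanti"},
    "prefix-languagename-sv": {"en": "Swedish", "fi": "ruotsi"},
}


def int_message(key, ui_lang, fallback="en"):
    """Resolve a message key roughly the way {{int:key}} would (simplified)."""
    texts = MESSAGES.get(key, {})
    return texts.get(ui_lang) or texts.get(fallback) or key


def dropdown_options(allowed_values, ui_lang):
    """Pair each stored value with its display name in the user's language."""
    return [
        (code, int_message("prefix-languagename-" + code, ui_lang))
        for code in allowed_values
    ]


print(dropdown_options(["fi", "en", "sv"], "fi"))
```

The same `int_message` lookup covers the display case: the page stores the code and the name is resolved only when rendering.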

When querying such data, doing that needs some hackery with template formatting encapsulating the above. Could this be made easier?

When creating a form, the labels for the fields are hardcoded in one language. One has to manually replace them with {{int:prefix-fieldname-desc}} and follow the steps to create a message group. Could this be automated? Can we make it easier to update such a form? The ability to change forms on the fly is one of the selling points of SMW.
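As a rough sketch of what such automation might look like, the Python snippet below rewrites hardcoded labels into {{int:...}} calls and collects the source strings for a message group. It assumes a form layout in which each label sits on a "! Label:" line directly above its "| {{{field|...}}}" line; that layout, the key scheme and the function name are assumptions, and real form definitions may be structured differently.

```python
import re

# Assumed layout: a header cell "! Label:" directly above "| {{{field|Name ...".
LABEL_ROW = re.compile(
    r"^! (?P<label>.+?):\n\| \{\{\{field\|(?P<field>[^|}]+)", re.MULTILINE
)


def localize_form(form_text, prefix="prefix"):
    """Swap hardcoded labels for {{int:...}} and collect the source messages."""
    messages = {}

    def replace(match):
        field = match["field"]
        key = prefix + "-" + field.strip().lower().replace(" ", "-") + "-desc"
        messages[key] = match["label"]  # source text for the message group
        return "! {{int:" + key + "}}:\n| {{{field|" + field

    return LABEL_ROW.sub(replace, form_text), messages


form = "! Name:\n| {{{field|Name}}}\n|-\n! Home town:\n| {{{field|Home town}}}\n"
localized, messages = localize_form(form)
print(localized)
```

The collected `messages` mapping is what would be handed to Translate as the group's source strings; re-running the function after a form edit would regenerate both the form and the group.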

These are some use cases in response to @mwjames's thoughts. Solving these things will immediately help to make these wikis more multilingual, without needing to go into the deep machinery of SMW to teach it about multilingual properties. Of course, that direction could be a better solution in the long term, but we also need to scope this project so that it can produce usable results within the limited time.

Darkdragon09, do you plan to follow up on the review you got for your open patch, and/or to complete another microtask more closely related to your project during this application phase?

The main secret of a successful GSoC project is the ability to split the project into small actionable tasks. As you improve your application, you should be able to define a first small step in the direction you propose, and send a patch for it.

@Nemo_bis, I'll be submitting another patch for the bug by Monday. I am currently working on understanding how tieteentermipankki.fi works and on developing a prototype as suggested by @Nikerabbit:

Let's say there is a field for language with allowed values fi, en, sv. Now, when editing a page with such a field, it would be nice if the dropdown displayed Finnish, English, Swedish in the user's language while storing one of fi, en, sv. This should be relatively easy to do.

When displaying such a page, it is possible to convert the values to the display names with something like {{int:prefix-languagename-{{{language}}}}}. Perhaps this could be automated somehow?

I should be able to make some progress in this direction by next week. My main goal is to use this to understand the overall complexity of the project and the code structure of SMW.