Deploy Content Translation tool to CEE language Wikipedias for regional project this spring
Closed, Resolved | Public | 1 Story Points

Description

The Central and Eastern Europeans are planning a 3-month regional editing campaign which will begin in March. They are calling it "CEE Spring," in honor of the Arab Spring.

Here is the project portal on Meta wiki: https://meta.wikimedia.org/wiki/Wikimedia_CEE_Spring_2015.

During this campaign, volunteers will be creating and improving articles about every country in the region on every Wikipedia in the region. Other Wikimedia projects, including Commons and Wikidata, may also be involved.

Volunteers from each country will create a list of priority articles to be translated. Those lists are linked here. (Click the links on the flags.)

https://meta.wikimedia.org/wiki/Wikimedia_CEE_Spring_2015/Structure

The goals of the project are to:
a. Increase quantity and quality of Wikimedia content on numerous projects and in numerous languages.
b. Increase participation and local outreach levels with an attractive international event.
c. Support further collaborative initiatives in the region.

Is it possible to deploy Content Translation to any, some, most, or all of the participating CEE languages, and to other languages that could be affected by this project, provided the language communities concerned express interest in enabling this beta feature on their projects?

AR
AZ
BE
BE-X-OLD
BG
BS
CSB
CZ
DE
EL
EO
ET
FI
HU
HY
KA
KK
LT
LV
MK
MO
PL
RO
RU
RUE
SH
SK
SL
SQ
SR
TR
UK

Thank you! :)

Event Timeline

There are a very large number of changes, so older changes are hidden.
Ijon added a subscriber: Ijon.Jan 30 2015, 1:32 AM
geraki added a subscriber: geraki.Jan 30 2015, 6:38 AM
KuboF added a subscriber: KuboF.Jan 30 2015, 10:58 AM

And Esperanto please (both from and to), as there is a strong Esperanto wiki community in the CEE region and Esperanto is on the list of non-material cultural heritage of Poland, one of the CEE countries.

saper added a subscriber: saper.Jan 30 2015, 11:12 AM

It would be possible to use Moses, a libre statistical MT system, for the Slovak-Czech language pair. I am in contact with the Slovakian Linguistic Institute and (based on unofficial agreements) they can provide it for us. Something similar could work for some other language pairs in the CEE region thanks to http://www.euromatrixplus.eu

For this, support for Moses in CX would be needed.

Thanks for catching that, KuboF. Apologies for the omission.

Just a quick note on "related languages": the easiest way would be to group the Slavic languages by their branches: Western Slavic (PL, SK, CZ, CSB), Eastern (BE, BE-X-OLD, UK, RU) and Southern (BS, SL, SH and so on). However, all Slavic languages are mutually intelligible to a certain extent, and the only major problem is transliteration to and from Cyrillic (in the case of, say, RU-PL translation). Other than that, we could definitely benefit from some machine-assisted translation tool. Let me know how I can help.

tarlocesilion added a comment.EditedFeb 2 2015, 11:10 AM

This geographic/alphabetic list is an unnatural mix of such a variety of systems... First of all, we can group the Slavic and Baltic languages into 4 branches: Western Slavic (PL, SK, CZ, CSB), Eastern Slavic (BE, BE-X-OLD, UK, RU, RUE), Southern Slavic (BG, BS, MK, SH, SL, SR) and Baltic (LT, LV). From the rest, we can separate a few groups (e.g. MO & RO, ET & FI).

Next, I am surprised that several languages are listed (e.g. DE, EL, EO). I mean, I know that's because of CEE, but when we are talking about a translation tool, which is basically about languages, geographic/WMF criteria are inappropriate. EO was invented in Poland, but isn't a Slavic language. Germany borders Poland and the Czech Republic, but DE is the only Germanic language listed here.

Considering that several languages are unique or isolated from their branches and/or have no big or active community and no big wiki, and considering that it's good to start with groups of bigger and smaller wikis (as happened with ES-PT-CA), I suggest deploying RU-UK-BE-[BE-X-OLD] and PL-CZ-SK first.
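
To make the size of that proposal concrete, here is a minimal sketch that counts the directed translation pairs inside each suggested first-wave group. The language codes are copied from this task; the names `BRANCHES`, `FIRST_WAVE` and `directed_pairs` are hypothetical and only illustrate the grouping. This is not how Content Translation itself is configured.

```python
from itertools import permutations

# Branch grouping as proposed in the comment above (codes copied from the task list).
BRANCHES = {
    "Western Slavic": ["PL", "SK", "CZ", "CSB"],
    "Eastern Slavic": ["BE", "BE-X-OLD", "UK", "RU", "RUE"],
    "Southern Slavic": ["BG", "BS", "MK", "SH", "SL", "SR"],
    "Baltic": ["LT", "LV"],
}

# The two groups suggested for a first deployment.
FIRST_WAVE = [["RU", "UK", "BE", "BE-X-OLD"], ["PL", "CZ", "SK"]]


def directed_pairs(languages):
    """Return every ordered (source, target) pair within one group."""
    return list(permutations(languages, 2))


for group in FIRST_WAVE:
    pairs = directed_pairs(group)
    print(f"{'-'.join(group)}: {len(pairs)} directed pairs")
```

Each group of n languages yields n * (n - 1) directed pairs, so the number of pairs that need testing grows quickly as languages are added to a group.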

+1 to that. The basic problem with machine-assisted translations is that current systems too often employ English as a middleman (translate from Czech to English, and then from English to Polish). Such translations are mostly useless and even if the resulting text resembles the target language, it is often filled with erroneous statements resulting from such Chinese whispers. A prime example of that is Google Translate: it works pretty well for translations within the Germanic group of languages (e.g. between Dutch and English), but is a waste of time for most other languages and at best gives some approximation of what the original text is about. We'd have to check if this tool works any better.

However, even if it is equally unusable as Google Translate, we could still benefit from it, as it allows for easier "translation" of article structure, references and images - and copying them is a major pain in the back when translating cross-wiki. So I would consider it a valuable addition even if we had to turn the translation part off completely.

Amire80 triaged this task as High priority.Feb 4 2015, 12:01 AM
Amire80 added a subscriber: Amire80.

(Setting High priority because it has good potential to increase the user base, and because the event is happening soon. Nevertheless, proper testing and outreach must be done for each language and wiki.)

Arrbee added a project: LE-Sprint-82.
Arrbee added a subscriber: Arrbee.

We already have English-Esperanto available with Apertium. Can you suggest the languages that would benefit from Esperanto as the source? Thanks.

Arrbee claimed this task.Feb 11 2015, 6:52 AM
Arrbee moved this task from Backlog to In Progress on the LE-Sprint-82 board.

Hello, we need a little help to evaluate the quality of the machine translated content (on Apertium) for the following languages:

  1. Bulgarian <-> Macedonian (both directions)
  2. Serbo-Croatian <-> Macedonian (both directions)
  3. Serbo-Croatian <-> Slovenian (both directions)
  4. English -> Serbo-Croatian (one direction)

Please follow this link to the survey form for instructions and to let us know your feedback. Thanks.
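
For anyone tallying the evaluations, the four items above expand into seven directed pairs. A minimal sketch of that expansion follows; the lowercase codes (bg, mk, sh, sl, en) are the usual Wikipedia language codes, and the names `REQUESTS` and `expand` are hypothetical, used only for illustration.

```python
# Each entry: (language A, language B, both_directions), as listed in the request above.
REQUESTS = [
    ("bg", "mk", True),   # Bulgarian <-> Macedonian
    ("sh", "mk", True),   # Serbo-Croatian <-> Macedonian
    ("sh", "sl", True),   # Serbo-Croatian <-> Slovenian
    ("en", "sh", False),  # English -> Serbo-Croatian only
]


def expand(requests):
    """Expand the request list into ordered (source, target) pairs."""
    pairs = []
    for a, b, both_directions in requests:
        pairs.append((a, b))
        if both_directions:
            pairs.append((b, a))
    return pairs


for source, target in expand(REQUESTS):
    print(f"{source} -> {target}")
```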

KuboF added a comment.Feb 17 2015, 1:39 PM

Certainly Slovak and maybe Polish (I do not have an overview of plwiki), because in 2016 there will be big Esperanto meetings in Slovakia and Poland, and at least the Slovak team (I am a member of it) will use Wikipedias to inform locals about the whole topic.

Thanks so much @KuboF. Would you by any chance know if the plwiki community has had any discussions about Content Translation? I can check separately, but just asking in case you know. Thanks.

Halibutt added a comment.EditedFeb 18 2015, 11:19 PM

Yeah, we're monitoring this issue at pl wikipedia. So far I have had a short conversation with User:Polimerek (head of Wikimedia Poland) about it (he was a little skeptical, but suggested that even if the translation part was faulty, the tool could still be useful). I've also notified User:Tar_Lócesilion, head of our R&D group (he joined this board; I'm not sure if he follows it closely, but I can always ping him).

In any case, there is plenty of hope in the Polish wiki community, and plenty of anxiety. Why? Check my comment at the top of this board.

@Halibutt, just click on 'Subscribers' :)

A few of us are aware of this topic. Polimerek mailed WMCEE-l, and Halibutt and I follow this board (and even write here, see older changes). I talked to some people at an R&D meeting, too. But we are not involved enough in this topic and we don't know much about the usability of CT when working with the Polish language (e.g. the middleman problem); our community hasn't discussed it yet. We aren't allowed to declare anything until we figure things out and discuss within the community. And then there'll be a decision.

@Halibutt @tarlocesilion Thank you. This is very helpful. Unfortunately, Polish is not presently supported through Apertium, so Polish users will not get the benefit of machine translation. However, the tool is useful without MT as well, and if it helps you assess how good or bad the usability is in its current state, we will be happy to set up Polish on the beta environment for testing.

Halibutt added a comment.EditedFeb 19 2015, 5:58 AM

Thanks, we'll be happy to test it. I admit I have no experience with Apertium whatsoever, but their wiki lists some interesting options as "staging pairs", for instance Polish <=> Czech, Polish <=> Ukrainian and some more (even Kashubian and Sorbian!). Are they of any use?
Cheers

Thank you. We are currently not considering the staging pairs, as package reliability can be an issue. Only the stable language pairs are being included (i.e. into the Wikimedia infrastructure).

I will pass on the request to set up Polish in the testing server. Would you like to recommend the languages that can be marked as 'preferred source languages'? This requirement for source languages will go away soon and users will have more flexibility to choose any article in any language to translate from. We will keep you posted on when that happens. Thanks.

Arrbee edited projects, added LE-Sprint-83; removed LE-Sprint-82.Feb 24 2015, 7:34 AM
Arrbee moved this task from Backlog to In Progress on the LE-Sprint-83 board.Feb 24 2015, 7:37 AM

Hi guys. In az-wiki we know that machine translation into Azerbaijani from
other languages gives very bad results. Turkish is the closest language to
Azerbaijani, but MT gives terrible results and it is better to create a new
article than to edit one, so we prefer to delete such articles, except in cases
where an editor improves them. As a result, users stopped creating such articles
and we very rarely see them.

But I think that the tool itself can be very helpful and can make it easier
for newcomers to create articles, and not only for them. So I want to ask the
developers to add this tool to the Beta features of az-wiki if possible. Thanks
in advance.

Arrbee edited a custom field.Feb 25 2015, 5:48 AM
Arrbee added a comment.Mar 2 2015, 6:44 AM

Now tracked at T91230. Thanks.

Arrbee edited projects, added LE-Sprint-84; removed LE-Sprint-83.Mar 10 2015, 9:26 AM
Arrbee moved this task from Backlog to In Progress on the LE-Sprint-84 board.Mar 11 2015, 8:37 AM
Arrbee edited a custom field.
Pginer-WMF edited a custom field.Apr 1 2015, 9:18 AM
Arrbee moved this task from Backlog to In Progress on the LE-Sprint-85 board.Apr 2 2015, 7:19 AM
Arrbee moved this task from Long term to CX5 on the ContentTranslation board.Apr 20 2015, 6:08 AM
Arrbee edited projects, added LE-Sprint-86; removed LE-Sprint-85.Apr 21 2015, 7:26 AM
Arrbee moved this task from Backlog to In Progress on the LE-Sprint-86 board.Apr 21 2015, 7:30 AM
Arrbee edited projects, added LE-Sprint-87; removed LE-Sprint-86.May 25 2015, 7:52 AM
Arrbee moved this task from Backlog to In Progress on the LE-Sprint-87 board.May 25 2015, 8:13 AM
KartikMistry closed this task as Resolved.May 29 2015, 4:36 AM
KartikMistry moved this task from In Progress to Done on the LE-Sprint-87 board.