Explore the integration of OpusMT
Closed, Resolved (Public)

Assigned To

Authored By

	Pginer-WMF
	Sep 30 2019, 10:31 AM

Description

MarianMT is an open source Neural Machine Translation framework, and the OPUS project is building language models for MarianMT based on its corpora. The OPUS project compiles a parallel corpus of translation examples, including those created using Content translation.

The integration of this project into Content translation (and other Wikimedia projects) would provide new opportunities to expand the use of machine translation to new languages and new use cases. This would be the first approach that is both open source and based on neural machine translation, making it different from the existing options. This makes it possible, for example, to integrate user corrections made with Content translation back into the system to improve the translation quality.

The current ticket proposes to explore the possibility of such integration by defining the initial steps to follow, including technical aspects to evaluate among other considerations.

Details

Subject	Repo	Branch	Lines +/-
Update cxserver to 2020-02-13-162638-production	operations/deployment-charts	master	+3 -3
Add config for OpusMT	operations/deployment-charts	master	+9 -0
Add OpusMT machine translation client	mediawiki/services/cxserver	master	+55 -1

Related Objects

Status	Assigned	Task
Open	None	T76456 Language Engineering tracker of trackers (tracking)
Open	None	T76454 [Epic] ContentTranslation - MT support
Open	None	T86700 Add more machine translation services (tracking)
Resolved	santhosh	T234194 Explore the integration of OpusMT
Resolved	• Bstorm	T237354 "bigram" instance for Language team

Event Timeline

Pginer-WMF created this task.Sep 30 2019, 10:31 AM

Restricted Application added a subscriber: Aklapper.Sep 30 2019, 10:31 AM

Pginer-WMF triaged this task as Medium priority.Sep 30 2019, 10:32 AM

Pginer-WMF added a parent task: T86700: Add more machine translation services (tracking).Sep 30 2019, 10:35 AM

KartikMistry subscribed.Sep 30 2019, 1:54 PM

santhosh updated the task description.Oct 1 2019, 3:54 AM

Pginer-WMF moved this task from Backlog to Maintenance backlog on the Language-Team (Language-2019-October-December) board.Oct 9 2019, 2:06 PM

Pginer-WMF moved this task from Maintenance backlog to Priority backlog on the Language-Team (Language-2019-October-December) board.Oct 10 2019, 9:13 AM

As a start, I attempted to make the development and deployment of the system simple. I created a Docker container that takes care of setting up the system: https://github.com/Helsinki-NLP/Opus-MT/pull/1
It needs more work. My aim is to get a web server that abstracts the MT engines and their complexities, and exposes a web API that does translation.

santhosh claimed this task.Oct 21 2019, 11:24 AM

santhosh moved this task from Priority backlog to In Progress on the Language-Team (Language-2019-October-December) board.

Today I built a web frontend for MarianMT with OPUS data models:

Uses Tornado to fork subprocesses and communicate with them
Defines a configuration for language pairs and their models
Provides a simple web interface where a language pair can be selected and content to translate can be submitted
Defines a web API at the /api URL, which takes the from, to, and source params in the body of a POST request. It returns JSON with translation as the key for the translated content.
Removes all existing Python scripts and replaces them with server.py
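
For illustration, a call to the API described above could be sketched as follows. This is a minimal sketch, not the code from the pull request: it assumes the from, to, and source parameters are sent form-encoded in the POST body (the server may expect another encoding), it assumes the /api path lives on the demo host mentioned later in this task, and the function names are hypothetical.

```python
import json
from urllib import parse, request

# Assumption: the demo instance mentioned in this task plus the /api path.
API_URL = "http://opusmt.wmflabs.org/api"


def build_translation_request(source_lang, target_lang, text, api_url=API_URL):
    """Build a POST request carrying the from, to, and source parameters.

    Assumption: the parameters are form-encoded in the request body.
    """
    body = parse.urlencode(
        {"from": source_lang, "to": target_lang, "source": text}
    ).encode("utf-8")
    return request.Request(api_url, data=body, method="POST")


def parse_translation_response(raw_body):
    """Return the value stored under the "translation" key of the JSON reply."""
    return json.loads(raw_body)["translation"]


def translate(source_lang, target_lang, text, api_url=API_URL):
    """Send the request and return the translated text (needs network access)."""
    req = build_translation_request(source_lang, target_lang, text, api_url)
    with request.urlopen(req, timeout=30) as resp:
        return parse_translation_response(resp.read().decode("utf-8"))
```

Calling translate("en", "es", "Hello world") would send the request to the configured instance; the request building and the response parsing are kept separate so they can be checked without a running server.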

I submitted a pull request at https://github.com/Helsinki-NLP/Opus-MT/pull/2, but it is not ready to merge yet. I need to do some testing of the Docker images.

Here is a screenshot of the web interface with an en-es translation

KartikMistry mentioned this in T237354: "bigram" instance for Language team.Nov 5 2019, 6:48 AM

Pginer-WMF moved this task from Needs Triage to Enhancements on the ContentTranslation board.Nov 11 2019, 8:24 AM

This is running at http://opusmt.wmflabs.org/

Observations

The prepared language models from https://object.pouta.csc.fi/OPUS-MT produce translations, but the results need a lot of improvement before we can start using them in our use cases. We need to work with upstream to see what can be done.
The Python Tornado-based web interface and API are not merged upstream. The version used for http://opusmt.wmflabs.org/ is https://github.com/santhoshtr/Opus-MT
For all Indic languages, this issue is present: https://github.com/Helsinki-NLP/Opus-MT/issues/4

Based on the OpusMT documentation, it seems that there is support for a few languages that are not supported by other MT systems available: Assamese (as), Breton (br), Kinyarwanda (rw), and Walloon (wa).

This may be a useful list for future initiatives. The communities of these languages may be willing to try an experimental MT system even if the initial quality is very low (compared to having no MT), and be willing to spend time on manual corrections to help make it better.

Pginer-WMF added a subtask: T237354: "bigram" instance for Language team.Nov 15 2019, 11:00 AM

Pginer-WMF mentioned this in T239697: Expose translated messages from the Translate extension in a parallel corpora.Dec 3 2019, 11:03 AM

After the language team offsite conversations, the next steps would be as follows:

Expand our OpusMT instance language models with the unsupported languages: Assamese (as), Breton (br), Kinyarwanda (rw), and Walloon (wa).
Create an MT client for Content translation to support the integration (labelled as experimental).
Enable the new MT service on testing wikis but keep it disabled on real wikis (until conversations with the communities show their interest in trying it).

Pginer-WMF edited projects, added Language-Team (Language-2020-January-March); removed Language-Team (Language-2019-October-December).Jan 2 2020, 8:12 AM

Pginer-WMF moved this task from Backlog to In Progress on the Language-Team (Language-2020-January-March) board.

santhosh renamed this task from Explore the integration of MarianMT to Explore the integration of OpusMT.Jan 3 2020, 10:36 AM

Change 561813 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/services/cxserver@master] Add OpusMT machine translation client

https://gerrit.wikimedia.org/r/561813

gerritbot added a project: Patch-For-Review.Jan 3 2020, 10:37 AM

Change 563110 had a related patch set uploaded (by KartikMistry; owner: KartikMistry):
[operations/deployment-charts@master] WIP: Add config for OpusMT

https://gerrit.wikimedia.org/r/563110

Assamese seems a good candidate language, and a proposal to enable OpusMT was shared with the Assamese Wikipedia community.

Pginer-WMF mentioned this in T149666: Next steps for machine translation.Jan 20 2020, 12:02 PM

Change 561813 merged by jenkins-bot:
[mediawiki/services/cxserver@master] Add OpusMT machine translation client

https://gerrit.wikimedia.org/r/561813

Change 563110 merged by KartikMistry:
[operations/deployment-charts@master] Add config for OpusMT

https://gerrit.wikimedia.org/r/563110

KartikMistry mentioned this in rDEPLOYCHARTS60d490bb3985: Add config for OpusMT.Feb 18 2020, 11:40 AM

Change 572841 had a related patch set uploaded (by KartikMistry; owner: KartikMistry):
[operations/deployment-charts@master] Update cxserver to 2020-02-13-162638-production

https://gerrit.wikimedia.org/r/572841

Change 572841 merged by jenkins-bot:
[operations/deployment-charts@master] Update cxserver to 2020-02-13-162638-production

https://gerrit.wikimedia.org/r/572841

KartikMistry mentioned this in rDEPLOYCHARTSee2ac5deb6e8: Update cxserver to 2020-02-13-162638-production.Feb 18 2020, 11:46 AM

en->as is enabled for OpusMT and ready to use in production.

Pginer-WMF mentioned this in T245509: Adjust the threshold for Assamese to prevent publishing when overall unmodified content is higher than 70%.Feb 18 2020, 1:32 PM

santhosh closed this task as Resolved.Feb 19 2020, 8:51 AM

santhosh moved this task from In Progress to Done on the Language-Team (Language-2020-January-March) board.

Message posted on Assamese Wikipedia to confirm the availability of the new system.

Pginer-WMF mentioned this in T262192: Improve MT support for Tsonga with OpusMT.Sep 7 2020, 8:40 AM

Pginer-WMF mentioned this in T262253: Improve MT support for Central Bikol with OpusMT.Sep 8 2020, 10:20 AM