
Support automatic translation of MediaWiki talk page comments
Open, Needs Triage, Public

Description

The Movement Strategy Forum has a very cool automatic comment translation feature (via Google Translate at the moment, although this is configurable) which works wonders for multilingual discussions. It would be great to have something like that for multilingual wikis like Meta, Commons or Wikidata.

This would require several pieces of functionality:

  • translation API handling
    • managing translation API details (API selection for a given language, keys, language fallback chains, etc.)
    • proxying translation API requests to protect user privacy and to deduplicate
    • monitoring of API errors
  • identify pieces of talk page text (comments) to be translated (DiscussionTools provides an API for this, I think)
    • deal with transcluded comments (DiscussionTools has some sort of stable ID for comments, maybe that handles this?)
  • store translations so the same text doesn't get translated over and over as new people read it
    • deal with invalidation when the page is edited, especially when comments are moved during archival (again, maybe DiscussionTools comment IDs can help?)
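One way the storage and deduplication pieces above could fit together, as a minimal sketch only: cache translations under a hash of the comment text plus the target language, so that a comment moved during archival keeps its cached translation and an edited comment simply misses the cache. This is an assumption for illustration, not how DiscussionTools or any existing extension works; `TranslationCache`, the `Translator` signature and `backend_calls` are hypothetical names, and the backend is whatever the proxy ends up calling.

```python
import hashlib
from typing import Callable, Dict, Tuple

# Hypothetical backend signature: (text, target_lang) -> translated text.
# In practice this would be the proxied call to the chosen translation API.
Translator = Callable[[str, str], str]


class TranslationCache:
    """Caches translations by content hash rather than by page position."""

    def __init__(self, translate: Translator) -> None:
        self._translate = translate
        self._store: Dict[Tuple[str, str], str] = {}
        self.backend_calls = 0  # counter kept only to make the behaviour visible

    @staticmethod
    def _key(text: str, target_lang: str) -> Tuple[str, str]:
        # Same text and same target language always map to the same key,
        # regardless of which page or section the comment currently lives on.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        return digest, target_lang

    def get(self, text: str, target_lang: str) -> str:
        key = self._key(text, target_lang)
        if key not in self._store:
            # Only the first reader of a given comment triggers a backend call.
            self.backend_calls += 1
            self._store[key] = self._translate(text, target_lang)
        return self._store[key]
```

With this keying, invalidation on edit comes for free (the edited text hashes to a new key), but stale entries for old revisions are never evicted; a real store would need an expiry policy, and stable comment IDs might still be the better key if they survive archival.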

Event Timeline

From yesterday in #wikimedia-tech:

16:52:13 <legoktm> related to the discussion about automatic translations in Discourse vs MediaWiki and privacy implications of using Google Translate (or other external providers) - https://blog.mozilla.org/en/mozilla/local-translation-add-on-project-bergamot/ came out today, fully local translations
16:52:39 <AntiComposite> that's cool
16:57:16 <legoktm> it's mostly webassembly, would be interesting if it could be wired into a gadget that lets you translate individual talk page comments or sections

I have been testing the SectionTranslation tool (only available for some languages), and it is doing a great job translating sections and sentences. It even provides a way to store them, edit them and "officially translate" them by correcting the MT. So this seems a good path towards this feature.

Google Translate supports 109 languages. Project Bergamot apparently supports 14 (4 of which are still in beta), almost all first-world languages. (The quality seems pretty impressive though - worse than Google, but not by much.) So it doesn't seem very useful right now.

In any case, privacy is a trivial matter of proxying through Wikimedia (or Wikimedia-affiliated) servers; it's the least challenging aspect of translating MediaWiki comments.

I can't edit the doc, but we are using Elia at Wikimedia for Basque, Catalan, Galician, Spanish, French and English languages.

Assuming that comments can be in any language, we will also need to identify the language of the comment first.
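A rough sketch of what that ordering could look like, assuming a pluggable language detector and translator (both hypothetical callables here, not an existing MediaWiki or DiscussionTools API): detect first, and skip the translation call when the language is unknown or already matches the reader's.

```python
from typing import Callable, Optional

# Hypothetical signatures, for illustration only.
Detector = Callable[[str], Optional[str]]     # text -> language code, or None if unsure
Translator = Callable[[str, str, str], str]   # (text, source_lang, target_lang) -> text


def translate_comment(
    text: str,
    reader_lang: str,
    detect: Detector,
    translate: Translator,
) -> Optional[str]:
    """Return a translation, or None when the comment should be shown as-is."""
    source_lang = detect(text)
    if source_lang is None or source_lang == reader_lang:
        # Unknown language, or already in the reader's language:
        # no backend request is made.
        return None
    return translate(text, source_lang, reader_lang)
```

Returning None instead of the original text lets the caller tell "no translation needed" apart from "translated", which matters if the interface labels machine-translated comments.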

While this seems like a good thing to integrate into MediaWiki, it is also something that is supported on the browser side. I regularly use Google Chrome to translate Spanish documents to English through Google Translate. If we want to do this, how could we do it *better* than Google Translate does?

> While this seems like a good thing to integrate into MediaWiki, it is also something that is supported on the browser side.

That's an extremely fair point. For any new feature which ostensibly contributes to already substantial tech debt, I think it is wise to ask if there already exists an easy-to-use, free-ish solution (this would qualify for consideration IMO) instead of attempting to reinvent the wheel with the Foundation's and Community's limited resources.