Build a machine translation engine from manual and semi-manual Wikipedia translations
What it does:
- Learns from Wikipedia translations across languages and builds machine translation models from the resulting parallel corpora.
Wiki thing it helps with:
- Content translation project https://www.mediawiki.org/wiki/Content_translation
- Stephen: it would be neat if AI could automatically identify words (within context) that were previously translated. For example, the translation of the word "Wikimedia", once made, is probably identical regardless of context. The translation of the word "build" is not, but AI could identify usages in identical contexts. This is all done manually in TranslateWiki by volunteers right now.
- TWN has translation memory based on previous translations, but it could perhaps be enhanced by using parallel corpora from translations
- Niklas: perhaps this AI could enhance existing MT by selecting locally translated expressions
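
One rough way to approach Stephen's idea is to measure how consistently each source term has been translated in the past: a term like "Wikimedia" that always maps to the same target can be reused automatically, while a term like "build" needs context. The sketch below is only an illustration. The term pairs are made up, the function name `translation_consistency` is hypothetical, and the 0.9 threshold is an arbitrary assumption; it also assumes term-level alignment has already been done somehow.

```python
from collections import Counter, defaultdict


def translation_consistency(pairs):
    """Map each source term to (most common translation, share of its occurrences)."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    result = {}
    for source, targets in counts.items():
        best, n = targets.most_common(1)[0]
        result[source] = (best, n / sum(targets.values()))
    return result


# Hypothetical aligned term pairs (English -> Spanish), invented for illustration.
pairs = [
    ("Wikimedia", "Wikimedia"),
    ("Wikimedia", "Wikimedia"),
    ("Wikimedia", "Wikimedia"),
    ("build", "construir"),
    ("build", "compilación"),
    ("build", "crear"),
    ("build", "construir"),
]

scores = translation_consistency(pairs)
# Terms whose dominant translation covers at least 90% of occurrences
# (threshold chosen arbitrarily) are treated as context-independent.
stable = sorted(term for term, (_, share) in scores.items() if share >= 0.9)
print(stable)
```

Terms below the threshold would still need a human translator or a context-aware model to pick the right rendering.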
Things that might help us get this AI built:
- Parallel corpora dumps produced by Content translation https://www.mediawiki.org/wiki/Content_translation/Published_translations (Production dump: https://dumps.wikimedia.org/other/contenttranslation/)
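
As a starting point for working with such dumps, here is a minimal sketch of extracting sentence pairs from a TMX file (TMX is a standard XML exchange format for translation memories). It assumes a dump is available in TMX; the inline sample is made up, the function name `read_pairs` is hypothetical, and the actual layout of the production dumps should be checked before relying on this.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up sample in TMX shape; real dump content will differ.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>The cat sleeps.</seg></tuv>
      <tuv xml:lang="es"><seg>El gato duerme.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Good morning.</seg></tuv>
      <tuv xml:lang="es"><seg>Buenos días.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_pairs(tmx_text, source_lang, target_lang):
    """Return (source, target) segment pairs from TMX text."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        # Collect one segment per language within this translation unit.
        segs = {tuv.get(XML_LANG): (tuv.findtext("seg") or "")
                for tuv in tu.iter("tuv")}
        if source_lang in segs and target_lang in segs:
            pairs.append((segs[source_lang], segs[target_lang]))
    return pairs


pairs = read_pairs(SAMPLE_TMX, "en", "es")
for source, target in pairs:
    print(source, "->", target)
```

Pairs extracted this way could feed either a translation memory or the training data for a translation model.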