Type of activity: Pre-scheduled session
Main topic: Artificial Intelligence to build and navigate content
The problem
We would like to make all our content available in all the languages of the world, for all readers.
Machine translation capabilities are improving rapidly. Further, we potentially possess the Rosetta Stone for building excellent machine translation tools: a large and continually growing archive of parallel texts.
A virtuous cycle is possible: every article (or part of an article) translated by a user trains our machine translation software to make better automatic translations and suggestions, so that the rote work of updating translation A after translation B has been edited is increasingly automated.
This is a hard problem in general, and it's not entirely clear whether the WMF should build machine translation expertise (or software) in house. But there are a number of initial steps we could take in the near-ish term to pave the way.
Expected outcome
Consensus on a vision for machine translation as an integral part of our projects in the future.
Consensus on practical initial steps to take.
Current status of the discussion
- This was discussed as a potential Main Topic for WikiDev17
- Amir and I discussed this during the October Editing offsite, but there was no formal session.
- Our proposed first steps (perhaps a jumping-off point for this session):
- Export CX (ContentTranslation) translation pairs in an appropriate format for Moses training data
- Add part-of-speech info to Wikidata interlanguage-link relations
- Export interlanguage links in an appropriate format for an Apertium dictionary.
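To make the first step concrete: Moses trains on a line-aligned parallel corpus, i.e. two plain-text files where line N of the source file translates line N of the target file. A minimal sketch of such an export, assuming the translation pairs have already been pulled from CX (the pair list and file names below are hypothetical):

```python
def export_moses_corpus(pairs, src_path, tgt_path):
    """Write aligned sentence pairs to two files, one sentence per line.

    Moses training requires that line N of the source file translate
    line N of the target file, so pairs with an empty side are skipped
    to keep the two files line-aligned.
    """
    with open(src_path, "w", encoding="utf-8") as src_f, \
         open(tgt_path, "w", encoding="utf-8") as tgt_f:
        for src, tgt in pairs:
            src, tgt = src.strip(), tgt.strip()
            if not src or not tgt:
                continue  # unaligned pair: writing it would shift every later line
            # internal newlines would also break line alignment
            src_f.write(src.replace("\n", " ") + "\n")
            tgt_f.write(tgt.replace("\n", " ") + "\n")

# Hypothetical CX output: (source sentence, target sentence) tuples
pairs = [
    ("The cat sat on the mat.", "El gato se sentó en la alfombra."),
    ("", "sin fuente"),  # skipped: no source side
]
export_moses_corpus(pairs, "corpus.en", "corpus.es")
```

The real export would also need the usual Moses corpus preparation (tokenization, truecasing, length filtering), which is left out here.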
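For the third step, Apertium's bilingual dictionaries are XML (.dix) files whose entries pair a left-language and right-language form, each tagged with a part of speech via `<s n="..."/>` elements declared in `<sdefs>`. A sketch of generating such entries from interlanguage-link term pairs plus the proposed part-of-speech info (the input triples are made up for illustration):

```python
import xml.etree.ElementTree as ET

def build_dix(pairs):
    """Build a minimal Apertium bilingual dictionary from
    (left_term, right_term, pos_tag) triples."""
    root = ET.Element("dictionary")
    ET.SubElement(root, "alphabet")  # left empty in this sketch
    # Declare each part-of-speech symbol used by the entries
    sdefs = ET.SubElement(root, "sdefs")
    for pos in sorted({pos for _, _, pos in pairs}):
        ET.SubElement(sdefs, "sdef", n=pos)
    # One <e><p><l>…</l><r>…</r></p></e> entry per term pair
    section = ET.SubElement(root, "section", id="main", type="standard")
    for left, right, pos in pairs:
        pair = ET.SubElement(ET.SubElement(section, "e"), "p")
        left_el = ET.SubElement(pair, "l")
        left_el.text = left
        ET.SubElement(left_el, "s", n=pos)
        right_el = ET.SubElement(pair, "r")
        right_el.text = right
        ET.SubElement(right_el, "s", n=pos)
    return ET.tostring(root, encoding="unicode")

# Hypothetical en-es pairs derived from interlanguage links
dix = build_dix([("dog", "perro", "n"), ("house", "casa", "n")])
print(dix)
```

A production export would need multiword handling, disambiguation of one-to-many links, and the alphabet declaration, but the entry structure above is the core of the format.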
Links
- Moses, an open source *statistical* translation tool: http://www.statmt.org/moses/
- Apertium, an open source *rule/dictionary based* translation tool: https://www.apertium.org/
- Google's current neural machine translation framework: https://research.googleblog.com/2016/09/a-neural-network-for-machine.html
- Using Wikidata for interlanguage links: https://meta.wikimedia.org/wiki/A_newer_look_at_the_interlanguage_link