Hi GSoC admins,
Apertium didn't get into GSoC this year :-(
So our mentors have lots of spare time :-| ... I was wondering whether it would be possible to do a joint GSoC project related to your machine translation plugins, where Apertium is currently used?
- Are there specific Apertium language pairs that work, but not well enough, which it would make sense to improve for Wikimedia?
- Packaging and other distribution issues that are still unsolved: kart_ (Kartik Mistry) currently spends a lot of time on this, and we'd love to streamline it.
- Nemo_bis and Nikerabbit would like an API endpoint to use in stock Translate https://gerrit.wikimedia.org/r/#/c/188570/
- There is a lot to do with dictionaries, especially bridging Wiktionary, Apertium or other sources into something that Content translation can use: https://www.mediawiki.org/wiki/Content_translation/FAQ#What_dictionaries_will_be_available.3F
- There is already https://phabricator.wikimedia.org/T31229; maybe it can be turned into something that benefits both MediaWiki and Apertium.
- Magnus Manske is interested in adding machine translation to his Wikidata tool described in http://magnusmanske.de/wordpress/?p=265
On language pairs: Wikimedia would love to have good ways to contribute from CX (the code name for ContentTranslation) back to Apertium.
https://phabricator.wikimedia.org/T91492 is one simple thing: probably not enough for a whole GSoC project, but a good microtask for applicants. :)
(copying from IRC)
Finding a nice streamlined way to get the translators to contribute translations back to the current Apertium dictionaries would be a very cool thing.
https://phabricator.wikimedia.org/T91492 is about finding the missing words, but getting them actually translated is where the real value is.
The ultimate goal would be to mine the parallel corpora and post-edits produced via Content translation. :)
Just a simplistic thought: CX could show a box for every word that Apertium fails to translate, and the translator could fill it in.
Or better, the software could grab the word straight from the translation. It would also be nice to collect the sentences that end users change, so that Apertium contributors can easily see the input, the MT output and the correction.
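A minimal Python sketch of the "box for every unknown word" idea, assuming CX has access to Apertium's raw output and that Apertium runs with its default behaviour of prefixing unknown words with `*` (the `-u` option turns these marks off). `unknown_words` is a hypothetical helper, not existing CX or Apertium code:

```python
import re

# Assumption: Apertium's default output prefixes unknown (untranslated) words
# with "*". Simplification: a "word" here is a run of \w characters, so
# hyphenated or apostrophised forms are not handled.
UNKNOWN_RE = re.compile(r"\*(\w+)")


def unknown_words(mt_output: str) -> list[str]:
    """Return the words Apertium left untranslated, in first-seen order, without duplicates."""
    # dict.fromkeys keeps insertion order, which gives an ordered de-duplication.
    return list(dict.fromkeys(UNKNOWN_RE.findall(mt_output)))


print(unknown_words("La *foobar estas *baz kaj *foobar"))  # → ['foobar', 'baz']
```

Each returned word could become one input box in the CX interface.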
The project could be: build the "inbox" for such reports. @KartikMistry's current OPW project is something like an inbox of new words to add to dictionaries, but for spelling dictionaries rather than translation. Maybe it could be adapted.
(<TinoDidriksen>): It's not that complex: just collect and align. It's much easier because the MT output basically gives you the sentence alignment. Users don't need to do anything; it should happen automatically, based on the edits they make.
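A rough Python sketch of the collect-and-align step, assuming CX can hand over the source sentences, the Apertium output and the translator's final text as three lists that are already aligned sentence by sentence. `Correction` and `collect_corrections` are hypothetical names, not existing CX code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    """One input/MT/correction triple for Apertium contributors to review."""
    source: str
    mt: str
    post_edit: str


def collect_corrections(
    source: list[str], mt: list[str], post_edited: list[str]
) -> list[Correction]:
    """Pair sentences 1:1 and keep only those the translator changed.

    Assumes the three lists are already sentence-aligned (the MT output
    provides the alignment). Leading/trailing whitespace is ignored when
    deciding whether a sentence was edited.
    """
    if not (len(source) == len(mt) == len(post_edited)):
        raise ValueError("inputs must be sentence-aligned (equal lengths)")
    return [
        Correction(s, m, p)
        for s, m, p in zip(source, mt, post_edited)
        if m.strip() != p.strip()
    ]


# Hypothetical example data.
src = ["La hundo dormas.", "Mi legas libron."]
mt = ["The dog sleeps.", "I *legas a book."]
pe = ["The dog sleeps.", "I am reading a book."]
for c in collect_corrections(src, mt, pe):
    print(c)
```

Only changed sentences are kept, so the output could feed the kind of "inbox" described above without the translator doing anything extra.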
(Proposed by jacobEo from #apertium, http://www.dtu.dk/Service/Telefonbog/Person?id=78778&tab=6. Feel free to edit!)