Context
As we scale our tools to more wikis, we are blocked by the translation process. Unlike other tools, the Growth tools aren't visible to experienced users (they target newcomers), so they rarely get translated.
The only way to get translations (with no guarantee of reaching 100% coverage) is to start an active community discussion. This can't scale to all wikis, since we don't have the workforce to encourage each wiki individually, and not all communities reply to these calls.
We think it is better for newcomers to have an interface in their own language, even if it comes from a machine translation service, than to have it in English.
Using machine translation doesn't mean that we skip announcing a forthcoming deployment to communities. They will be informed weeks before the deployment, with an opportunity to work on genuine translations. We expect this potential use of machine translation to encourage communities to work on translations or, at least, to fix them quickly. Machine translation is a backup process, for cases where no translations have been made ahead of time.
Task
This task is to check the feasibility of the following:
- extract all strings needing translation (skip the ones already translated)
- find a way to translate them en masse
- check if existing translations from translatewiki, used in the same context, can be used instead of machine translation
- import them to translatewiki.net, with a "to be checked" tag (if one exists)
Check how ContentTranslation handles wikitext, and whether there is an API for it.
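The first step in the list (extracting strings that still need translation) can be sketched as below. This is a minimal illustration, not the actual tooling: it assumes the usual MediaWiki i18n layout of one JSON file per language, with an "@metadata" entry that is not a message. The message keys and texts are made up for the example.

```python
import json


def untranslated_keys(source: dict, target: dict) -> list:
    """Return message keys that exist in the source language but not in the target."""
    return [
        key
        for key in source
        # Entries starting with "@" (such as "@metadata") are not messages.
        if not key.startswith("@") and key not in target
    ]


# Hypothetical file contents; real data would be read from en.json and cs.json.
en = json.loads('{"@metadata": {"authors": []}, "greeting": "Hello", "farewell": "Goodbye"}')
cs = json.loads('{"@metadata": {"authors": []}, "greeting": "Ahoj"}')

print(untranslated_keys(en, cs))
```

Only the keys returned here would need to be sent to a translation service; messages that already have a human translation are skipped.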
Ways to pursue this task
On-the-fly machine translation
Instead of importing messages to translatewiki.net, we might explore the possibility of translating messages on-the-fly. I briefly looked into how MediaWiki could support that. I did not look into possible sources of machine translations, because I think that's out of scope for this task. The MessageCache core service is responsible for getting the message text. It has a function called getMessageFromFallbackChain, which implements language fallbacks. It should be easy to add a hook to it that lets extensions define fallback messages on-the-fly, which would make on-the-fly machine translation easy (provided we actually have a translation service with an API we can use). However, it would probably also require a caching layer over the machine translations, which is probably more trouble than it is worth.
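To make the idea concrete, here is a conceptual sketch in Python of a lookup with a machine-translation step and a cache in front of it. It is not MediaWiki code and does not reflect how MessageCache is implemented: the class, the `translate` callable, and the lookup order (human translation, then machine translation, then fallback languages, then English) are all assumptions chosen for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple


class MessageResolver:
    """Toy model of a message lookup with an on-the-fly machine translation step."""

    def __init__(
        self,
        messages: Dict[str, Dict[str, str]],
        fallbacks: Dict[str, List[str]],
        translate: Callable[[str, str], str],
    ) -> None:
        self.messages = messages      # {language: {key: text}}
        self.fallbacks = fallbacks    # {language: [fallback languages]}
        self.translate = translate    # hypothetical translation service client
        self.cache: Dict[Tuple[str, str], str] = {}

    def get(self, key: str, lang: str, allow_machine: bool = True) -> Optional[str]:
        # 1. A human translation in the requested language always wins.
        human = self.messages.get(lang, {}).get(key)
        if human is not None:
            return human
        source = self.messages.get("en", {}).get(key)
        # 2. Machine translation, cached so the service is called once per (key, lang).
        if allow_machine and source is not None and lang != "en":
            cache_key = (key, lang)
            if cache_key not in self.cache:
                self.cache[cache_key] = self.translate(source, lang)
            return self.cache[cache_key]
        # 3. Regular fallback languages, then the English source text.
        for fallback in self.fallbacks.get(lang, []):
            text = self.messages.get(fallback, {}).get(key)
            if text is not None:
                return text
        return source


calls = []


def fake_translate(text: str, lang: str) -> str:
    """Stand-in for a real service; records each call so caching is observable."""
    calls.append((text, lang))
    return f"[{lang}] {text}"


resolver = MessageResolver(
    messages={"en": {"greeting": "Hello", "farewell": "Goodbye"}, "cs": {"greeting": "Ahoj"}},
    fallbacks={"cs": ["sk"]},  # illustrative chain only
    translate=fake_translate,
)
print(resolver.get("farewell", "cs"))
print(resolver.get("farewell", "cs"))  # served from the cache
print(resolver.get("farewell", "cs", allow_machine=False))
```

The `allow_machine` flag stands in for the user preference or query parameter mentioned in the pros below. A real deployment would also need cache invalidation when the English source changes, which this sketch leaves out.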
Pros:
- It won't pollute the TWN repository
- It will allow us to disable this feature when a certain user preference/query parameter is present, allowing users to turn off machine translations
Cons:
- Too expensive in engineering time.
Separate message group
We could add one or more extra message groups to MessagesDirs that would not go through TranslateWiki at all and would be loaded last, after all TWN-populated message groups. That would make them act as a fallback to all human-populated (TWN-populated) groups.
Pros:
- It will not pollute the TWN translation repository. Messages not translated by humans will still be marked as untranslated, while real users still get a machine translation.
- It would probably allow us to remove this group when a certain user preference/query parameter is present, allowing us to disable machine translations for newcomers who speak English.
Cons:
- Not sure?
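The intended precedence for this option can be sketched as follows. This is a simplified model in Python, not a description of how MediaWiki actually merges message directories: the function name and the sample messages are hypothetical, and the only point is that human translations always override machine ones and that the machine group can be dropped entirely.

```python
def effective_messages(human: dict, machine: dict, use_machine: bool = True) -> dict:
    """Combine two message groups so that human translations take precedence."""
    # Start from the machine-translated group (or nothing, if it is disabled) ...
    merged = dict(machine) if use_machine else {}
    # ... then let every human translation overwrite the machine one.
    merged.update(human)
    return merged


# Hypothetical message groups for one language.
human_cs = {"greeting": "Ahoj"}
machine_cs = {"greeting": "Zdravím (MT)", "farewell": "Sbohem (MT)"}

print(effective_messages(human_cs, machine_cs))
print(effective_messages(human_cs, machine_cs, use_machine=False))
```

With the machine group disabled, "farewell" is simply absent and the normal language fallback would apply, which matches the behaviour we would want for users who opt out.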
Machine translations imported into TranslateWiki.net
The TranslateWiki.net interface recognizes several categories of messages:
- Untranslated – for messages that are in TWN, but not yet translated by a translator
- Fuzzy/Outdated – for messages that were translated by a translator, but whose English source text was changed after the translation was made
- Translated – for messages that were translated by a translator and are up to date
- Verified – for messages that were translated by a translator, and verified by another translator
In addition to that, messages can be flagged as optional, but that is out of scope for this task.
Translations can be added to TWN in three ways:
- Online translation – regular translation process made in the interface by translators
- Offline translation – translators specifically flagged by TWN staff as offline translators may use https://translatewiki.net/wiki/Special:ImportTranslations to import translations in gettext/po format (docs; @Urbanecm_WMF has this flag on his TWN account)
- Imported from external source – while it is not recommended, translations can be imported by directly modifying the JSON file in our codebase (i.e. directly modifying cs.json, for instance).
Either using the offline translation feature or directly modifying JSON files in our codebase sounds like a workable way to add translations into TWN.
There is no way to mark a translation as machine-translated, or even to manually flag a message as untranslated. Language-Team might consider adding such a feature for us. Since it is possible to do on-the-fly translations or to introduce an extra message group, I (@Urbanecm_WMF) would personally vote for pursuing one of those instead.
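For the offline route, the translations would have to be serialized as gettext/po. The sketch below shows one way to produce such a file in Python. It is an assumption-laden illustration: the helper names are made up, the use of msgctxt to carry the message key is an assumption, and the exact header and layout that Special:ImportTranslations expects would need to be checked against a file exported from TWN.

```python
def po_escape(text: str) -> str:
    """Escape backslashes, double quotes and newlines for a PO string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_po(entries: list, language: str) -> str:
    """Build PO file text from (message key, English source, translation) tuples."""
    lines = [
        # Minimal PO header entry; real exports carry more header fields.
        'msgid ""',
        'msgstr ""',
        f'"Language: {language}\\n"',
        '"Content-Type: text/plain; charset=UTF-8\\n"',
        "",
    ]
    for key, source, translation in entries:
        lines += [
            f'msgctxt "{po_escape(key)}"',  # assumption: the key travels as msgctxt
            f'msgid "{po_escape(source)}"',
            f'msgstr "{po_escape(translation)}"',
            "",
        ]
    return "\n".join(lines)


# Hypothetical message; a real run would iterate over the untranslated keys.
po_text = to_po([("farewell", "Goodbye", "Na shledanou")], "cs")
print(po_text)
```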
Pros:
- Easy to do without changing any code
Cons:
- No way to automatically detect which messages are translated by humans and which are translated by a machine
- No way to disable autotranslation
- Pollutes the TWN repository with machine translations
Open questions
- Should we add translations to TWN, or translate them on the fly?