
Better announce the lack of Machine Translation
Closed, Declined (Public)

Description

Users expect Machine Translation to be available everywhere. However, translation services do not support all languages.

Currently the lack of Machine Translation is shown in the paragraph card, but we may want to highlight it in some way the first time to make it clearer, or even surface the information earlier, in the new translation dialog (e.g., as a warning, or by highlighting the languages that have MT in the selector).

Event Timeline

Pginer-WMF raised the priority of this task from to Medium.
Pginer-WMF updated the task description.
Pginer-WMF raised the priority of this task from Medium to Needs Triage. Jul 21 2015, 9:24 PM
Amire80 triaged this task as Medium priority. Jul 28 2015, 6:37 PM
Amire80 moved this task from Needs Triage to CX6 on the ContentTranslation board.

There are several ways of showing possible language pairs: one is via a table, another is by listing pairs (e.g. ordered according to target language) rather than by listing single languages. So, the following list could be a good idea:
Target language: source language 1, source language 2, etc.
...
Philosophy: I can understand more languages than I can write text for. I thus decide on the target language and investigate what pairs there are. Clicking on e.g. "source language 2" above will give the pairs "source language 2 -> target language"
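
A minimal sketch of the "Target language: source language 1, source language 2" listing described above. The function names and the sample pairs are hypothetical and only illustrate the grouping; they are not taken from ContentTranslation or Apertium.

```python
from collections import defaultdict


def group_by_target(pairs):
    """Map each target language to the sorted list of its source languages.

    `pairs` is an iterable of (source, target) tuples.
    """
    grouped = defaultdict(set)
    for source, target in pairs:
        grouped[target].add(source)
    # Sort targets and sources so the listing is stable for display.
    return {target: sorted(grouped[target]) for target in sorted(grouped)}


def format_listing(grouped):
    """Render one 'target: source 1, source 2' line per target language."""
    return [f"{target}: {', '.join(sources)}" for target, sources in grouped.items()]


# Hypothetical sample data, not an actual list of supported pairs.
SAMPLE_PAIRS = [("es", "ca"), ("es", "gl"), ("pt", "gl"), ("fr", "ca")]

if __name__ == "__main__":
    for line in format_listing(group_by_target(SAMPLE_PAIRS)):
        print(line)
```

Grouping by target first matches the workflow described above: the user picks the language they can write in, then sees which source languages are on offer.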

There are several cases of lack:

  1. Machine translation doesn't exist at all. There's no known technology that provides MT for a pair of languages. Example: Inuktitut - Belarusian.
  2. Some machine translation exists, but we don't support it at the moment. Example: English - Punjabi using Google Translate.
  3. We support it, but the service is currently broken for some reason. Example: Spanish - Galician using Apertium (but the Apertium service crashed).

Cases 1 and 2 are similar, and can be handled together, and that's what T90243 is about.

Case 3 is definitely separate, and needs a separate message, something like "Machine translation is currently unavailable. Please try later."
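
As a rough sketch of how the three cases could map to messages: cases 1 and 2 collapse into one "unsupported" state, and case 3 gets its own. The names `MTStatus`, `mt_status` and `user_message`, the `service_up` flag, and the wording of the "unsupported" message are all assumptions made for illustration; only the case 3 message comes from this discussion.

```python
from enum import Enum


class MTStatus(Enum):
    AVAILABLE = "available"
    UNSUPPORTED = "unsupported"  # cases 1 and 2: no MT exists, or we don't support it
    UNAVAILABLE = "unavailable"  # case 3: supported, but the service is down


def mt_status(source, target, supported_pairs, service_up):
    """Classify MT availability for one language pair.

    `supported_pairs` is a set of (source, target) tuples; `service_up`
    is a hypothetical health flag for the MT service.
    """
    if (source, target) not in supported_pairs:
        return MTStatus.UNSUPPORTED
    if not service_up:
        return MTStatus.UNAVAILABLE
    return MTStatus.AVAILABLE


MESSAGES = {
    # Hypothetical wording for cases 1 and 2.
    MTStatus.UNSUPPORTED: "Machine translation is not available for this language pair.",
    # Wording proposed above for case 3.
    MTStatus.UNAVAILABLE: "Machine translation is currently unavailable. Please try later.",
}


def user_message(status):
    """Return the message to show, or None when MT is available."""
    return MESSAGES.get(status)
```

Checking support before service health means a pair that was never supported is not reported as a temporary outage.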

I'm not sure where to put it, however - the dropdown on the MT card? Or some place else?

This is a setup for Apertium, so we are restricted to Apertium language pairs (and in general, I assume, to open source solutions). Statistical MT is in any case not that good for language production. Language pairs not in Apertium may be added, though. If an Apertium pair is broken, the interim answer is to roll back the svn to the last working version. But since the source code is open, the best way is probably just to get a password, go in and participate in the work.

The setup may be used without MT, though, as it gives links, formatting and pictures. I myself use it for MT only, and I have earlier suggested that the interface list the available language pairs (there are not that many of them).

As for what pairs to hope for, my suggestion is to utilize the fact that most languages are not isolates, and thus go for a larger sibling to find articles there:

  • de -> nl (and to all minority West Germanic languages)
  • es -> ca, gl, ar, pt, etc.
  • different directions between the Romance languages
  • different directions between the North Germanic (Scandinavian) languages
  • Finnish to other Baltic Finnic languages
  • Russian to other Slavic languages
  • etc, etc.

English is a popular L1, but it is tricky, since lexical disambiguation is hard (with no languages to borrow from, English tends to reuse its own words, with polysemy as a result), and since going from a poor to a rich morphology is usually not easy.

One possibility for the interface could be to list "No translation" as an option, and then list all and only the existing language pairs in addition, so as not to disappoint users who click through all sorts of impossible combinations in vain.

This is the list of Apertium language pairs. Pairs under "Trunk" and "Staging" should be useful. Whether the "Nursery" ones will be useful or not is in itself an interesting question. I would like to give it a go, but I also fear that Wikipedia "language collectors" will use this as a way to add better minority language content than they would have been able to do without it (and hence publish without any proofreading).

Some options:

a) The target language selector (ULS) shows an annotation along with language names. An [MT Available] annotation means that MT from the source language to the target language is available.

pasted_file (336×700 px, 44 KB)

b) The translation selector can display the status of MT availability above the license field

pasted_file (317×699 px, 45 KB)

https://cxserver.wikimedia.org/v1/list/mt gives the required information for this
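
A hedged sketch of how option a) could be driven by that listing. It assumes the response is a JSON object of the form provider -> source language -> list of target languages; that shape is an assumption and should be checked against the live endpoint. The helper names, the sample payload and "ProviderB" are made up for illustration, and fetching the URL is left out so the example runs offline.

```python
import json


def targets_with_mt(listing, source):
    """Collect every target language that any provider offers for `source`.

    `listing` is assumed to look like {provider: {source: [targets]}}.
    """
    targets = set()
    for provider_pairs in listing.values():
        targets.update(provider_pairs.get(source, []))
    return targets


def annotate(language_names, mt_targets):
    """Append '[MT Available]' to the names of languages that have MT.

    `language_names` maps language code -> display name.
    """
    return [
        f"{name} [MT Available]" if code in mt_targets else name
        for code, name in language_names.items()
    ]


# Hypothetical payload in the assumed shape, not real service output.
SAMPLE = json.loads(
    '{"Apertium": {"es": ["ca", "gl"]}, "ProviderB": {"es": ["ca", "ru"]}}'
)

if __name__ == "__main__":
    names = {"ca": "Catalan", "de": "German", "gl": "Galician"}
    for label in annotate(names, targets_with_mt(SAMPLE, "es")):
        print(label)
```

Merging targets across providers keeps the selector simple: the annotation only says that some MT exists for the pair, not which service provides it.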

This comment was removed by Trondtr.

With the integration of new translation services supporting many more languages, and better highlighting the cases where it is not available (T209606), the need for additional mechanisms seems less relevant.