
Display error when an engine does not support the requested language
Closed, Resolved · Public · 5 Estimated Story Points

Description

Some languages (e.g. Welsh, cy) are not supported as a lang parameter by certain Phonos engines. Currently, Phonos fails to render any audio when an unsupported language is passed.

In these cases, we should show an error message, and help the user choose a supported language. We did think about falling back to another "parent" language, but decided against that.

E.g.

<phonos text="llandudno" ipa="/ɬanˈdɨdnoː/" lang="cy" />

will return an error message and render no audio, whereas

<phonos text="llandudno" ipa="/ɬanˈdɨdnoː/" lang="en" />

returns an audio rendering.

Further examples at https://en.wikipedia.beta.wmflabs.org/wiki/Phonos

Acceptance criteria

  • When provided with an unsupported language, Phonos will show an error message (and not use the already established language fallback path).
  • The error message will, if possible, suggest alternative (supported) language codes.
  • When provided with a supported language, Phonos will send the request unchanged.
  • The user-provided language code will be normalized as appropriate (e.g. if PT_BR is given and pt-BR is supported then no error will be shown).
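A minimal sketch of the last two criteria, in Python for illustration (Phonos itself is PHP). It assumes normalization means only case-folding and treating underscores as hyphens; normalize, match_supported and the SUPPORTED excerpt are hypothetical names and data, not Phonos code.

```python
from typing import List, Optional

# Hypothetical excerpt; the real list would come from each engine.
SUPPORTED = ["en-AU", "en-GB", "en-US", "pt-BR", "pt-PT"]


def normalize(code: str) -> str:
    """Case-fold the code and treat underscores as hyphens (PT_BR -> pt-br)."""
    return code.strip().replace("_", "-").lower()


def match_supported(code: str, supported: List[str]) -> Optional[str]:
    """Return the engine's own spelling of the code, or None if it is unsupported.

    A code that is already supported comes back unchanged.
    """
    wanted = normalize(code)
    for candidate in supported:
        if normalize(candidate) == wanted:
            return candidate
    return None
```

With this, match_supported("PT_BR", SUPPORTED) returns "pt-BR" and match_supported("cy", SUPPORTED) returns None, which is the point at which the error message would be shown.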

Event Timeline

MusikAnimal subscribed.

Each engine has its own set of supported languages. Would it make sense to maintain a hard-coded list for each within Phonos, to ensure we fall back to one that works? It feels hacky, but without an API endpoint that gives us the supported languages, I suppose this is something we'll have to maintain manually. Google in particular seems to get very confused if you don't give it the right language.

Another thought is to only release to language edition wikis that we know are supported.

> Each engine has its own set of supported languages. Would it make sense to maintain a hard-coded list for each within Phonos, to ensure we fall back to one that works? It feels hacky, but without an API endpoint that gives us the supported languages, I suppose this is something we'll have to maintain manually. Google in particular seems to get very confused if you don't give it the right language.

I was thinking a hardcoded list of known working languages, and then maybe using language objects to work backwards until one is matched?

> Another thought is to only release to language edition wikis that we know are supported.

Good point....

Samwilson subscribed.

Do we want cy to fall back to en? That might result in audio being rendered, but it feels a bit hacky. If we use MediaWiki's fallback rules, then it wouldn't fall back like that.

> Each engine has its own set of supported languages. Would it make sense to maintain a hard-coded list for each within Phonos, to ensure we fall back to one that works? It feels hacky, but without an API endpoint that gives us the supported languages, I suppose this is something we'll have to maintain manually. Google in particular seems to get very confused if you don't give it the right language.

It looks like Google has the voices.list endpoint (https://cloud.google.com/text-to-speech/docs/reference/rest/v1/voices/list), which can return this info. We could cache it for a month or so.

Not sure about the other engines, but perhaps we could add EngineInterface::getSupportedLanguages():?array; if it returns null, we can assume the engine will take any language.
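A rough sketch of both ideas above, in Python rather than PHP and with hypothetical names throughout: get_supported_languages mirrors the proposed EngineInterface::getSupportedLanguages():?array (None meaning "accept anything"), and an engine backed by an API such as voices.list refetches its list only once a roughly one-month cache has expired. The fetch callable stands in for the real API call; in MediaWiki this would presumably go through its own caching layer instead.

```python
import time
from typing import Callable, List, Optional, Protocol

# Roughly one month, as suggested for caching Google's voices.list result.
CACHE_TTL_SECONDS = 30 * 24 * 60 * 60


class Engine(Protocol):
    """Analogue of the proposed EngineInterface::getSupportedLanguages():?array."""

    def get_supported_languages(self) -> Optional[List[str]]:
        ...


class CachedListEngine:
    """An engine (e.g. Google) whose language list comes from an API and is cached.

    `fetch` is a hypothetical callable that would call voices.list;
    `clock` is injectable so expiry can be tested without waiting a month.
    """

    def __init__(self, fetch: Callable[[], List[str]],
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch
        self._clock = clock
        self._cached: Optional[List[str]] = None
        self._fetched_at = 0.0

    def get_supported_languages(self) -> Optional[List[str]]:
        now = self._clock()
        # Refetch on first use, or once the cached list is older than the TTL.
        if self._cached is None or now - self._fetched_at >= CACHE_TTL_SECONDS:
            self._cached = self._fetch()
            self._fetched_at = now
        return self._cached


class AnyLanguageEngine:
    """An engine with no known list: None means 'assume any language works'."""

    def get_supported_languages(self) -> Optional[List[str]]:
        return None


def is_supported(engine: Engine, lang: str) -> bool:
    supported = engine.get_supported_languages()
    if supported is None:
        return True
    return lang.lower() in (code.lower() for code in supported)
```

Returning None rather than an empty list keeps "we don't know" distinct from "supports nothing", which is the distinction the proposed ?array return type would carry.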

Do we want to alert the user that a fallback language has been used?

Change 854854 had a related patch set uploaded (by Samwilson; author: Samwilson):

[mediawiki/extensions/Phonos@master] Fall back to supported language if possible

https://gerrit.wikimedia.org/r/854854

I was wrong above about not falling back from cy to en. MediaWiki always ends up at en, for any language. Is that what we want to do though?

> If no fallback exists, or the "parent" language is also not supported, Phonos will show an error specifying the language is not supported.

This won't happen if we always end up at English. I think we could probably just always fall back, and never have to display an error.

As for displaying a notice that the fallback has occurred: I'm not sure of the best way to do that. Currently, the popup only appears when clicking a Phonos button where no other action can be taken. If a fallback has happened, there should still be audio to play, so the click should play it and not also open a popup.


I concur, it probably isn't safe to fall back to English. Maybe Google does a decent job of just "figuring it out", but I suspect if that's true at all it doesn't apply to all languages we might see in use in production.

I think for now, since we're still a ways from a widespread deployment, it's nice to have the error when no fallback exists or the parent language is also unsupported. That'll help surface pronunciations that are more likely to be wrong. Longer-term I don't think it would be good, especially on Wiktionaries, which seemingly have a lot of arbitrary translations. We might see a lot of these errors, and the gray box looks kind of off-putting.

A few ideas:

  • Make the button look like a working button, but actually show the error on-click. Possibly also hide the speaker icon.
  • Put the page in a dedicated maintenance category, like 'Category:Pages using Phonos with an unsupported language'


> I concur, it probably isn't safe to fall back to English. Maybe Google does a decent job of just "figuring it out", but I suspect if that's true at all it doesn't apply to all languages we might see in use in production.

What about a language like "en-au" or "en-ca"?

> E.g.
>
> <phonos text="llandudno" ipa="/ɬanˈdɨdnoː/" lang="cy" />
>
> will return an error message and render no audio, whereas falling back to
>
> <phonos text="llandudno" ipa="/ɬanˈdɨdnoː/" lang="en" />
>
> returns an audio rendering.

I'm having a hard time fathoming how anyone could think this is remotely a good idea. There's no /ɬ/ or /ɨ/ in English so even if the TTS could interpret it, the result would be 100% wrong. Languages differ. That's the whole point of providing an IPA in the first place.

The example here indeed sounds nothing like it does in Welsh.

> What about a language like "en-au" or "en-ca"?

In a similar vein to T323912, my suggestion is "Don't hard-code anything." en-us may be a good approximation to en-ca in most contexts, but whenever someone is talking about Canadian English, it's quite likely it's about something that specifically doesn't apply to American English.

The more I think about it, the only fallbacks that would make sense are common ISO 3166 country codes that differ from the ISO 639 code for the language predominantly spoken in that country, provided the country code isn't already taken by another language in ISO 639. Namely cz, dk, and jp (but not kr/se/tr etc., because they're ISO 639 codes for languages other than Korean/Swedish/Turkish). In fact, interlinks like [[:cz:]] and cz.wikipedia.org etc. redirect to cs.wikipedia.org (see interwiki.php and redirects.dat).

Other than that, never fall back on anything. Not even xx to xx-YY unless there's only one country to choose from.† Just think how much of a faux pas it would be to (implicitly) declare American or British English is THE English, European or American Spanish is THE Spanish, or Mainland or Taiwanese Mandarin is THE Mandarin.

Declaring French French as THE French and equating Tagalog (tl) with Filipino probably wouldn't ruffle too many feathers, but I wouldn't count on it.

Equating nb and no is probably alright (as Bokmål vs Nynorsk is mainly a difference in writing), so long as the engine accepts only one.

† Even this is tricky. If only one of pt-BR and pt-PT became available, you certainly wouldn't want pt to fall back on it.
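If the cz/dk/jp exception above were adopted, it could be as small as a fixed alias table. This is a hypothetical Python sketch with illustrative names, limited to exactly the three codes named above and leaving everything else (including kr, se and tr) untouched.

```python
# Only the three aliases proposed above: country codes commonly typed in place
# of the language code, which are not themselves ISO 639 language codes.
COUNTRY_CODE_ALIASES = {"cz": "cs", "dk": "da", "jp": "ja"}


def resolve_alias(lang: str) -> str:
    """Map a bare cz/dk/jp to cs/da/ja; return every other code unchanged."""
    return COUNTRY_CODE_ALIASES.get(lang.lower(), lang)
```

Anything else would still go through the normal supported/unsupported check, with no further guessing.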

Perhaps the safest first step here is to add an error message shown when the language is not supported (perhaps also showing 'likely' available languages, e.g. for unsupported pt-PT it could say "pt-BR and pt are supported" or something).

One issue is that the lang parameter is somewhat doing double duty: for TTS engines it's more a 'voice' (e.g. en-uk-north can be valid for espeak, and Google has voices per language, but we're not exposing any way to choose them at the moment), but we also want to use it to look up pronunciation audio and IPA transcriptions on Wikidata (where it's treated as an IETF language code).

Also, lang defaults to the wiki's content language, so on a wiki whose language is unsupported it will always need to be set. (Which is fine, but should be made clear to the user.)

So I suggest we change this task to be about a) retrieving lists of supported languages for the different engines, and b) telling the user when they use an unsupported language. And only do the fallback as a way to display possible related language codes to the user. Does that sound right?

> only do the fallback as a way to display possible related language codes to the user

Like a "Did you mean:" rather than silently correcting it. I like it.

> Like a "Did you mean:" rather than silently correcting it. I like it.

Ok cool, I'll get something ready. The message would be something like:

Language 'fr' is not supported by Phonos. The following possibly related languages are supported: fr-be, fr-fr (see [the documentation] for a full list).

And the list of languages would be found by a) the fallback chain if the given one is a valid MediaWiki language, and b) a leading substring search (i.e. in the example above 'fr' appears at the start of 'fr-be' and 'fr-fr'). Does that sound okay? Or maybe it's better to just give a link to the docs where all languages can be listed?
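The two-step lookup proposed above might look roughly like this. It is a sketch under assumptions: MediaWiki's fallback chains are passed in as a plain dictionary of hypothetical data, matching is case-insensitive, and the message mirrors the example wording minus the documentation link. suggest and error_message are illustrative names.

```python
from typing import Dict, List


def suggest(lang: str, supported: List[str],
            fallbacks: Dict[str, List[str]]) -> List[str]:
    """Collect supported codes the user might have meant, without duplicates.

    a) Codes reached via the fallback chain of `lang` (hypothetical data here,
       standing in for MediaWiki's fallback chains), in chain order.
    b) Codes that start with `lang` (case-insensitive leading-substring match).
    """
    wanted = lang.lower()
    by_lower = {code.lower(): code for code in supported}
    found: List[str] = []
    for parent in fallbacks.get(wanted, []):
        code = by_lower.get(parent.lower())
        if code is not None and code not in found:
            found.append(code)
    for code in supported:
        if code.lower().startswith(wanted) and code not in found:
            found.append(code)
    return found


def error_message(lang: str, suggestions: List[str]) -> str:
    """Build the error text; the documentation link is left out of this sketch."""
    message = f"Language '{lang}' is not supported by Phonos."
    if suggestions:
        message += (" The following possibly related languages are supported: "
                    + ", ".join(suggestions) + ".")
    return message
```

For example, error_message("fr", suggest("fr", ["fr-BE", "fr-FR", "en-US"], {})) gives "Language 'fr' is not supported by Phonos. The following possibly related languages are supported: fr-BE, fr-FR." Note that a leading-substring match like this also offers fil-PH for fi, one of the mismatches reported further down.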

Actually, this is making me wonder about the whole matter of the lang parameter in general: wouldn't it be better to offer up all the available voices? Google only supports 54 language codes, but 398 voices. Espeak similarly has voices that are not represented by language codes. It feels to me like it might be nice to allow voice="en-AU-Standard-A" or voice="en-gb-scotland" for different articles (much the same way that different date formats are permitted within articles about those regions).

We still need lang of course for looking up Wikidata info.

These are the Google language codes:

af-ZA, ar-XA, bg-BG, bn-IN, ca-ES, cmn-CN, cmn-TW, cs-CZ, da-DK, de-DE, el-GR, en-AU, en-GB, en-IN, en-US, es-ES, es-US, fi-FI, fil-PH, fr-CA, fr-FR, gu-IN, hi-IN, hu-HU, id-ID, is-IS, it-IT, ja-JP, kn-IN, ko-KR, lv-LV, ml-IN, mr-IN, ms-MY, nb-NO, nl-BE, nl-NL, pa-IN, pl-PL, pt-BR, pt-PT, ro-RO, ru-RU, sk-SK, sr-RS, sv-SE, ta-IN, te-IN, th-TH, tr-TR, uk-UA, vi-VN, yue-HK (53 in total)
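For reference, a list like the one above can be derived from a voices.list response, since each voice object carries a languageCodes array. The sketch below (Python, hypothetical function name) assumes the response body has already been fetched and only deduplicates and sorts the codes; that many voices share one code is why there are far more voices than language codes.

```python
import json
from typing import List


def language_codes_from_voices(response_body: str) -> List[str]:
    """Reduce a voices.list JSON response to a sorted list of unique language codes."""
    data = json.loads(response_body)
    codes = set()
    for voice in data.get("voices", []):
        # Several voices (e.g. en-AU-Standard-A, en-AU-Standard-B) share one code.
        codes.update(voice.get("languageCodes", []))
    return sorted(codes)
```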

> Google only supports 54 language codes, but 398 voices.

According to the documentation, only 21 of those 54 support the <phoneme> tag (and only 18 support IPA). But at the same time, if supporting all of them comes at no extra cost, why not? Many languages have phonemic (aka "shallow") orthographies, where pronunciation can be reliably deduced from writing, so e.g. Hungarian "Magyar" is just as reliable as IPA [ˈmɒɟɒr].

Then Phonos can be just a middleman that more or less indiscriminately accepts any input along with a specification of language/voice and input format (which can be orthography, IPA, X-SAMPA, Pinyin, Jyutping, or anything), checks whether the engine supports that specification, passes it to the engine if so, and then passes the audio from the engine along to the reader if it is received.

That would of course make this extension effectively about generic text-to-speech rather than IPA-to-audio, likely inviting wider applications than you intended, but 21 (or 18) does seem somewhat too small a selection for as global a movement as Wikimedia.
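For context on the <phoneme> tag mentioned above: an SSML request that passes IPA to an engine could be built roughly as below. This is a hypothetical Python sketch, not what Phonos sends; it assumes the engine accepts alphabet="ipa" for the requested language (only some languages do, per the comment above) and that the enclosing slashes or brackets of a transcription should be stripped.

```python
from xml.sax.saxutils import escape, quoteattr


def ipa_ssml(text: str, ipa: str) -> str:
    """Wrap `text` in an SSML <phoneme> element carrying its IPA transcription.

    Assumption: the /.../ or [...] delimiters are not wanted in the ph attribute.
    """
    ph = ipa.strip().strip("/[]")
    # quoteattr adds the surrounding quotes and escapes &, < and >.
    return ('<speak><phoneme alphabet="ipa" ph=' + quoteattr(ph) + ">"
            + escape(text) + "</phoneme></speak>")
```

Escaping both the text and the attribute matters because both come straight from wikitext.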

r854854 (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Phonos/+/854854) is ready for review. It adds a system for fetching the supported languages, and error messages for when there's no match. The matching is slightly normalized: it's case-insensitive and swaps underscores for hyphens (so e.g. en_au matches the supported en-AU). If the provided lang is a substring of some supported languages, those are listed in the error popup to let the user know they are available.

I'm not sure if we want to make the full list of supported languages available somewhere, e.g. on [[Special:Phonos]].

@Nardog I think we should definitely support whatever the engine supports. I think that'll mean adding extra parameters, so that |lang= is kept for language codes. Although, come to think of it, that doesn't really work already because e.g. Espeak supports a lang of 'en-uk-north'. But it does call it a 'language', so maybe it's okay. Google is more correct and only has actual language codes.

I'll make a separate ticket for looking at supporting a |voice= parameter. I think there are now tasks already for other parameters e.g. |xsampa= (T324111).

Samwilson renamed this task from "Fallback language where an engine does not support the requested language" to "Display error when an engine does not support the requested language". (Dec 8 2022, 6:38 AM)

> @Nardog I think we should definitely support whatever the engine supports.

That means Pinyin and Jyutping are supported and https://gerrit.wikimedia.org/r/c/864885 is reversed, correct?

> I think that'll mean adding extra parameters, so that |lang= is kept for language codes. Although, come to think of it, that doesn't really work already because e.g. Espeak supports a lang of 'en-uk-north'. But it does call it a 'language', so maybe it's okay. Google is more correct and only has actual language codes.

I disagree with this characterization. If Northern England was a sovereign state, it would just be en-xx, the same status as en-us etc. eSpeak supporting various dialects for the same language is not equivalent to Google offering various speaking voices for the same language/dialect, and treating en-uk-north under lang is "more correct" as far as I can see.

(I'm not sure where you got en-uk-north btw. The equivalent in the latest documentation appears to be en-gb-x-gbclan, following BCP47 more strictly.)

Change 854854 merged by jenkins-bot:

[mediawiki/extensions/Phonos@master] Show error if lang not supported

https://gerrit.wikimedia.org/r/854854

Change 869835 had a related patch set uploaded (by MusikAnimal; author: MusikAnimal):

[mediawiki/extensions/Phonos@master] Phonos: use TimedMediaHandler to find MP3 urls for non-MP3 files

https://gerrit.wikimedia.org/r/869835

Ignore T320523#8482503; I added this task to my patch by accident.

Not being able to fall back to project language codes could make it harder to write reusable wikicode that invokes this from project workflows such as templates.

@Xaosflux The idea of language fallback doesn't make sense in Phonos because we're dealing with spoken language. You can't expect any transcription for one variety of a language to work for another. And you can't expect any existing transcription to work in a given TTS because there are competing IPA conventions. So you can't expect that a template using Phonos on one project can be imported to another and that its members will find it satisfactory. That's their call to make.

And since most uses will be via templates/modules anyway, each project can make it default to one variety. But leave that decision to each community. Otherwise you'll offend everyone (see #8431626).

I have tested the lang parameter for every language in the site matrix, just to get an idea of what users on each of these wikis will be suggested as a default (if they don't include the lang parameter). See https://en.wikipedia.beta.wmflabs.org/wiki/Phonos_Languages.

Most of the languages are not supported by Google and no suggestion is made. They just see the error message: Language zz is not supported by Phonos.

The suggested lang parameter is not always appropriate. For example, af-ZA (Afrikaans) is suggested for za (Zhuang), fil-PH (Filipino) is suggested for fi (Finnish). I found about 10 examples in total. Perhaps we should not do a substring match.

We may need to warn users that the naming conventions of language codes Google uses might not match what they are used to from the wikis. For example, they use cmn-CN and cmn-TW rather than zh-* for Mandarin, yue instead of zh-yue for Cantonese and nb-NO instead of no for Norwegian.

It might be a good idea to include a link to somewhere like https://cloud.google.com/text-to-speech/docs/voices so people know which languages we support, rather than having to guess.

I agree it should be a word match (excluding underscores) rather than a simple prefix match, so fil isn't suggested for fi. It might indeed be a good idea to store a table matching macrolanguages and their subordinates, so e.g. ar-XA is suggested for arb, apc, etc., though there are still other combinations worth equating, such as tl and fil. (Which would still open cans of worms, such as "Do we want to suggest Hindi for Urdu or vice versa?")
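A sketch of the subtag-based matching suggested above, in Python with hypothetical names. It compares only the primary language subtag (after treating underscores as hyphens), so fi matches fi-FI but not fil-PH, and za matches nothing rather than af-ZA. The macrolanguage table is deliberately tiny and illustrative, covering just the arb/apc example.

```python
from typing import Dict, List

# Hypothetical, illustrative table: individual languages -> macrolanguage code.
MACROLANGUAGE_SUGGESTIONS: Dict[str, str] = {"arb": "ar", "apc": "ar"}


def primary_subtag(code: str) -> str:
    """Return the language subtag only: 'fil-PH' -> 'fil', 'FR_be' -> 'fr'."""
    return code.replace("_", "-").split("-")[0].lower()


def suggest_by_subtag(lang: str, supported: List[str]) -> List[str]:
    """Suggest supported codes whose language subtag equals that of `lang`.

    Unlike a substring match, region subtags (the ZA in af-ZA) are never
    compared against the user's language code.
    """
    wanted = primary_subtag(lang)
    wanted = MACROLANGUAGE_SUGGESTIONS.get(wanted, wanted)
    return [code for code in supported if primary_subtag(code) == wanted]
```

Pairs like tl/fil or hi/ur would need explicit entries in such a table, which is exactly where the community judgement discussed above comes in.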