Larynx expects IPA characters to be separated by a space
Closed, Resolved | Public | 2 Estimated Story Points

Description

Larynx will mispronounce a word if the IPA passed is not space-separated — for example, tomato will be incorrectly shortened if the IPA passed is təmˈɑtoʊ, but will be correctly pronounced if it's space-separated (i.e. t ə m ˈɑ t oʊ)...

Additionally, Larynx will accept an SSML payload that is non-compliant but slightly simpler, as given below:

<?xml version="1.0"?>
<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
       xml:lang="{language code}">
  <lexicon>
    <lexeme>
      <grapheme>
        {word}
      </grapheme>
      <phoneme>
        {space-separated IPA}
      </phoneme>
    </lexeme>
  </lexicon>
  <w>{word}</w>
</speak>

For example:

<?xml version="1.0"?>
<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
       xml:lang="en-US">
  <lexicon>
    <lexeme>
      <grapheme>
        tomato
      </grapheme>
      <phoneme>
        t ə m ˈɑ t oʊ
      </phoneme>
    </lexeme>
  </lexicon>
  <w>tomato</w>
</speak>
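
A minimal sketch of filling in the template above, written in Python for illustration only. The actual generation lives in LarynxEngine.php and is not reproduced here; the names `SSML_TEMPLATE` and `build_larynx_ssml` are hypothetical. The values are XML-escaped so that a word containing `&` or `<` still produces a well-formed payload.

```python
from xml.sax.saxutils import escape

# Hypothetical template mirroring the simplified payload shown above.
SSML_TEMPLATE = """<?xml version="1.0"?>
<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
       xml:lang="{lang}">
  <lexicon>
    <lexeme>
      <grapheme>
        {word}
      </grapheme>
      <phoneme>
        {ipa}
      </phoneme>
    </lexeme>
  </lexicon>
  <w>{word}</w>
</speak>
"""


def build_larynx_ssml(word: str, ipa: str, lang: str) -> str:
    """Return the simplified SSML payload for one word and its IPA."""
    return SSML_TEMPLATE.format(
        # The language code sits inside an attribute, so quotes are escaped too.
        lang=escape(lang, {'"': "&quot;"}),
        word=escape(word),
        ipa=escape(ipa),
    )


print(build_larynx_ssml("tomato", "t ə m ˈɑ t oʊ", "en-US"))
```

With the arguments shown, the printed payload matches the tomato example above.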

Action items

  • Update the Larynx engine to space-separate the IPA prior to inserting into the SSML.
  • Possibly trim down the Larynx engine SSML generation to the smaller, non-compliant example format.
  • Possibly investigate if this makes a significant difference to the quality of the returned pronunciations — this is really only useful for T317274: Use free software implementation for Phonos on Wikimedia sites
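
The first action item could be sketched roughly as follows. This is a Python illustration under stated assumptions, not the PHP code in LarynxEngine.php, and the function name `space_separate_ipa` is hypothetical. It splits on every character, which means multi-character phonemes such as oʊ and stress marks such as ˈ become separate tokens.

```python
def space_separate_ipa(ipa: str) -> str:
    """Put a single space between every character of an IPA string.

    Hypothetical helper for illustration. Existing whitespace is dropped
    first, so input that is already separated is not double-spaced.
    """
    chars = [ch for ch in ipa if not ch.isspace()]
    return " ".join(chars)


print(space_separate_ipa("təmˈɑtoʊ"))  # t ə m ˈ ɑ t o ʊ
```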

Event Timeline

Change 832242 had a related patch set uploaded (by Samtar; author: Samtar):

[mediawiki/extensions/Phonos@master] [WIP] LarynxEngine.php: Update Larynx SSML

https://gerrit.wikimedia.org/r/832242

MusikAnimal subscribed.

Moving to the sprint since this is actively in development.

Change 832242 merged by jenkins-bot:

[mediawiki/extensions/Phonos@master] LarynxEngine.php: Update Larynx SSML

https://gerrit.wikimedia.org/r/832242

QA notes: Larynx always requires the text= parameter to be set, and it should match whatever word the IPA is describing. I don't know how much the newly added space-separation will change things, if at all. For now, it's not hugely important since we're going with Google for the initial launch.

For your convenience here is an up-to-date Patch Demo with the corpus: https://patchdemo.wmflabs.org/wikis/63cd762b6b/wiki/Phonos (FYI -- all Patch Demos that include the Phonos extension automatically have this "Phonos" page already set up!)

Most of these don't sound right™, but from memory it feels like more do than before?

TheresNoTime set the point value for this task to 2. (Sep 21 2022, 2:31 PM)

@TheresNoTime We tested in Patch Demo https://patchdemo.wmflabs.org/wikis/659c2d5bfa/wiki/Phonos (Change-Id: I87fbb6b10c4e0aa287700bf74a41849b70869500), which is based on 9/12, against https://patchdemo.wmflabs.org/wikis/51183b9788/wiki/Phonos (Change-Id: Ia6170253ba1db5e53d248667f54f53928a704e81), which is from 9/20, and they both still sound the same using your tomato example. I thought the old version would sound different. If I tested this wrong, please let me know the steps to recreate your error. Thanks!

T317431_Larynx expects IPA Space_FAIL.png (520×1 px, 64 KB)

Thanks for this @GMikesell-WMF :) that's a confusing outcome..... I've even tested with the text= set to foo (to make sure it wasn't just reading the word)

I'll have a think!

Ok sounds good. If you find out anything or how to retest it, please let us know. Good luck!

Okay, I can't recreate this error from within Phonos (only when using the Larynx API directly). Given the low impact of this change on our production use of Phonos, I think we should focus on ensuring the SSML change causes no regressions (I've not seen any in my own testing) and move this to product sign-off.

Sounds good, we will move this to product sign-off. Thanks!

t ə m ˈɑ t oʊ, with oʊ instead of o ʊ, indicates it's each phoneme, and not each character, that needs to be separated by a space. This strikes me as yet another case of excessive, restricting hard-coding (see T323912).

I believe this is "just" a technical limitation of Larynx (a text to speech engine that isn't going to be used on Wikimedia projects) — although təmˈɑtoʊ has been split by the code into t ə m ˈ ɑ t o ʊ (not t ə m ˈɑ t oʊ, that was my mistake), Larynx still "sees" that as təmˈɑtoʊ

That may be the effect the engine currently outputs, but the documentation explicitly says "phonemes separated by whitespace". ˈr æ tʃ ɪ t indicates ratchet whereas ˈr æ t ʃ ɪ t indicates rat shit. It just doesn't seem like an assumption software should be making.
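
To make the distinction concrete, here is a rough Python sketch of phoneme-level (rather than character-level) separation. Everything in it is an assumption for illustration: the function name `split_phonemes`, the tiny `MULTI_CHAR_PHONEMES` inventory, and the choice to attach stress marks to the following phoneme are not taken from Phonos or Larynx. Input that already contains whitespace is passed through untouched, because unseparated input cannot tell ratchet from the two-word reading; the greedy match below will always pick tʃ.

```python
# Hypothetical, deliberately tiny inventory of two-character phonemes.
MULTI_CHAR_PHONEMES = ("tʃ", "dʒ", "oʊ", "aɪ", "aʊ", "eɪ", "ɔɪ")
STRESS_MARKS = ("ˈ", "ˌ")


def split_phonemes(ipa: str) -> str:
    """Separate an IPA string into space-delimited phonemes (sketch only)."""
    # Respect any segmentation the author already supplied.
    if any(ch.isspace() for ch in ipa):
        return " ".join(ipa.split())

    tokens = []
    prefix = ""  # pending stress mark(s), attached to the next phoneme
    i = 0
    while i < len(ipa):
        ch = ipa[i]
        if ch in STRESS_MARKS:
            prefix += ch
            i += 1
            continue
        # Greedy: prefer a known two-character phoneme over one character.
        pair = ipa[i:i + 2]
        if pair in MULTI_CHAR_PHONEMES:
            tokens.append(prefix + pair)
            i += 2
        else:
            tokens.append(prefix + ch)
            i += 1
        prefix = ""
    if prefix:
        tokens.append(prefix)  # trailing stress mark with nothing to attach to
    return " ".join(tokens)


print(split_phonemes("təmˈɑtoʊ"))      # t ə m ˈɑ t oʊ
print(split_phonemes("ˈrætʃɪt"))       # ˈr æ tʃ ɪ t
print(split_phonemes("ˈr æ t ʃ ɪ t"))  # ˈr æ t ʃ ɪ t (left as supplied)
```

A real inventory would need to be per-language and far larger, which is part of why guessing the segmentation in software is fragile.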