Larynx expects IPA characters to be separated by a space
Closed, Resolved · Public · 2 Estimated Story Points

Assigned To

Authored By

	TheresNoTime
	Sep 9 2022, 5:48 PM

Description

Larynx will mispronounce a word if the IPA it is passed is not space-separated. For example, tomato is incorrectly shortened if the IPA passed is təmˈɑtoʊ, but is pronounced correctly if the IPA is space-separated (i.e. t ə m ˈɑ t oʊ)...

Additionally, Larynx will accept an SSML payload that is non-compliant but slightly simpler, as given below:

<?xml version="1.0"?>
<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
       xml:lang="{language code}">
  <lexicon>
    <lexeme>
      <grapheme>
        {word}
      </grapheme>
      <phoneme>
        {space-separated IPA}
      </phoneme>
    </lexeme>
  </lexicon>
  <w>{word}</w>
</speak>

For example:

<?xml version="1.0"?>
<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                 http://www.w3.org/TR/speech-synthesis11/synthesis.xsd"
       xml:lang="en-US">
  <lexicon>
    <lexeme>
      <grapheme>
        tomato
      </grapheme>
      <phoneme>
        t ə m ˈɑ t oʊ
      </phoneme>
    </lexeme>
  </lexicon>
  <w>tomato</w>
</speak>
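
As a rough illustration of the payload shape above, here is a minimal Python sketch that fills in the template. The function name `build_simplified_ssml` is hypothetical and this is not the code in LarynxEngine.php; it only assumes that the element layout shown above is what Larynx accepts.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_simplified_ssml(word: str, spaced_ipa: str, lang: str = "en-US") -> str:
    """Fill in the simplified payload template (hypothetical helper)."""
    # Namespace declarations are written as plain attributes so the output
    # mirrors the hand-written example rather than using generated prefixes.
    speak = ET.Element("speak", {
        "version": "1.1",
        "xmlns": SSML_NS,
        "xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
        "xsi:schemaLocation": SSML_NS
        + " http://www.w3.org/TR/speech-synthesis11/synthesis.xsd",
        "xml:lang": lang,
    })
    lexeme = ET.SubElement(ET.SubElement(speak, "lexicon"), "lexeme")
    ET.SubElement(lexeme, "grapheme").text = word
    ET.SubElement(lexeme, "phoneme").text = spaced_ipa
    ET.SubElement(speak, "w").text = word
    # ElementTree escapes text content, so characters such as "&" stay valid XML.
    return '<?xml version="1.0"?>\n' + ET.tostring(speak, encoding="unicode")


print(build_simplified_ssml("tomato", "t ə m ˈɑ t oʊ"))
```

Building the document with an XML library rather than string concatenation keeps the payload well-formed even if the word or IPA contains markup characters.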

Action items

Update the Larynx engine to space-separate the IPA prior to inserting into the SSML.
Possibly trim down the Larynx engine SSML generation to the smaller, non-compliant example format.
Possibly investigate whether this makes a significant difference to the quality of the returned pronunciations. This is really only useful for T317274: Use free software implementation for Phonos on Wikimedia sites
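
The first action item could be sketched in Python as below. The helper name `space_separate_ipa` is hypothetical, and splitting on every character is an assumption about the simplest possible approach, not a copy of the merged patch. Note that a per-character split also breaks up multi-character phonemes such as oʊ and detaches stress marks.

```python
def space_separate_ipa(ipa: str) -> str:
    """Put one space between every character of an IPA string.

    Existing whitespace is dropped first, so the function is idempotent.
    """
    return " ".join(ch for ch in ipa if not ch.isspace())


print(space_separate_ipa("təmˈɑtoʊ"))  # t ə m ˈ ɑ t o ʊ
```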

Details

	Subject	Repo	Branch	Lines +/-
	LarynxEngine.php: Update Larynx SSML	mediawiki/extensions/Phonos	master	+15 -13

Related Objects

Mentioned In: T328192: Remove automatic space insertion for Larynx
T323912: IPA validation and conversion is flawed/unnecessary
rEPHN1f67e0c88b71: LarynxEngine.php: Update Larynx SSML
Mentioned Here: T323912: IPA validation and conversion is flawed/unnecessary
T317274: Use free software implementation for Phonos on Wikimedia sites

Event Timeline

TheresNoTime created this task. Sep 9 2022, 5:48 PM

Restricted Application added a subscriber: Aklapper. Sep 9 2022, 5:48 PM

TheresNoTime claimed this task. Sep 13 2022, 5:33 PM

Change 832242 had a related patch set uploaded (by Samtar; author: Samtar):

[mediawiki/extensions/Phonos@master] [WIP] LarynxEngine.php: Update Larynx SSML

https://gerrit.wikimedia.org/r/832242

gerritbot added a project: Patch-For-Review. Sep 14 2022, 11:32 AM

Moving to the sprint since this is actively in development.

Change 832242 merged by jenkins-bot:

[mediawiki/extensions/Phonos@master] LarynxEngine.php: Update Larynx SSML

https://gerrit.wikimedia.org/r/832242

TheresNoTime mentioned this in rEPHN1f67e0c88b71: LarynxEngine.php: Update Larynx SSML. Sep 15 2022, 9:38 PM

ReleaseTaggerBot added a project: MW-1.40-notes (1.40.0-wmf.2; 2022-09-19). Sep 15 2022, 10:00 PM

Maintenance_bot removed a project: Patch-For-Review. Sep 15 2022, 10:30 PM

QA notes: Larynx always requires the text= parameter to be set, and it should match whatever word the IPA is describing. I don't know how much the newly added space-separation will change things, if at all. For now, it's not hugely important since we're going with Google for the initial launch.

For your convenience here is an up-to-date Patch Demo with the corpus: https://patchdemo.wmflabs.org/wikis/63cd762b6b/wiki/Phonos (FYI -- all Patch Demos that include the Phonos extension automatically have this "Phonos" page already set up!)

In T317431#8241128, @MusikAnimal wrote:

For your convenience here is an up-to-date Patch Demo with the corpus: https://patchdemo.wmflabs.org/wikis/63cd762b6b/wiki/Phonos (FYI -- all Patch Demos that include the Phonos extension automatically have this "Phonos" page already set up!)

Most of these don't sound right™, but from memory it feels like more do than before?

TheresNoTime set the point value for this task to 2. Sep 21 2022, 2:31 PM

@TheresNoTime We tested in patch demo https://patchdemo.wmflabs.org/wikis/659c2d5bfa/wiki/Phonos (Change-Id: I87fbb6b10c4e0aa287700bf74a41849b70869500), which is from 9/12, against https://patchdemo.wmflabs.org/wikis/51183b9788/wiki/Phonos (Change-Id: Ia6170253ba1db5e53d248667f54f53928a704e81), which is from 9/20, and they both still sound the same using your tomato example. I thought the old version would sound different. If I tested this wrong, please let me know the steps to recreate your error. Thanks!

T317431_Larynx expects IPA Space_FAIL.png (520×1 px, 64 KB)

In T317431#8252168, @GMikesell-WMF wrote:

@TheresNoTime We tested in patch demo https://patchdemo.wmflabs.org/wikis/659c2d5bfa/wiki/Phonos (Change-Id: I87fbb6b10c4e0aa287700bf74a41849b70869500), which is from 9/12, against https://patchdemo.wmflabs.org/wikis/51183b9788/wiki/Phonos (Change-Id: Ia6170253ba1db5e53d248667f54f53928a704e81), which is from 9/20, and they both still sound the same using your tomato example. I thought the old version would sound different. If I tested this wrong, please let me know the steps to recreate your error. Thanks!

Thanks for this @GMikesell-WMF :) that's a confusing outcome..... I've even tested with the text= set to foo (to make sure it wasn't just reading the word)

I'll have a think!

Ok sounds good. If you find out anything or how to retest it, please let us know. Good luck!

TheresNoTime moved this task from QA 🐛 to Review/Feedback 💬 on the Community-Tech (CommTech-Sprint-33) board. Sep 22 2022, 1:48 AM

• JMcLeod_WMF edited projects, added Community-Tech (CommTech-Sprint-34); removed Community-Tech (CommTech-Sprint-33). Sep 26 2022, 8:26 PM

• JMcLeod_WMF moved this task from Ready 🎬 to Review/Feedback 💬 on the Community-Tech (CommTech-Sprint-34) board. Sep 26 2022, 8:30 PM

MusikAnimal moved this task from Review/Feedback 💬 to QA 🐛 on the Community-Tech (CommTech-Sprint-34) board. Oct 3 2022, 7:05 PM

Okay, I can't recreate this error from within Phonos (only when directly using the Larynx API). Given the low impact of this change on our production use of Phonos, I think we should focus on ensuring the SSML change causes no regressions (I've not seen any in my own testing) and move this to product sign-off.

• JMcLeod_WMF edited projects, added Community-Tech (CommTech-Sprint-35); removed Community-Tech (CommTech-Sprint-34). Oct 11 2022, 10:26 AM

• JMcLeod_WMF moved this task from Ready 🎬 to QA 🐛 on the Community-Tech (CommTech-Sprint-35) board. Oct 11 2022, 10:31 AM

Sounds good, we will move this to product sign-off. Thanks!

GMikesell-WMF moved this task from QA 🐛 to Product sign-off 🤘 on the Community-Tech (CommTech-Sprint-35) board. Oct 12 2022, 5:19 PM

• NRodriguez closed this task as Resolved. Oct 14 2022, 3:06 PM

• JMcLeod_WMF moved this task from Product sign-off 🤘 to Done 🏁 on the Community-Tech (CommTech-Sprint-35) board. Oct 17 2022, 7:08 PM

t ə m ˈɑ t oʊ, with oʊ instead of o ʊ, indicates it's each phoneme, and not each character, that needs to be separated by space. This strikes me as yet another case of excessive, restricting hard-coding (see T323912).

Nardog mentioned this in T323912: IPA validation and conversion is flawed/unnecessary. Dec 5 2022, 11:03 PM

In T317431#8445079, @Nardog wrote:

t ə m ˈɑ t oʊ, with oʊ instead of o ʊ, indicates it's each phoneme, and not each character, that needs to be separated by space. This strikes me as yet another case of excessive, restricting hard-coding (see T323912).

I believe this is "just" a technical limitation of Larynx (a text to speech engine that isn't going to be used on Wikimedia projects) — although təmˈɑtoʊ has been split by the code into t ə m ˈ ɑ t o ʊ (not t ə m ˈɑ t oʊ, that was my mistake), Larynx still "sees" that as təmˈɑtoʊ

In T317431#8445097, @TheresNoTime wrote:

I believe this is "just" a technical limitation of Larynx (a text to speech engine that isn't going to be used on Wikimedia projects) — although təmˈɑtoʊ has been split by the code into t ə m ˈ ɑ t o ʊ (not t ə m ˈɑ t oʊ, that was my mistake), Larynx still "sees" that as təmˈɑtoʊ

That may be the effect the engine currently outputs, but the documentation explicitly says "phonemes separated by whitespace". ˈr æ tʃ ɪ t indicates ratchet whereas ˈr æ t ʃ ɪ t indicates rat shit. It just doesn't seem like an assumption software should be making.
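
To make the phoneme-versus-character distinction concrete, here is a small Python sketch of a greedy longest-match tokenizer. Everything in it is an assumption for illustration: the function name `tokenize_ipa`, the idea of passing in an inventory of multi-character phonemes, and the choice to attach stress marks to the following phoneme. It is not how Larynx or Phonos works. It also shows why software cannot decide this reliably: the same input yields the "ratchet" or the "rat shit" segmentation depending on the inventory it is given.

```python
from typing import Iterable

STRESS_MARKS = {"ˈ", "ˌ"}  # primary and secondary stress


def tokenize_ipa(ipa: str, multi_char_phonemes: Iterable[str] = ()) -> str:
    """Space-separate IPA by phoneme using greedy longest-match.

    multi_char_phonemes is a caller-supplied (hypothetical) inventory of
    phonemes written with more than one character, e.g. {"tʃ", "oʊ"}.
    """
    units = sorted(multi_char_phonemes, key=len, reverse=True)
    tokens = []
    prefix = ""  # stress marks waiting to be attached to the next phoneme
    i = 0
    while i < len(ipa):
        ch = ipa[i]
        if ch.isspace():
            i += 1
        elif ch in STRESS_MARKS:
            prefix += ch
            i += 1
        else:
            # Longest inventory entry starting here, else the single character.
            match = next((u for u in units if ipa.startswith(u, i)), ch)
            tokens.append(prefix + match)
            prefix = ""
            i += len(match)
    if prefix:  # trailing stress mark with nothing after it
        tokens.append(prefix)
    return " ".join(tokens)


print(tokenize_ipa("təmˈɑtoʊ", {"oʊ", "tʃ"}))  # t ə m ˈɑ t oʊ
print(tokenize_ipa("ˈrætʃɪt", {"oʊ", "tʃ"}))   # ˈr æ tʃ ɪ t
print(tokenize_ipa("ˈrætʃɪt"))                 # ˈr æ t ʃ ɪ t
```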

Nardog mentioned this in T328192: Remove automatic space insertion for Larynx. Jan 28 2023, 11:14 AM
