
Create Google phonos engine
Closed, Resolved | Public | 3 Estimated Story Points

Description

Extend the EngineInterface (similar to the espeak demo) to send API requests to Google


Accepted SSML format
<?xml version="1.0"?>
<speak>
  <lang xml:lang="{lang}">
    <phoneme alphabet="ipa" ph="{ipa}">{word}</phoneme>
  </lang>
</speak>
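
A minimal sketch of filling in this template, written in Python purely for illustration (it is not the extension's code). The function name `build_ssml` is hypothetical; the point is only that `{lang}`, `{ipa}` and `{word}` should be XML-escaped before they are substituted.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(lang: str, ipa: str, word: str) -> str:
    """Fill the SSML template from the task description (illustrative helper)."""
    # quoteattr() returns the value already wrapped in quotes and escaped,
    # escape() handles &, < and > in the element text.
    return (
        '<?xml version="1.0"?>\n'
        "<speak>\n"
        f"  <lang xml:lang={quoteattr(lang)}>\n"
        f'    <phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>\n'
        "  </lang>\n"
        "</speak>\n"
    )


print(build_ssml("en", "həˈləʊ", "hello"))
```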

Event Timeline

This work is pending a conversation with legal about the API clause and how we'll be implementing this. cc @MusikAnimal

TheresNoTime changed the task status from Open to Stalled. Jun 29 2022, 1:14 PM
MusikAnimal changed the task status from Stalled to Open. Jul 20 2022, 4:31 PM
MusikAnimal claimed this task.

After talking with the language team, I don't think we have a legal barrier to moving forward with this. We will, however, probably need to advertise the use of Google somewhere in the UI. That can be ironed out in a separate patch, but for now I figure let's get something working (using just free credits).

Change 815812 had a related patch set uploaded (by MusikAnimal; author: MusikAnimal):

[mediawiki/extensions/Phonos@master] Add GoogleEngine and test for SSML

https://gerrit.wikimedia.org/r/815812

MusikAnimal set the point value for this task to 3.

Estimate: 2 points; actual time spent: 3 points. What took so long was figuring out that we first have to base64-decode the response from Google's API!
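
For anyone hitting the same thing, a rough sketch of the decoding step, in Python for illustration only. It assumes the JSON response carries the audio as a base64 string in an `audioContent` field, which is how the Cloud Text-to-Speech REST API documents its synthesize response; the helper name `extract_audio` is hypothetical.

```python
import base64
import json


def extract_audio(response_body: str) -> bytes:
    """Return the raw audio bytes from a synthesize response body.

    Assumption: the body is JSON with a base64 string under "audioContent".
    """
    payload = json.loads(response_body)
    return base64.b64decode(payload["audioContent"])


# Stand-in response; real audio bytes would come from the API.
fake_body = json.dumps({"audioContent": base64.b64encode(b"fake-mp3-bytes").decode("ascii")})
audio = extract_audio(fake_body)
print(len(audio))
```

Writing the undecoded string straight to a file would produce something that is not playable audio, which is presumably what made this hard to spot.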

Using

{{#phonos: text=nope | ipa=/həˈləʊ/ | type=ipa | lang=en}}

as expected, renders

/həˈləʊ/

but generates the audio for the text value, ignoring the IPA


The generated SSML for this was:

<?xml version="1.0"?>
<speak>
  <lang xml:lang="en">
    <phoneme alphabet="ipa" ph="/h&#x259;&#x2C8;l&#x259;&#x28A;/">nope</phoneme>
  </lang>
</speak>
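
For what it's worth, the `&#x…;` sequences in the `ph` attribute are ordinary XML numeric character references, so any conforming parser should read back exactly the IPA that was passed in, slashes included. A quick illustrative check in Python (the helper name `phoneme_attr` is made up for this sketch):

```python
import xml.etree.ElementTree as ET

SSML = """<?xml version="1.0"?>
<speak>
  <lang xml:lang="en">
    <phoneme alphabet="ipa" ph="/h&#x259;&#x2C8;l&#x259;&#x28A;/">nope</phoneme>
  </lang>
</speak>"""


def phoneme_attr(ssml: str) -> str:
    """Parse the SSML and return the decoded value of the ph attribute."""
    root = ET.fromstring(ssml)
    return root.find("lang/phoneme").get("ph")


decoded = phoneme_attr(SSML)
print(decoded)
```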

Okay, a bit more digging, this time with a "Google approved" IPA/SSML example:

{{#phonos: text=nope | ipa=ˌmænɪˈtoʊbə | type=ipa | lang=en}}

as expected, renders

ˌmænɪˈtoʊbə

and generates the IPA audio


The generated SSML was:

<?xml version="1.0"?>
<speak>
  <lang xml:lang="en">
    <phoneme alphabet="ipa" ph="&#x2CC;m&#xE6;n&#x26A;&#x2C8;to&#x28A;b&#x259;">nope</phoneme>
  </lang>
</speak>

changing the /həˈləʊ/ IPA to həˈləʊ solved the issue — Google does not support / stress indicators but never explicitly mentions this (and even references their use elsewhere!)

{{#phonos: text=nope | ipa=həˈləʊ | type=ipa | lang=en}}

Change 815812 merged by jenkins-bot:

[mediawiki/extensions/Phonos@master] Add GoogleEngine and test for SSML

https://gerrit.wikimedia.org/r/815812

QA notes: you will need an API key for this. I will message you with one you can use, but it may be better for you to create your own. Also note Sammy's findings above, as well as T313497.

Change 816215 had a related patch set uploaded (by MusikAnimal; author: MusikAnimal):

[mediawiki/extensions/Phonos@master] GoogleEngine: trim unnecessary XML and remove outer slashes from IPA

https://gerrit.wikimedia.org/r/816215

changing the /həˈləʊ/ IPA to həˈləʊ solved the issue — Google does not support / stress indicators but never explicitly mentions this (and even references their use elsewhere!)

Interesting! I've uploaded https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Phonos/+/816215 which trims the slashes from the IPA (as well as removing unnecessary bits from the XML, since Google charges by character count). Is that sufficient, or are these slashes also used elsewhere in the IPA string?
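
A rough sketch of the kind of trimming meant here, in Python for illustration only; the patch itself may do it differently, and `trim_outer_slashes` is a hypothetical name. Only leading and trailing slashes are removed, so a slash in the middle of the string would be left alone.

```python
def trim_outer_slashes(ipa: str) -> str:
    """Drop surrounding whitespace, then any leading/trailing '/' characters."""
    return ipa.strip().strip("/")


print(trim_outer_slashes("/həˈləʊ/"))
```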

Change 816215 merged by jenkins-bot:

[mediawiki/extensions/Phonos@master] GoogleEngine: trim unnecessary XML and remove outer slashes from IPA

https://gerrit.wikimedia.org/r/816215

@MusikAnimal I don't think Google understands all the IPA we are giving it.

I ran all the words in the corpus through the API, but passed "foobar" as the text input for every one of them, so I could tell whether it was pronouncing the IPA or the text.

Below are all the words for which the audio incorrectly spoke "foobar", suggesting it does not understand the IPA:

  • Xochimilco (sotʃiˈmilko)
  • Tenochtitlan (tenoːt͡ʃˈtit͡ɬan)
  • Hyderabad (ˈɦaɪ̯daraːbaːd)
  • Hasan Minhaj (ˈhʌsən ˈmɪnhɑː(d)ʒ)
  • Parangaricutirimícuaro (paɾanɡaɾikutiɾiˈmikwaɾo)
  • México (mexiko)
  • anticonstitutionnellement (ɑ̃.ti.kɔ̃s.ti.ty.sjɔ.nɛl.mɑ̃)
  • Smørrebrød (ˈsmɶɐ̯ˌpʁœðˀ)
  • subtle (ˈsʌt(ə)l)
  • awful (ˈɔːfɫ̩)
  • fly (flaɪ̯)
  • catnip (ˈkætⁿnɪp)
  • apt (ˈæp̚t)
  • spotless (ˈspɒtˡlɨs)
  • peculiar (pʰə̥ˈkj̊uːliɚ)
  • key (k̟ʰi)
  • vin blanc (vɛ̃ blɑ̃)
  • wean (ˈwɪən)
  • llandudno (ɬanˈdɨdnoː)
  • Montréal (mɔ̃.ʁe.al)
  • ευχαριστώ (ef.xa.ɾiˈsto)
  • chocolate (ˈt͡ʃɔk(ə)lɪt)
  • спасибо (spɐˈsʲibə)
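
The test setup described above can be sketched roughly as follows, in Python for illustration only. Every request uses the same placeholder text, so hearing "foobar" in the result means the IPA was ignored. The request field names follow the Cloud Text-to-Speech REST format as I understand it; the function name and the language codes in the sample list are assumptions made up for this sketch, not the values actually used.

```python
import json
from xml.sax.saxutils import quoteattr

PLACEHOLDER_TEXT = "foobar"  # spoken only if the engine falls back to the text


def build_request_body(ipa: str, lang: str) -> str:
    """Build a JSON synthesize request whose text is always the placeholder."""
    ssml = (
        "<speak>"
        f"<lang xml:lang={quoteattr(lang)}>"
        f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{PLACEHOLDER_TEXT}</phoneme>'
        "</lang>"
        "</speak>"
    )
    body = {
        "input": {"ssml": ssml},
        "voice": {"languageCode": lang},
        "audioConfig": {"audioEncoding": "MP3"},
    }
    return json.dumps(body, ensure_ascii=False)


# Two entries from the list above; the language codes are assumed.
sample = [("subtle", "ˈsʌt(ə)l", "en-US"), ("fly", "flaɪ̯", "en-US")]
for word, ipa, lang in sample:
    print(word, build_request_body(ipa, lang))
```

Deciding whether the audio says "foobar" still has to be done by ear; the sketch only covers building the requests.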

Here is the full output in HTML: