Investigate Google SSML/IPA rendering issues
Closed, Resolved · Public · 2 Estimated Story Points

Assigned To

Authored By

	TheresNoTime
	Jul 25 2022, 9:59 AM

Description

As raised by Dom in T311233: Create Google phonos engine, Google is not correctly rendering audio for some SSML/IPA — we should investigate to see if there is a common denominator between these issues (e.g. encoding of IPA unicode characters, differing language phoneme support)

Details

Subject	Repo	Branch	Lines +/-
GoogleEngine: use php-ipa-validator to normalize IPA input	mediawiki/extensions/Phonos	master	+7 -8
Add theresnotime/ipa-validator 1.1.1	mediawiki/vendor	master	+1 K -3
GoogleEngine: Normalize apostrophes and remove parentheses	mediawiki/extensions/Phonos	master	+9 -3

Related Objects

Status	Subtype	Assigned	Task
Resolved		MusikAnimal	T313711 Investigate Google SSML/IPA rendering issues
Resolved	Spike	MusikAnimal	T313497 [8 hours] Handle Google not supporting `/ ... /` abstract phonemic notation
Declined		None	T313494 Warn the user if they use unsupported IPA
Resolved	Spike	TheresNoTime	T314375 [4 hours] Investigate IPA validator methods

Event Timeline

TheresNoTime created this task. Jul 25 2022, 9:59 AM

Restricted Application added a subscriber: Aklapper. Jul 25 2022, 9:59 AM

TheresNoTime added a subtask: T313497: [8 hours] Handle Google not supporting `/ ... /` abstract phonemic notation. Jul 25 2022, 9:59 AM

TheresNoTime added a subtask: T313494: Warn the user if they use unsupported IPA.

TheresNoTime mentioned this in T309312: [Tracking] Phonos extension MVP. Jul 25 2022, 10:02 AM

TheresNoTime mentioned this in T311233: Create Google phonos engine. Jul 25 2022, 10:04 AM

Re T311233#8100597: (cc @dom_walden) It seems Google wants the language code to match up with the language of the actual word. I didn't test everything, but take Xochimilco for an example. This is a Spanish name, and even in the wikitext editors are using {{IPA-es}} to reinforce that it should be Spanish. If I pass the Spanish language code to Google, sotʃiˈmilko is pronounced correctly. Thus, perhaps we should be letting the editors (in this case the {{IPA-es}} template) dictate which language is used. Phonos currently lets you pass in the language code to override the default, which is the content language.

I couldn't get some of the examples from our corpus to work, but I think we're on to something in thinking the language code takes precedence in Google's algorithm.

To add to this, one thing we at least have going for us is that there's a reasonable expectation that editors (or the templates they use) will pass in the correct text to go with the IPA, so Google should usually still get it right. But it does seem odd that the IPA is completely ignored, and of course there will be cases where the plain text isn't pronounced correctly either.

I might be thinking about this wrongly, but are we looking to pronounce IPA, or just pronounce the words? I was reading this old comment on Reddit and wondered…

It seems that Google gives pretty good results if you just give it text and a language code. For example (if we were to make an option for only supplying the text param) Google gives the following, where the first two are pretty much the same:

{{#phonos: text=Xochimilco |lang=es}}

{{#phonos: ipa=sotʃiˈmilko |lang=es}}

{{#phonos: text=Xochimilco |lang=en}}

I think that requiring editors to supply a lang code to Phonos is pretty reasonable. It'd be similar to the usage of the Use English templates, wouldn't it? i.e. on an Australian article, en-au could be set and so the pronunciation would be closer to regionally correct.

For example, this works {{#phonos: text=car park |lang=en-au}} where this {{#phonos: ipa=ˈkɑːpɑːk |lang=en-au }} fails to produce any output.
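To make the text-only vs. IPA comparison concrete, here is a rough Python sketch of the two request bodies. It assumes the engine sends SSML to the Google Cloud Text-to-Speech `text:synthesize` endpoint and wraps IPA in a `<phoneme alphabet="ipa">` element; the helper names `build_ssml` and `build_request` are made up for illustration and are not Phonos code.

```python
import json
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, ipa: Optional[str] = None) -> str:
    """Wrap plain text in SSML; attach IPA through <phoneme> when given."""
    if ipa is None:
        return f"<speak>{escape(text)}</speak>"
    # quoteattr() returns the value already wrapped in quotes.
    return (
        f"<speak><phoneme alphabet=\"ipa\" ph={quoteattr(ipa)}>"
        f"{escape(text)}</phoneme></speak>"
    )


def build_request(text: str, lang: str, ipa: Optional[str] = None) -> dict:
    """Hypothetical request body; the language code is passed through as-is."""
    return {
        "input": {"ssml": build_ssml(text, ipa)},
        "voice": {"languageCode": lang},
        "audioConfig": {"audioEncoding": "MP3"},
    }


if __name__ == "__main__":
    # Same word, once as text only and once with IPA attached.
    for ipa in (None, "sotʃiˈmilko"):
        body = build_request("Xochimilco", "es", ipa=ipa)
        print(json.dumps(body, ensure_ascii=False, indent=2))
```

In both cases the `languageCode` travels with the request, which fits the observation that the language seems to matter at least as much as the IPA itself.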

One advantage of only supplying text is that the Google quota usage will be much lower.

In T313711#8103757, @Samwilson wrote:

One advantage of only supplying text is that the Google quota usage will be much lower.

And as an added bonus, if you don't know how to write IPA you could still add the pronunciation to the article

In T313711#8102565, @MusikAnimal wrote:

Re T311233#8100597: (cc @dom_walden) It seems Google wants the language code to match up with the language of the actual word. I didn't test everything, but take Xochimilco for an example. ...

I repeated my tests but this time I passed the language. I took the language from the data.json file, only looking at the first two characters.
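As a side note on the two-character truncation, a quick Python sketch of what it does to the kinds of language codes mentioned in this thread. The function name `short_lang` is made up, and whether truncation had anything to do with the errors reported here is not established; this only shows what the strings look like afterwards.

```python
def short_lang(code: str) -> str:
    """Keep only the first two characters of a language code."""
    return code[:2]


if __name__ == "__main__":
    for code in ("es", "en-au", "cy", "ibb", "sco"):
        print(code, "->", short_lang(code))
```

Two-letter codes pass through unchanged, region subtags such as `-au` are dropped, and three-letter codes are cut down to two letters that are not necessarily a valid code on their own.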

There are still some words it didn't pronounce correctly:

Tenochtitlan (tenoːt͡ʃˈtit͡ɬan)
Hyderabad (ˈɦaɪ̯daraːbaːd) (this is in English, perhaps it needs to be Hindi?)
Hasan Minhaj (ˈhʌsən ˈmɪnhɑː(d)ʒ)
Smørrebrød (ˈsmɶɐ̯ˌpʁœðˀ)
subtle (ˈsʌt(ə)l)
awful (ˈɔːfɫ̩)
fly (flaɪ̯)
catnip (ˈkætⁿnɪp)
apt (ˈæp̚t)
spotless (ˈspɒtˡlɨs)
peculiar (pʰə̥ˈkj̊uːliɚ)
key (k̟ʰi)
ευχαριστώ (ef.xa.ɾiˈsto)
chocolate (ˈt͡ʃɔk(ə)lɪt)
спасибо (spɐˈsʲibə)

These words returned the error below:

Ibibio (ɪbɪˈbiːəʊ) (perhaps it does not recognise the language code ib or ibb?)
wean (ˈwɪən) (lang code sco)
llandudno (ɬanˈdɨdno) (lang code cy)

{
    "error": {
        "code": "internal_api_error_MediaWiki\\Extension\\Phonos\\Exception\\PhonosException",
        "info": "[cf32e4f9e737a085a599afcb] Exception caught: Unable to retrieve audio using the Google engine: There was a problem during the HTTP request: 400 Bad Request",
        "errorclass": "MediaWiki\\Extension\\Phonos\\Exception\\PhonosException",
        "*": "MediaWiki\\Extension\\Phonos\\Exception\\PhonosException at /var/www/html/w/extensions/Phonos/includes/Engine/GoogleEngine.php(60)\nfrom /var/www/html/w/extensions/Phonos/includes/Engine/GoogleEngine.php(60)\n#0 /var/www/html/w/extensions/Phonos/includes/PhonosApi.php(37): MediaWiki\\Extension\\Phonos\\Engine\\GoogleEngine->getAudioData(string, string, string)\n#1 /var/www/html/w/includes/api/ApiMain.php(1901): MediaWiki\\Extension\\Phonos\\PhonosApi->execute()\n#2 /var/www/html/w/includes/api/ApiMain.php(875): ApiMain->executeAction()\n#3 /var/www/html/w/includes/api/ApiMain.php(846): ApiMain->executeActionWithErrorHandling()\n#4 /var/www/html/w/api.php(90): ApiMain->execute()\n#5 /var/www/html/w/api.php(45): wfApiMain()\n#6 {main}"
    },
    "servedby": "369413a69bdf"
}

Here is the output:

all_phonos_20220726135345.html (2 MB)

With @MPhamWMF's help, we were able to figure out that some of the examples Google is getting wrong are due to incorrect Unicode characters. In particular, in the list at T313711#8104497, the IPA that starts with an apostrophe ' should actually start with U+02C8 (ˈ). Google also seems to get confused by the optional notation using parentheses, which we understand we can safely remove. So in the case of chocolate, for example, the IPA ˈt͡ʃɔk(ə)lɪt should be ˈt͡ʃɔkəlɪt. Using the latter, Google pronounces it correctly. Finally, there's also a colon that has a Unicode doppelgänger. If I understand correctly, ː should be : (the normal colon typed from your keyboard).

There are still some it gets wrong apparently, but I think we can safely say we need to do a find/replace server side for the lookalike apostrophes, colons, and strip out any parentheses. Our understanding is this should be safe to do. In addition, we could keep track of the fixes Phonos makes automatically and list them in the API response. This way, Phonos also serves as sort of an IPA "validator".
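A minimal Python sketch of that find/replace idea, including the list of fixes that could be reported back in the API response. The set of lookalike apostrophes, the `Normalized` class and the `normalize_ipa` name are assumptions for illustration, not the code that ended up in Phonos.

```python
from dataclasses import dataclass, field
from typing import List

PRIMARY_STRESS = "\u02c8"  # ˈ MODIFIER LETTER VERTICAL LINE
# Assumed set of lookalikes: ASCII apostrophe and right single quotation mark.
APOSTROPHES = ("'", "\u2019")


@dataclass
class Normalized:
    ipa: str
    fixes: List[str] = field(default_factory=list)


def normalize_ipa(ipa: str) -> Normalized:
    """Replace lookalike apostrophes and drop parentheses, recording each fix."""
    result = Normalized(ipa)
    for ch in APOSTROPHES:
        if ch in result.ipa:
            result.ipa = result.ipa.replace(ch, PRIMARY_STRESS)
            result.fixes.append(f"replaced U+{ord(ch):04X} with U+02C8")
    if "(" in result.ipa or ")" in result.ipa:
        # Only the brackets are removed; the optional sound itself is kept.
        result.ipa = result.ipa.replace("(", "").replace(")", "")
        result.fixes.append("removed parentheses")
    return result


if __name__ == "__main__":
    out = normalize_ipa("'t͡ʃɔk(ə)lɪt")
    print(out.ipa, out.fixes)
```

Keeping the fixes as a plain list makes it easy to surface them to the caller without changing what gets sent to the engine.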

With respect to the :, I'm not sure which one Google wants. IPA uses the one that is NOT the normal colon typed from the keyboard (ː, U+02D0), unless somebody is taking some shortcuts. Google may be fine with the IPA symbol.

• NRodriguez added a project: Community-Tech (CommTech-Sprint-30). Jul 28 2022, 3:02 PM

TheresNoTime closed subtask T313497: [8 hours] Handle Google not supporting `/ ... /` abstract phonemic notation as Resolved. Aug 1 2022, 9:51 AM

• NRodriguez set the point value for this task to 2. Aug 11 2022, 5:32 PM

Change 822464 had a related patch set uploaded (by MusikAnimal; author: MusikAnimal):

[mediawiki/extensions/Phonos@master] GoogleEngine: Normalize apostrophes and remove parentheses

https://gerrit.wikimedia.org/r/822464

gerritbot added a project: Patch-For-Review. Aug 11 2022, 11:49 PM

The above patch probably doesn't fix everything, but it fixes what we know about so far with apostrophes and parentheses. I did not include a replacement for the colons, since Google doesn't seem to care which one is used.

Change 822464 merged by jenkins-bot:

[mediawiki/extensions/Phonos@master] GoogleEngine: Normalize apostrophes and remove parentheses

https://gerrit.wikimedia.org/r/822464

MusikAnimal mentioned this in rEPHNcb0a0a92d1ed: GoogleEngine: Normalize apostrophes and remove parentheses. Aug 12 2022, 4:17 AM

Maintenance_bot removed a project: Patch-For-Review. Aug 12 2022, 4:30 AM

ReleaseTaggerBot added a project: MW-1.39-notes (1.39.0-wmf.25; 2022-08-15). Aug 12 2022, 5:00 AM

• JMcLeod_WMF edited projects, added Community-Tech (CommTech-Sprint-31); removed Community-Tech (CommTech-Sprint-30). Aug 15 2022, 2:52 PM

• JMcLeod_WMF moved this task from Ready 🎬 to Review/Feedback 💬 on the Community-Tech (CommTech-Sprint-31) board. Aug 15 2022, 2:55 PM

TheresNoTime closed subtask T313494: Warn the user if they use unsupported IPA as Declined. Aug 15 2022, 10:27 PM

TheresNoTime mentioned this in T313494: Warn the user if they use unsupported IPA.

• JMcLeod_WMF moved this task from Backlog to 🌟Top Priority on the MediaWiki-extensions-Phonos board. Aug 16 2022, 2:05 PM

TheresNoTime moved this task from Review/Feedback 💬 to QA 🐛 on the Community-Tech (CommTech-Sprint-31) board. Aug 17 2022, 8:25 AM

I repeated my previous experiment with all the IPA from our corpus, but setting the text parameter to "foo". When Google does not understand an IPA character, it ignores all the IPA and just pronounces the text, in this case "foo". (I used this wikitext P32560).
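For anyone wanting to repeat this, a small Python sketch of how such a probe page could be generated. The shape of the corpus entries (IPA plus an optional language code) and the `probe_wikitext` helper are assumptions; the actual wikitext used is in P32560.

```python
from typing import Iterable, Optional, Tuple


def probe_wikitext(
    entries: Iterable[Tuple[str, Optional[str]]], fallback: str = "foo"
) -> str:
    """Build one {{#phonos:}} call per entry, with a fixed fallback text.

    If the engine ignores the IPA, the audio will say the fallback word,
    which makes failures easy to spot by ear.
    """
    lines = []
    for ipa, lang in entries:
        params = [f"ipa={ipa}", f"text={fallback}"]
        if lang:
            params.append(f"lang={lang}")
        lines.append("{{#phonos: " + " |".join(params) + "}}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(probe_wikitext([("flaɪ̯", "en"), ("sotʃiˈmilko", "es"), ("ef.xa.ɾiˈsto", None)]))
```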

In T313711#8104497, @dom_walden wrote:

There are still some words it didn't pronounce correctly:

Tenochtitlan (tenoːt͡ʃˈtit͡ɬan)

Hyderabad (ˈɦaɪ̯daraːbaːd) (this is in English, perhaps it needs to be Hindi?)

Hasan Minhaj (ˈhʌsən ˈmɪnhɑː(d)ʒ)

Smørrebrød (ˈsmɶɐ̯ˌpʁœðˀ)

subtle (ˈsʌt(ə)l)

awful (ˈɔːfɫ̩)

fly (flaɪ̯)

catnip (ˈkætⁿnɪp)

apt (ˈæp̚t)

spotless (ˈspɒtˡlɨs)

peculiar (pʰə̥ˈkj̊uːliɚ)

key (k̟ʰi)

ευχαριστώ (ef.xa.ɾiˈsto)

chocolate (ˈt͡ʃɔk(ə)lɪt)

спасибо (spɐˈsʲibə)

It still pronounces all the above as "Foo", with the exception of the below which are now pronounced correctly:

Hasan Minhaj (ˈhʌsən ˈmɪnhɑː(d)ʒ)
subtle (ˈsʌt(ə)l)
chocolate (ˈt͡ʃɔk(ə)lɪt)

So there still seem to be IPA characters that Google does not understand, or that we need to convert to equivalent characters it does understand.

Test environment: local docker Phonos 0.1.0 (3ccf24e) 23:53, 17 August 2022.

For tenoːt͡ʃˈtit͡ɬan, I worked backwards removing a character at a time until Google correctly tried to use the IPA — tenoːt͡ʃˈtit "works", so it's something to do with t͡

I guess we could do this per word and find the problematic characters?
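The "remove a character at a time" search can be sketched in Python like this. The `accepts` callback stands in for whatever tells us the engine used the IPA (in practice, listening for "foo"); the `fake_accepts` stub below is invented purely so the example runs, and is not a claim about what Google actually supports.

```python
import re
from typing import Callable, Optional


def longest_accepted_prefix(
    ipa: str, accepts: Callable[[str], bool]
) -> Optional[str]:
    """Drop code points from the end until `accepts` says the IPA was used."""
    for end in range(len(ipa), 0, -1):
        candidate = ipa[:end]
        if accepts(candidate):
            return candidate
    return None


def fake_accepts(ipa: str) -> bool:
    """Invented stand-in: rejects a tie bar (U+0361) not followed by ʃ."""
    return re.search("\u0361(?!\u0283)", ipa) is None


if __name__ == "__main__":
    word = "teno\u02d0t\u0361\u0283\u02c8tit\u0361\u026can"  # tenoːt͡ʃˈtit͡ɬan
    prefix = longest_accepted_prefix(word, fake_accepts)
    if prefix is not None and len(prefix) < len(word):
        bad = word[len(prefix)]
        print(f"accepted prefix: {prefix}, first rejected code point: U+{ord(bad):04X}")
```

Note that this walks code points, so a combining mark and its base letter are tested separately; the first rejected code point is a hint, not proof of which symbol is at fault.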

I'll do a couple below, and if @MPhamWMF has any input that'd be appreciated!

tenoːt͡ʃˈtit͡ɬan works, pronounced almost correctly, as tenoːtʃˈtitɬan
ˈɔːfɫ̩ works, pronounced correctly, as ˈɔːfl
ˈkætⁿnɪp works, pronounced correctly, as ˈkætnɪp
flaɪ̯ works, pronounced correctly, as flaɪ
pʰə̥ˈkj̊uːliɚ works, pronounced correctly, as phəˈkjuːliɚ

I'm starting to see a pattern — the issue words have Combining Diacritical Marks, which when removed are rendered correctly...
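A rough Python sketch of that pattern: drop every combining mark (Unicode category Mn) and rewrite a few modifier letters. The replacement table is an assumption reverse-engineered from the five examples above, and this is not the code of the validator package. It deliberately skips NFD decomposition, since that would also split precomposed IPA letters such as ç.

```python
import unicodedata

# Assumed mappings, taken only from the examples in this comment.
MODIFIER_REPLACEMENTS = {
    "\u02b0": "h",  # ʰ aspiration, written as plain h in the example above
    "\u207f": "",   # ⁿ nasal release, dropped
    "\u026b": "l",  # ɫ velarized l, replaced with plain l
}


def strip_diacritics(ipa: str) -> str:
    """Remove combining marks and simplify a few modifier letters."""
    out = []
    for ch in ipa:
        if unicodedata.category(ch) == "Mn":
            continue  # combining diacritic such as U+0361 or U+032F
        out.append(MODIFIER_REPLACEMENTS.get(ch, ch))
    return "".join(out)


if __name__ == "__main__":
    for word in ("tenoːt͡ʃˈtit͡ɬan", "ˈɔːfɫ̩", "ˈkætⁿnɪp", "flaɪ̯", "pʰə̥ˈkj̊uːliɚ"):
        print(word, "->", strip_diacritics(word))
```

The stress mark ˈ and the length mark ː are modifier letters rather than combining marks, so they survive this filter.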

I've been half-working on a node.js IPA validator/normalizer (and a composer package too!) — running the corpus through this prior to sending the IPA to Google (along with the text "foo" to test, and leaving the lang blank) results in all words pronouncing the IPA (mostly correctly) except for the following normalized IPA:

sotʃiˈmilko (works if you set lang to es)
paɾanɡaɾikutiɾiˈmikwaɾo (works if you set lang to es)
mexiko (works if you set lang to es)
ˈwɪən (works if you set lang to en-gb)

ˈsmɶɐˌpʁœðˀ (did not normalize fully, will fix!)
ef.xa.ɾiˈsto

https://phonos.theresnotime.io/w/index.php?title=T313711 has the demo (they're all cached)

In T313711#8168179, @TheresNoTime wrote:

For tenoːt͡ʃˈtit͡ɬan, I worked backwards removing a character at a time until Google correctly tried to use the IPA — tenoːt͡ʃˈtit "works", so it's something to do with t͡

I guess we could do this per word and find the problematic characters?

I'll do a couple below, and if @MPhamWMF has any input that'd be appreciated!

tenoːt͡ʃˈtit͡ɬan works, pronounced almost correctly, as tenoːtʃˈtitɬan

ˈɔːfɫ̩ works, pronounced correctly, as ˈɔːfl

ˈkætⁿnɪp works, pronounced correctly, as ˈkætnɪp

flaɪ̯ works, pronounced correctly, as flaɪ

pʰə̥ˈkj̊uːliɚ works, pronounced correctly, as phəˈkjuːliɚ

I'm starting to see a pattern — the issue words have Combining Diacritical Marks, which when removed are rendered correctly...

This mostly makes sense to me. Lots of the diacritics are specifying some sort of modification to an existing sound -- e.g. has voicing or not; tongue touches teeth instead of the alveolar ridge behind the top teeth; etc -- and are generally not contrastive. Which is to say that rendering the un-diacriticked symbol should usually not change the meaning of the word in the target language. It may at times not sound fully accurate, but this was always going to be an issue due to what we talked about before about how narrow/broad IPA transcriptions decide to go. I think this is within an acceptable scope of "close enough".
My guess is that because each of these diacritics adds an extra dimension to an existing sound, trying to create a library for each possible combination would create a huge combinatorial space, of which many sounds may not be attested yet in real languages, so there would be no way of actually recording it properly anyway -- so people who create the sound recording libraries probably just skip them altogether unless they are very common/required for a language

@TheresNoTime Very cool you made a composer package! Shall we start using that (or copy the code over) in Phonos? I think ideally end users who are using valid IPA shouldn't have to manipulate it to appease Google. Rather, any normalization should silently happen behind the scenes.

In T313711#8179717, @MusikAnimal wrote:

@TheresNoTime Very cool you made a composer package! Shall we start using that (or copy the code over) in Phonos? I think ideally end users who are using valid IPA shouldn't have to manipulate it to appease Google. Rather, any normalization should silently happen behind the scenes.

Thank you! It'd be very cool to say https://packagist.org/packages/theresnotime/ipa-validator was being used on Wikipedia, but having another external package probably isn't going to do us many favours in the security review?

In T313711#8179732, @TheresNoTime wrote:

Thank you! It'd be very cool to say https://packagist.org/packages/theresnotime/ipa-validator was being used on Wikipedia, but having another external package probably isn't going to do us many favours in the security review?

I was thinking the same thing. For security review alone it probably makes more sense to just migrate the code to Phonos, which I guess technically might require first re-licensing it to GPL-2.0-or-later, or leaving a comment above the copied code in Phonos, ...or do neither! I have my doubts there will be any sort of litigation regarding you copying your own code :)

Discussed in last RTL, unsure if this should be closed as Resolved or if any tickets should be cut for follow-up action items

@TheresNoTime mind letting me know?

In T313711#8182677, @NRodriguez wrote:

Discussed in last RTL, unsure if this should be closed as Resolved or if any tickets should be cut for follow-up action items

@TheresNoTime mind letting me know?

Sorry, didn't get the ping! I've relicensed https://github.com/theresnotime/php-ipa-validator as GPL-2.0-or-later (GPL-3.0 I guess), so it can be migrated into Phonos (or, we can just lock theresnotime/ipa-validator to 1.0.5 which will be fairly safe security review wise?)

TheresNoTime mentioned this in T314375: [4 hours] Investigate IPA validator methods. Aug 29 2022, 4:28 AM

• JMcLeod_WMF edited projects, added Community-Tech (CommTech-Sprint-32); removed Community-Tech (CommTech-Sprint-31). Aug 29 2022, 2:30 PM

• JMcLeod_WMF moved this task from Ready 🎬 to Product sign-off 🤘 on the Community-Tech (CommTech-Sprint-32) board. Aug 29 2022, 2:32 PM

I've relicensed https://github.com/theresnotime/php-ipa-validator as GPL-2.0-or-later (GPL-3.0 I guess), so it can be migrated into Phonos (or, we can just lock theresnotime/ipa-validator to 1.0.5 which will be fairly safe security review wise?)

Great, thanks for the update! Any other outstanding work from this task or is it good to resolve?

cc @MusikAnimal @TheresNoTime

• NRodriguez moved this task from Product sign-off 🤘 to In Development 💻 on the Community-Tech (CommTech-Sprint-32) board. Aug 31 2022, 7:07 PM

Noting we talked to the Security team and they consider the new package to be low-risk, and thus we can use it \o/

I guess I'll code this up today since the task is already assigned to me, but the underlying credit for the work goes to Sammy as she wrote the package! :)

Change 829068 had a related patch set uploaded (by MusikAnimal; author: MusikAnimal):

[mediawiki/extensions/Phonos@master] GoogleEngine: use php-ipa-validator to normalize IPA input

https://gerrit.wikimedia.org/r/829068

gerritbot added a project: Patch-For-Review. Sep 1 2022, 9:19 PM

• JMcLeod_WMF edited projects, added Community-Tech (CommTech-Sprint-33); removed Community-Tech (CommTech-Sprint-32). Sep 12 2022, 3:00 PM

• JMcLeod_WMF moved this task from Ready 🎬 to In Development 💻 on the Community-Tech (CommTech-Sprint-33) board. Sep 12 2022, 3:05 PM

Change 831998 had a related patch set uploaded (by MusikAnimal; author: MusikAnimal):

[mediawiki/vendor@master] Add theresnotime/ipa-validator

https://gerrit.wikimedia.org/r/831998

MusikAnimal moved this task from In Development 💻 to Review/Feedback 💬 on the Community-Tech (CommTech-Sprint-33) board. Sep 14 2022, 4:44 PM

Since https://gerrit.wikimedia.org/r/831998 is stalled until the associated security review is completed (T316913), we are left with the option of either not doing this bit of normalization for the time being, or copying and pasting the code. I'm going to go with the former for now, and just put this ticket in Needs Attention so we don't forget about it. The patch to make use of ipa-validator is already +2'd and should merge on its own once https://gerrit.wikimedia.org/r/831998 is merged (which I assume the Security team will do).

• JMcLeod_WMF edited projects, added Community-Tech (CommTech-Sprint-34); removed Community-Tech (CommTech-Sprint-33). Sep 26 2022, 8:27 PM

• JMcLeod_WMF moved this task from Ready 🎬 to Needs Attention 👀 on the Community-Tech (CommTech-Sprint-34) board. Sep 26 2022, 8:37 PM

• JMcLeod_WMF edited projects, added Community-Tech (CommTech-Sprint-35); removed Community-Tech (CommTech-Sprint-34). Oct 11 2022, 10:27 AM

• JMcLeod_WMF moved this task from Ready 🎬 to Needs Attention 👀 on the Community-Tech (CommTech-Sprint-35) board. Oct 11 2022, 10:32 AM

• JMcLeod_WMF edited projects, added Community-Tech (CommTech-Sprint-36); removed Community-Tech (CommTech-Sprint-35). Nov 7 2022, 10:12 PM

• JMcLeod_WMF moved this task from Ready 🎬 to Needs Attention 👀 on the Community-Tech (CommTech-Sprint-36) board. Nov 7 2022, 10:18 PM

TheresNoTime closed this task as Resolved. Nov 8 2022, 3:20 PM

• JMcLeod_WMF moved this task from Needs Attention 👀 to Done 🏁 on the Community-Tech (CommTech-Sprint-36) board. Nov 23 2022, 10:11 PM

Nardog subscribed. Nov 25 2022, 8:11 AM

In T313711#8104497, @dom_walden wrote:

catnip (ˈkætⁿnɪp)

apt (ˈæp̚t)

peculiar (pʰə̥ˈkj̊uːliɚ)

key (k̟ʰi)

These shouldn't be converted willy-nilly unless the engine is capable of reproducing the exact phonetic (= physical) realizations these symbols represent. [p̚], for example, means there's no popping noise arising from exhalation and pressure change when the lips are opened. Unless the engine can reliably output audio with and without such subtleties at command, these phonetic transcriptions should never be clickable.

/ˈmɪnhɑː(d)ʒ/ means /ˈmɪnhɑːdʒ/ or /ˈmɪnhɑːʒ/. If it doesn't return two distinct iterations then it should be considered a fail.

Change 831998 abandoned by MusikAnimal:

[mediawiki/vendor@master] Add theresnotime/ipa-validator 1.1.1

Reason:

https://gerrit.wikimedia.org/r/831998

Restricted Application edited projects, added Community-Tech; removed Community-Tech (CommTech-Sprint-36). Mar 10 2023, 8:49 PM

Change 829068 abandoned by Samtar:

[mediawiki/extensions/Phonos@master] GoogleEngine: use php-ipa-validator to normalize IPA input

Reason:

Upstream change abandoned

https://gerrit.wikimedia.org/r/829068

Maintenance_bot removed a project: Patch-For-Review. Mar 17 2023, 11:10 AM

	F35339692: all_phonos_20220726135345.html
	Jul 26 2022, 1:04 PM

	F35338975: en.wav
	Jul 26 2022, 7:38 AM

	F35338983: es-ipa.wav
	Jul 26 2022, 7:38 AM

	F35338971: es.wav
	Jul 26 2022, 7:38 AM
