Create Larynx phonos engine
Closed, Resolved · Public · 5 Estimated Story Points

Assigned To

Authored By

	TheresNoTime
	Jun 23 2022, 1:25 PM

Description

Extend the EngineInterface (similar to the espeak demo) to send API requests to Larynx

Endpoint: https://larynx-tts.wmcloud.org/api
Method: POST or GET
Docs: https://larynx.theresnotime.io/openapi (swagger)

Accepted SSML format

<?xml version="1.0"?>
<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.1" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis11/synthesis.xsd" xml:lang="en-US">
  <lexicon xml:id="ipaInput" alphabet="ipa">
    <lexeme>
      <grapheme>{word}</grapheme>
      <phoneme>{ipa}</phoneme>
    </lexeme>
  </lexicon>
  <lookup ref="ipaInput">
    <w>{word}</w>
  </lookup>
</speak>
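
A minimal sketch of filling in the template above, assuming the `{word}` and `{ipa}` placeholders are plain text that needs XML escaping before substitution. The function name `build_ssml` is hypothetical and is not taken from LarynxEngine.php; the `xsi` schema attributes are omitted for brevity.

```python
from xml.sax.saxutils import escape

# Shortened form of the accepted SSML template (schema attributes omitted).
SSML_TEMPLATE = """<?xml version="1.0"?>
<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.1" xml:lang="{lang}">
  <lexicon xml:id="ipaInput" alphabet="ipa">
    <lexeme>
      <grapheme>{word}</grapheme>
      <phoneme>{ipa}</phoneme>
    </lexeme>
  </lexicon>
  <lookup ref="ipaInput">
    <w>{word}</w>
  </lookup>
</speak>"""


def build_ssml(word: str, ipa: str, lang: str = "en-US") -> str:
    """Hypothetical helper: substitute escaped values into the template."""
    return SSML_TEMPLATE.format(
        # The language lands in an attribute, so double quotes are escaped too.
        lang=escape(lang, {'"': "&quot;"}),
        word=escape(word),
        ipa=escape(ipa),
    )


if __name__ == "__main__":
    print(build_ssml("hello", "həˈləʊ"))
```

Escaping matters because a word such as "R&B" would otherwise produce malformed XML.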

Details

	Subject	Repo	Branch	Lines +/-
	Add LarynxEngine and basic tests	mediawiki/extensions/Phonos	master	+158 -2


Related Objects

		Status	Subtype	Assigned	Task
		Resolved		None	T311232 [Tracking] Create phonos engine interfaces
		Resolved		MusikAnimal	T311234 Create Larynx phonos engine

Event Timeline

TheresNoTime created this task. Jun 23 2022, 1:25 PM

Restricted Application added a subscriber: Aklapper. Jun 23 2022, 1:25 PM

TheresNoTime added a parent task: T311232: [Tracking] Create phonos engine interfaces. Jun 23 2022, 1:26 PM

TheresNoTime mentioned this in T311232: [Tracking] Create phonos engine interfaces.

TheresNoTime updated the task description. Jun 23 2022, 1:31 PM

TheresNoTime added a project: Community-Tech (CommTech-Sprint-27). Jun 23 2022, 1:43 PM

MusikAnimal claimed this task. Jun 27 2022, 6:55 PM

MusikAnimal moved this task from Ready 🎬 to In Development 💻 on the Community-Tech (CommTech-Sprint-27) board.

MusikAnimal mentioned this in T309707: Implement functionality to handle different Text-To-Speech engines. Jun 27 2022, 7:30 PM

TheresNoTime merged a task: T309707: Implement functionality to handle different Text-To-Speech engines. Jun 27 2022, 7:51 PM

TheresNoTime added a subscriber: MusikAnimal.

Change 809211 had a related patch set uploaded (by MusikAnimal; author: MusikAnimal):

[mediawiki/extensions/Phonos@master] Add LarynxEngine and basic tests

https://gerrit.wikimedia.org/r/809211

gerritbot added a project: Patch-For-Review. Jun 28 2022, 3:54 PM

MusikAnimal moved this task from In Development 💻 to Review/Feedback 💬 on the Community-Tech (CommTech-Sprint-27) board. Jun 28 2022, 5:50 PM

TheresNoTime mentioned this in T309312: [Tracking] Phonos extension MVP. Jun 29 2022, 12:30 PM

• JMcLeod_WMF edited projects, added Community-Tech (CommTech-Sprint-28); removed Community-Tech (CommTech-Sprint-27). Jul 6 2022, 1:21 PM

• JMcLeod_WMF moved this task from Ready 🎬 to Review/Feedback 💬 on the Community-Tech (CommTech-Sprint-28) board. Jul 6 2022, 1:25 PM

Input

{{#phonos: text=hello | ipa=/həˈləʊ/ | type=ipa | lang=en}}

API call

/api.php?action=phonos&format=json&ipa=h%C9%99%CB%88l%C9%99%CA%8A%22&text=&lang=en-gb
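
For readability, the percent-encoded query above can be decoded with a short sketch like the one below (standard library only; `decode_params` is a hypothetical helper name). Decoding shows that the `ipa` value ends in `%22`, a double quote, that `text` is empty, and that `lang` is `en-gb` rather than the `en` given in the parser function. Whether any of that is intentional is not stated in this task.

```python
from urllib.parse import parse_qs, urlsplit

# The API call exactly as quoted above.
API_CALL = (
    "/api.php?action=phonos&format=json"
    "&ipa=h%C9%99%CB%88l%C9%99%CA%8A%22&text=&lang=en-gb"
)


def decode_params(url: str) -> dict:
    """Return the query parameters of `url` as a flat, percent-decoded dict."""
    query = urlsplit(url).query
    # keep_blank_values=True so that the empty text= parameter is not dropped.
    parsed = parse_qs(query, keep_blank_values=True)
    return {name: values[0] for name, values in parsed.items()}


params = decode_params(API_CALL)

if __name__ == "__main__":
    for name, value in params.items():
        print(f"{name} = {value!r}")
```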

Generated SSML

<?xml version="1.0"?>
<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.1" xml:lang="en">
  <lexicon xml:id="ipaInput" alphabet="ipa">
    <lexeme>
      <grapheme>hello</grapheme>
      <phoneme>/h&#x259;&#x2C8;l&#x259;&#x28A;/</phoneme>
    </lexeme>
  </lexicon>

  <lookup ref="ipaInput">
    <w>hello</w>
  </lookup>
</speak>
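
A minimal sketch for checking what the generated SSML actually carries, using Python's standard XML parser rather than anything from Phonos (`lexicon_entries` is a hypothetical helper). It resolves the numeric character references in `<phoneme>` back to IPA characters, and shows that the surrounding slashes from the `ipa=` argument are included in the phoneme text. Whether Larynx tolerates those slashes is not established here.

```python
import xml.etree.ElementTree as ET

NS = {"ssml": "http://www.w3.org/2001/10/synthesis"}

# The generated SSML as quoted above.
GENERATED = """<?xml version="1.0"?>
<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.1" xml:lang="en">
  <lexicon xml:id="ipaInput" alphabet="ipa">
    <lexeme>
      <grapheme>hello</grapheme>
      <phoneme>/h&#x259;&#x2C8;l&#x259;&#x28A;/</phoneme>
    </lexeme>
  </lexicon>

  <lookup ref="ipaInput">
    <w>hello</w>
  </lookup>
</speak>"""


def lexicon_entries(ssml: str) -> list:
    """Return (grapheme, phoneme) pairs for every <lexeme> in the document."""
    root = ET.fromstring(ssml)
    return [
        (
            lexeme.findtext("ssml:grapheme", namespaces=NS),
            lexeme.findtext("ssml:phoneme", namespaces=NS),
        )
        for lexeme in root.iterfind(".//ssml:lexeme", NS)
    ]


if __name__ == "__main__":
    for grapheme, phoneme in lexicon_entries(GENERATED):
        print(grapheme, phoneme)
```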

Audio

Changing the SSML to

<?xml version="1.0"?>
<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.1" xml:lang="en">
  <lexicon xml:id="ipaInput" alphabet="ipa">
    <lexeme>
      <phoneme>/h&#x259;&#x2C8;l&#x259;&#x28A;/</phoneme>
    </lexeme>
  </lexicon>

  <lookup ref="ipaInput">
    <w>hello</w>
  </lookup>
</speak>

(i.e. removing the grapheme from the lexeme) seems to produce a better-sounding result?

TheresNoTime attached a referenced file: F35309780: tts(14).wav. Jul 6 2022, 3:25 PM

Change 809211 merged by jenkins-bot:

[mediawiki/extensions/Phonos@master] Add LarynxEngine and basic tests

https://gerrit.wikimedia.org/r/809211

MusikAnimal mentioned this in rEPHNb695648f3344: Add LarynxEngine and basic tests. Jul 7 2022, 4:31 AM

Maintenance_bot removed a project: Patch-For-Review. Jul 7 2022, 5:30 AM

MusikAnimal moved this task from Review/Feedback 💬 to QA 🐛 on the Community-Tech (CommTech-Sprint-28) board. Jul 7 2022, 8:40 PM

QA notes: Depending on what IPA you give it, there are noticeable issues with audio. See discussion at https://gerrit.wikimedia.org/r/809211 and Sammy's notes at T311234#8055863. I think we're going to figure all of that out in a separate task/patch (maybe T311693), so for now you can test with anything that works for you, and sort of pretend it sounds decent :) Anything you find that is WAY off is probably worth documenting here, however.

I submitted all the words currently in our test corpus to the Phonos API. Below are the results in HTML including the word, IPA, SSML and audio file. Hopefully you can open it in your browser.

Further to T311234#8055863 I repeated it with lines 84 and 85 of LarynxEngine.php commented out so that the <grapheme> tag is not added. It does improve audio in most cases, in my opinion:

Better without grapheme:

Xochimilco (still not 100%)
Tenochtitlan (still not 100%)
Albuquerque
Mexico (but has an "English" pronunciation)
anticonstitutionnellement (but has an "English" pronunciation)
subtle
jewellery
hamburger
awful
fly (still not 100%)
catnip (still not 100%)
vin blanc (but has an "English" pronunciation)
wean
llandudno (still not 100%)
louisville (still not 100%)
chocolate
sushi
supercalifragilisticexpialidocious (sounds better to my ears)
cornwall
thames
gunwalloe (I think, I don't know how to pronounce this myself)
somerset

(Some of these would probably be improved even more if we chose the correct voice language.)

Still wrong without grapheme:

peculiar
apt
namaste (no sound)
nanri (no sound)
montreal
ευχαριστώ (no sound)
спасибо (no sound, but this sounds good with grapheme)
porthleven

all.html (3 MB)

In T311234#8064411, @dom_walden wrote:

Further to T311234#8055863 I repeated it with lines 84 and 85 of LarynxEngine.php commented out so that the <grapheme> tag is not added. It does improve audio in most cases, in my opinion:

Hmm, thinking about this again, I suspect that removing the <grapheme> tag means Larynx ignores the <phoneme> tag (with the IPA), so it just tries to pronounce the word inside the <w> tags rather than the IPA. It does a better job pronouncing the word from its normal spelling than it does interpreting the IPA.

To test this hypothesis, I commented out lines 80 to 89 to remove the <lexeme> tag completely, so we just have the <w> tag and the word in its normal spelling. The audio was very similar[1] to the audio produced without the <grapheme> tag. I uploaded it below.

Tagging @TheresNoTime @MusikAnimal.
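
For anyone wanting to repeat the comparison without editing LarynxEngine.php, here is a minimal sketch that generates the three SSML variants discussed (full lexeme, no `<grapheme>`, no `<lexeme>`). The function `build_variant` and its flags are hypothetical. Keeping an empty `<lexicon>` in the no-lexeme case is an assumption, since the comment does not say whether the wrapper remained.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize SSML elements without a prefix (xmlns="..." on the root).
ET.register_namespace("", SSML_NS)


def _tag(name: str) -> str:
    return f"{{{SSML_NS}}}{name}"


def build_variant(word, ipa, lang="en", grapheme=True, lexeme=True):
    """Hypothetical helper: build one SSML variant for an A/B listening test."""
    speak = ET.Element(_tag("speak"), {"version": "1.1", f"{{{XML_NS}}}lang": lang})
    lexicon = ET.SubElement(
        speak, _tag("lexicon"), {f"{{{XML_NS}}}id": "ipaInput", "alphabet": "ipa"}
    )
    if lexeme:
        entry = ET.SubElement(lexicon, _tag("lexeme"))
        if grapheme:
            ET.SubElement(entry, _tag("grapheme")).text = word
        ET.SubElement(entry, _tag("phoneme")).text = ipa
    lookup = ET.SubElement(speak, _tag("lookup"), {"ref": "ipaInput"})
    ET.SubElement(lookup, _tag("w")).text = word
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    for flags in ({}, {"grapheme": False}, {"lexeme": False}):
        print(build_variant("hello", "/həˈləʊ/", **flags))
```

If the no-grapheme and no-lexeme variants sound the same for a given word, that would be consistent with the hypothesis that the `<phoneme>` is being ignored.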

without_lexeme.html (1 MB)

Notes

[1] When testing Larynx locally, I found the audio somewhat unreliable: repeating the same pronunciation request several times could produce a slightly different pronunciation each time.
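
One way to quantify that variation is to hash the audio returned for repeated identical requests. This is a minimal sketch: `synthesize` is a hypothetical callable standing in for whatever sends the SSML to a local Larynx instance and returns the audio bytes. The real request is deliberately left out, since its parameters are not described in this task.

```python
import hashlib
from collections import Counter
from typing import Callable


def count_distinct_outputs(
    synthesize: Callable[[str], bytes], ssml: str, attempts: int = 5
) -> Counter:
    """Call `synthesize` repeatedly and count how often each output occurs."""
    digests = Counter()
    for _ in range(attempts):
        audio = synthesize(ssml)
        digests[hashlib.sha256(audio).hexdigest()] += 1
    return digests


if __name__ == "__main__":
    # Stand-in synthesizer for demonstration only; replace with a real request.
    def fake_synthesize(ssml: str) -> bytes:
        return ssml.encode("utf-8")

    result = count_distinct_outputs(fake_synthesize, "<speak>hello</speak>")
    print(f"{len(result)} distinct output(s) in {sum(result.values())} attempts")
```

Note that differing bytes do not necessarily mean an audible difference, so this only flags candidates worth listening to.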

dom_walden moved this task from Product sign-off 🤘 to QA 🐛 on the Community-Tech (CommTech-Sprint-28) board. Jul 8 2022, 8:18 AM

• JMcLeod_WMF edited projects, added Community-Tech (CommTech-Sprint-29); removed Community-Tech (CommTech-Sprint-28). Jul 18 2022, 5:34 PM

• JMcLeod_WMF moved this task from Ready 🎬 to QA 🐛 on the Community-Tech (CommTech-Sprint-29) board. Jul 18 2022, 5:36 PM

I guess we can have the debate about how to make the Larynx engine better at some point in the future.

For now, I have nothing more to add here.

@NRodriguez Actually I can't create a Patch Demo since the Phonos extension is not available on it yet. I can make a pull request to get that added, but considering we aren't planning to use the Larynx engine in production, in addition to its known problems (see comments above), perhaps it's not worth the hassle. If you still want to test it let me know, but otherwise this task can probably be closed.

MusikAnimal set the point value for this task to 5. Jul 27 2022, 2:34 PM

I'm going to call this resolved, if only because the scope of the task (create this engine) has been completed and tested, and we are otherwise not focusing on Larynx.

• JMcLeod_WMF moved this task from Product sign-off 🤘 to Done 🏁 on the Community-Tech (CommTech-Sprint-29) board. Aug 2 2022, 2:55 PM

	F35311604: without_lexeme.html
	Jul 8 2022, 7:32 AM

	F35311573: all.html
	Jul 8 2022, 7:13 AM

	F35309780: tts(14).wav
	Jul 6 2022, 3:08 PM

	F35309773: tts(13).wav
	Jul 6 2022, 3:04 PM
