
Audio pronunciation: Automatic text-to-speech to convert IPA to sound
Open, In Progress, Low, Public

Description

Suggestion: An audio pronunciation feature for entries, similar to that found on <dictionary.reference.com>, but using the existing IPA in each page (in Wiktionary, and some in Wikipedia and elsewhere) to auto-generate a sound file, instead of waiting for humans to manually record a file for every pronunciation of every word.

See also:

Details

Reference
bz31221

Event Timeline


wmf.amgine3691 wrote:

This could be a reasonably classic tag extension, <text2speech type="IPA" icon="speaker">/a.zyʁ/</text2speech>

It would also be a cool feature on the Wiktionary Mobile App, which uses the speaker icon as a button to play pronunciation files for articles which have them.
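A rough sketch of the first step such a tag would need, namely pulling the IPA string and attributes out of the markup. It is written in Python for brevity; a real MediaWiki tag extension would be written in PHP. The tag name and attributes come from the comment above, while `extract_ipa` and the stripping of the surrounding slashes are assumptions made for illustration.

```python
import re

# Hypothetical tag syntax, copied from the suggestion in this thread.
TAG_RE = re.compile(
    r"<text2speech\b(?P<attrs>[^>]*)>(?P<body>.*?)</text2speech>",
    re.DOTALL,
)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def extract_ipa(wikitext):
    """Return a list of (attributes, ipa) pairs, one per text2speech tag."""
    results = []
    for match in TAG_RE.finditer(wikitext):
        attrs = dict(ATTR_RE.findall(match.group("attrs")))
        # Drop whitespace and the conventional /.../ delimiters.
        ipa = match.group("body").strip().strip("/")
        results.append((attrs, ipa))
    return results


sample = '<text2speech type="IPA" icon="speaker">/a.zyʁ/</text2speech>'
print(extract_ipa(sample))
```

The extracted string would then be handed to whatever IPA-to-sound engine is chosen; that part is not covered here.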

(In reply to comment #4)

The current system
doesn't work because uploading individual files of individual words is a pain
in the ass, I think.

I haven't tried (have you?) but it shouldn't be. Nowadays plenty of people have a decent quality mic connected to the Internet.

the voice technology that came with computers fifteen years ago is "good
enough" or can be wrestled to be.

In English and a few other languages, sure, but we have close to 200 Wiktionaries and most of those languages probably won't benefit from that tech research anytime soon.

For a reference, see

http://www.loquendo.com/en/products/text-to-speech/languages-voices/
http://www2.research.att.com/~ttsweb/tts/demo.php

Another step will be to wait for an open source alternative to these proprietary and nowadays lucrative systems...

But ideally you'd have the ability to turn proper IPA into sound. A smarter
solution is needed. Consider this a brainstorming bug. :-)

Just in case it's useful:

http://www.w3.org/TR/speech-synthesis/#edef_phoneme
http://en.wikipedia.org/wiki/Speech_synthesis#Text-to-phoneme_challenges'
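For context, the first link describes the SSML `<phoneme>` element, which carries a phonetic string in its `ph` attribute and names the alphabet in `alphabet`. A minimal sketch of producing such an element from an IPA string follows; the helper name `ssml_phoneme` is an assumption, and whether a given speech engine accepts `alphabet="ipa"` would need to be checked per engine.

```python
from xml.sax.saxutils import escape, quoteattr


def ssml_phoneme(ipa, fallback_text):
    """Wrap an IPA string in an SSML <phoneme> element.

    fallback_text is the ordinary spelling, used by engines that
    cannot interpret the phonetic string.
    """
    return '<phoneme alphabet="ipa" ph={}>{}</phoneme>'.format(
        quoteattr(ipa), escape(fallback_text)
    )


print(ssml_phoneme("mʊmˈbaɪ", "Mumbai"))
```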

So if I'm reading this right, this requires an 'IPA -> Sound' engine of some sort?

wmf.amgine3691 wrote:

(In reply to comment #7)

So if I'm reading this right, this requires an 'IPA -> Sound' engine of some
sort?

There is also a bug requesting the reverse, Sound -> IPA, but for this specific enhancement, yes.

(In reply to comment #8)

There is also a bug requesting the reverse, Sound -> IPA

Where? I couldn't find it.

rahul14m93 wrote:

Many words in Wiktionary have a pronunciation attached, but there are many words that don't have this feature.
Example : http://en.wiktionary.org/wiki/compendium

I also did a quick random survey and found that words pertaining to a specific field, like mathematics or chemistry, don't have a pronunciation attached.

The solution I propose is to provide a button to record the sound. Clicking that button starts a 5-second recording, during which the speaker should be loud and clear and adhere to the phonetics. A rating feature would also go well with this. Briefly: people can record the pronunciation, and volunteers can rate the recordings out of 5 (similar to the IMDb ratings :)).

We could have the recordings saved in the ogg or wav format.

wmf.amgine3691 wrote:

(In reply to comment #9)

(In reply to comment #8)

There is also a bug requesting the reverse, Sound -> IPA

Where? I couldn't find it.

Neither can I. It was in a discussion about a mobile tool for recording spoken word and uploading to commons, both for the Wiktionary project and wikisource (for oral history recordings needing transcriptions.)

(In reply to comment #6)

(In reply to comment #4)

The current system
doesn't work because uploading individual files of individual words is a pain
in the ass, I think.

I haven't tried (have you?) but it shouldn't be.

That's what's requested here, in fact. :)

Nowadays plenty of people have a decent quality mic connected to the Internet.

Which is why exploiting this resource is a good project.

(In reply to comment #7)

So if I'm reading this right, this requires an 'IPA -> Sound' engine of some
sort?

Not what comment 0 asked, but some proposed it; clarifying summary.

(In reply to comment #11)

Neither can I. It was in a discussion about a mobile tool for recording
spoken
word and uploading to commons, both for the Wiktionary project and wikisource

Indeed, see URL where there's clear interest from the communities.
It's still not clear, from a Wikimedia projects point of view, if the aim is best served by an extension or other system, but the request is legit.

Adding Lars, who proposed the voice recording tool idea at:

http://thread.gmane.org/gmane.org.wikimedia.wiktionary/1265

And some context: Rahul - see comment 10 - is interested in this project for Google Summer of Code. Having a community need declared increases points for him. If someone would volunteer as mentor then his chances would increase even more (hint, hint).

rahul14m93 wrote:

(In reply to comment #13)

I would like someone to reply to comment #10

Quim Gil - Thanks. I had to think a lot, as I told you! I'd be glad to work on this project.

(In reply to comment #13)

Adding Lars, who proposed the voice recording tool idea at:

http://thread.gmane.org/gmane.org.wikimedia.wiktionary/1265

And some context: Rahul - see comment 10 - is interested in this project for
Google Summer of Code. Having a community need declared increases points for
him. If someone would volunteer as mentor then his chances would increase
even
more (hint, hint).

As an aside, I think that should be discussed in a separate bug. Getting humans to record sound and getting automatic TTS of IPA (what comment 0 is asking for) are rather different things.

(In reply to comment #15)

As an aside I think that should be discussed in a separate bug. Getting
humans
to record sound and getting auto tts of ipa (what comment 0 is asking for) is
rather different.

Agreed, the original report requested text-to-speech. Easier pronunciation recording is a whole different thing, so I've filed bug 46610 and retargeted this to the original request.

As a side note, I agree computer pronunciation is inferior to a human recording. The question is whether it's enough better than nothing to be worth implementing.

rahul14m93 wrote:

I have prepared a rough project proposal. Please give me your feedback and suggestions so that I can improve it: https://www.mediawiki.org/wiki/User:Rahul21/Gsoc

Rahul, your proposal is related to

Bug 46610 - Pronunciation recording tool

Please announce it there. Thank you!

rahul14m93 wrote:

I am sorry, two tabs were open at the same time, causing some confusion!

One year of silence. Setting to Lowest only to reflect the current reality, which is that nobody we are aware of is working or planning to work on this.

Is this a duplicate of T2224? They seem to want to achieve the same thing, but are expressed in different ways.

A) Awesome! I requested this in 2010! :D
B) There were some concerns raised at the time, about issues with dialect variance. See a brief explanation in https://en.wikipedia.org/wiki/Help_talk:IPA/Archive_2#Embedded_IPA_pronunciation_soundfiles
I imagine the rabbit hole goes deeper though, but I have no expertise in this area.

Sidenote, not already mentioned above, so see also https://en.wikipedia.org/wiki/IPA_pulmonic_consonant_chart_with_audio and https://en.wikipedia.org/wiki/IPA_vowel_chart_with_audio

Converting IPA to speech using lexconvert's correspondences to eSpeak phonemes and eSpeak seems to yield reasonable results (at least for the sampled English Wikipedia entries that have IPA).

To try, paste the IPA into the text box at https://itinerarium.github.io/phoneme-synthesis/.

Perhaps a scripted process could run

python lexconvert.py --try unicode-ipa "/mʊmˈbaɪ/"

and upload/attach the output with some mechanism to flag low-quality output for review?

Alternatively, humans could QC the pronunciation using the demo site or some custom tool to streamline the copying/pasting and only save/upload those entries that sound reasonable?
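The scripted process suggested above might look roughly like the following. This is only a sketch: it issues the same `lexconvert.py --try unicode-ipa` command quoted in this comment for each entry and treats a non-zero exit code as a crude "needs review" signal. The function names, the location of `lexconvert.py`, and the idea of using the exit code as a quality flag are all assumptions; capturing the audio to a file and uploading it are left out because the thread does not specify how that would work.

```python
import subprocess


def lexconvert_command(ipa, script="lexconvert.py"):
    """Build the command line quoted in the thread for one IPA string."""
    return ["python", script, "--try", "unicode-ipa", ipa]


def speak_all(entries, run=subprocess.run):
    """Run lexconvert for each (title, ipa) pair.

    Returns the titles whose run exited with a non-zero status, as a
    stand-in for "flag for human review". The run parameter exists so
    the function can be exercised without lexconvert installed.
    """
    needs_review = []
    for title, ipa in entries:
        result = run(lexconvert_command(ipa))
        if result.returncode != 0:
            needs_review.append(title)
    return needs_review


if __name__ == "__main__":
    # Requires lexconvert.py and eSpeak to be available locally.
    print(speak_all([("Mumbai", "/mʊmˈbaɪ/")]))
```

An exit code only catches outright failures; judging whether the result sounds right would still need the human check described above.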

atzipperer raised the priority of this task from Lowest to High. Edited Oct 19 2017, 6:34 PM
atzipperer subscribed.

Encyclopedias arbitrate disputes. Pronunciation disputes are rampant. Now, our favorite encyclopedia is not capable of resolving these disputes. Please enhance our favorite encyclopedia with this capability.

  1. Many people don't know how to read IPA and also want to know how to pronounce words using IPA transcriptions as a guide.
  2. An individual could increase his/her expertise by using a tool like this to confirm pronunciation.
  3. This could be a tool for teaching how to read IPA.
  4. Cost-benefit analysis: https://itinerarium.github.io/phoneme-synthesis/ exists. So, the cost of the feature is the cost of integrating it into MediaWiki. My personal assessment of the benefit is that it is big.

I can see if I can hack something up if there's consensus that this is worth trying, and has a chance to be added to Wikipedia as a default feature. Is there agreement that automated IPA pronunciation, based on lexconvert + eSpeak as demonstrated above, is beneficial (i.e. good enough that it's better to have it than not to have it)? If so, what's the best way to deliver a proof of concept?

The implementation options are:

  • use the existing JavaScript code (linked in T33221#2915772), which takes minimal effort but requires the client to load 2 MB before playback (the download can be cached client-side).
  • use a bot that generates and uploads individual pronunciation files
  • provide an API that generates and caches pronunciations for IPA strings.
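To make the third option more concrete, here is a minimal sketch of the caching part only. The class name, the in-memory dictionary, and the `synthesize` callable (standing in for whatever engine turns IPA into audio bytes) are assumptions for illustration; a real service would need persistent storage and an HTTP layer on top.

```python
import hashlib
import unicodedata


class PronunciationCache:
    """Cache synthesized audio keyed by a hash of the IPA string."""

    def __init__(self, synthesize):
        # synthesize: callable taking an IPA string, returning audio bytes.
        self._synthesize = synthesize
        self._store = {}

    @staticmethod
    def key(ipa):
        # Normalize so canonically equivalent strings share one entry.
        normalized = unicodedata.normalize("NFC", ipa)
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, ipa):
        """Return audio for ipa, synthesizing only on the first request."""
        k = self.key(ipa)
        if k not in self._store:
            self._store[k] = self._synthesize(ipa)
        return self._store[k]


if __name__ == "__main__":
    calls = []

    def fake_engine(ipa):
        calls.append(ipa)
        return b"audio:" + ipa.encode("utf-8")

    cache = PronunciationCache(fake_engine)
    cache.get("mʊmˈbaɪ")
    cache.get("mʊmˈbaɪ")
    print(len(calls))  # the engine ran once for two requests
```

Since many pages share the same IPA strings, keying on the string rather than on the page means each pronunciation is generated once.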

I've created a quick hack using the client-side JavaScript (first option listed above). gzip compression brings the JavaScript down to under 500 kB. Using WebAssembly could likely improve it further but that would require more effort.

If you want to try it, add the user script to your user script page, visit a page containing IPA and click the small blue "play" arrow that should show up next to it.

This is really cool!

There's some room for improvement with some of the pronunciation — I tried it out on the classic example of "Nevada", where the middle 'a' didn't quite match the expected 'a as in bad', and I tried "Barack Hussein Obama", where it kind of garbles the end of Hussein — but even as is, it would be a pretty big usability improvement for most users who can't make sense of the IPA markings.

This definitely seems like it would be worth polishing up and integrating into the default experience, and iterating on.

This would be a useful tool for sanity-checking listed IPA pronunciations (e.g. if the examples @Ragesoss tried out above came out wrong because of an incorrect transcription into IPA, a machine reading would let that issue be identified and corrected). However, this could also serve to motivate more people to make and upload actual pronunciation recordings, if they are dissatisfied with the machine pronunciations for some reason (again, Ragesoss's examples above are relevant).

Speaking more speculatively, this could also ultimately serve to inform work on a hypothetical revision or successor to IPA, by exposing and highlighting shortcomings of the extant system (of course, this is where I betray my ignorance of IPA in general).

I'm happy to address the technical part of this, but not the organizational part (finding decision makers, getting a yes/no, deciding in which shape this should be implemented).

The village pump discussion mentioned that another text-to-speech project (Wikispeech) exists, and the relevant decisions have already been made long ago. They're aiming for a server-side version (which will be more work but may also result in a better/faster experience for the user) and a much bigger scope (actually read articles vs. just IPA).

I've built a slightly improved version that runs from a bookmarklet. This solves my personal problem. And while I'd love to make it available for everyone and I'm happy to volunteer my time to do technical work to polish it, unfortunately I find the organizational work neither pleasant nor rewarding, and can't bring myself to spend my free time on it (especially since I don't want to create a conflict with the Wikispeech team).

If someone is able to get an "official" decision that client-side IPA TTS is something that should be included in Wikipedia by default, and some way for me to get answers to questions regarding the desired implementation specifics (e.g. whether this should be an extension/standalone code/..., level of sandboxing, level of compatibility with legacy browsers, ...), I'm happy to do the technical part.

This seems like basic and expected functionality.
I wonder how @alexhollender thinks this might impact the reading experience?

Related: @DLynch shared this IPA conversion tool with me: https://itinerarium.github.io/phoneme-synthesis/

Also, it would be really nice if you could play the audio without being redirected to a different page : )

TheresNoTime changed the task status from Open to In Progress. Jun 16 2022, 11:53 PM