Audio pronunciation: Automatic text-to-speech to convert IPA to sound
Open, In Progress, Low, Public

Assigned To

None

Authored By

	• bzimport
	Sep 28 2011, 6:58 PM

Description

Suggestion: An audio pronunciation feature for entries, similar to that found on <dictionary.reference.com>.
Unlike that site, it would use the existing IPA on each page (in Wiktionary, and on some pages in Wikipedia and elsewhere) to auto-generate a sound file, instead of waiting for humans to manually record a file for every pronunciation of every word.

See also:

T2224: IPA or SAMPA module - a merged duplicate with some useful discussion
PronunciationRecording - a different project, that aims to make it easier for humans to record and upload individual sound files.

Details

Reference: bz31221

Related Objects

Mentioned In: T330659: Officially support languages the TTS does not support IPA for
T298950: Audio pronunciation of IPA transcriptions
T28207: Split magic linking out of core; create new magic linking extension
T97761: Participation of WMF Reading at Wikimania 2016 and Wikimedia Hackathon 2016
T48610: [DO NOT USE] Pronunciation recording tool (tracking) [superseded by #PronunciationRecording]
Mentioned Here: T229169: Support inline links to audio with ability to play it in situ
T298950: Audio pronunciation of IPA transcriptions
T48610: [DO NOT USE] Pronunciation recording tool (tracking) [superseded by #PronunciationRecording]
T2224: IPA or SAMPA module

Event Timeline

There are a very large number of changes, so older changes are hidden.

wmf.amgine3691 wrote:

This could be a reasonably classic tag extension, <text2speech type="IPA" icon="speaker">/a.zyʁ/</text2speech>

It would also be a cool feature on the Wiktionary Mobile App, which uses the speaker icon as a button to play pronunciation files for articles which have them.
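As a rough illustration of the tag-extension idea above, the following Python sketch pulls the attributes and the IPA body out of wikitext that uses the proposed tag. The tag name and its `type`/`icon` attributes come from the comment; the function names and the regex-based parsing are assumptions made for this example, and a real MediaWiki tag extension would receive this data through the parser instead.

```python
import re

# Matches the <text2speech ...>...</text2speech> tag proposed in the comment.
TAG_RE = re.compile(r'<text2speech\b([^>]*)>(.*?)</text2speech>', re.DOTALL)
# Matches name="value" attribute pairs inside the opening tag.
ATTR_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def find_text2speech(wikitext):
    """Return a list of (attributes, body) pairs, one per tag found."""
    results = []
    for match in TAG_RE.finditer(wikitext):
        attrs = dict(ATTR_RE.findall(match.group(1)))
        body = match.group(2).strip()
        results.append((attrs, body))
    return results


def strip_delimiters(ipa):
    """Drop the surrounding /.../ or [...] that mark a transcription."""
    if len(ipa) >= 2 and (ipa[0], ipa[-1]) in {("/", "/"), ("[", "]")}:
        return ipa[1:-1]
    return ipa
```

Calling `find_text2speech` on a page that contains the example tag yields the attribute dictionary and the string `/a.zyʁ/`, which `strip_delimiters` reduces to the bare transcription before it is handed to whatever synthesizer is chosen.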

(In reply to comment #4)

The current system doesn't work because uploading individual files of individual words is a pain in the ass, I think.

I haven't tried (have you?) but it shouldn't be. Nowadays plenty of people have a decent-quality mic connected to the Internet.

the voice technology that came with computers fifteen years ago is "good enough" or can be wrestled to be.

In English and a few other languages sure, but we have close to 200 Wiktionaries and most of those languages probably won't benefit from that tech research anytime soon.

For a reference, see

http://www.loquendo.com/en/products/text-to-speech/languages-voices/
http://www2.research.att.com/~ttsweb/tts/demo.php

Another step will be to wait for open source alternatives to these proprietary and nowadays lucrative systems...

But ideally you'd have the ability to turn proper IPA into sound. A smarter solution is needed. Consider this a brainstorming bug. :-)

Just in case it's useful:

http://www.w3.org/TR/speech-synthesis/#edef_phoneme
http://en.wikipedia.org/wiki/Speech_synthesis#Text-to-phoneme_challenges
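For context on the first link: the SSML `phoneme` element carries a transcription in its `ph` attribute and names the alphabet in `alphabet`, with the element text serving as a fallback. A minimal Python sketch of producing such an element follows; the function name is made up for this example, and whether a given speech engine honours the element is not something this thread establishes.

```python
import xml.etree.ElementTree as ET


def ssml_phoneme(ipa, fallback_text):
    """Build an SSML <phoneme> element for an IPA transcription.

    ElementTree escapes quotes and angle brackets in the attribute value,
    so the IPA string can be passed in unmodified.
    """
    elem = ET.Element("phoneme", {"alphabet": "ipa", "ph": ipa})
    elem.text = fallback_text
    return ET.tostring(elem, encoding="unicode")
```

The returned string would normally be placed inside a `speak` document before being sent to a synthesizer that accepts SSML.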

So if I'm reading this right, this requires an 'IPA -> Sound' engine of some sort?

wmf.amgine3691 wrote:

(In reply to comment #7)

So if I'm reading this right, this requires an 'IPA -> Sound' engine of some sort?

There is also a bug requesting the reverse, Sound -> IPA, but for this specific enhancement, yes.

(In reply to comment #8)

There is also a bug requesting the reverse, Sound -> IPA

Where? I couldn't find it.

rahul14m93 wrote:

Many words in Wiktionary have a pronunciation attached, but there are many words that don't have this feature.
Example: http://en.wiktionary.org/wiki/compendium

In a quick random survey I found that words pertaining to a specific field, such as mathematics or chemistry, don't have a pronunciation attached.

The solution I propose is to provide a button to record the sound. Clicking the button starts a 5-second recording, during which the speaker should be loud and clear and adhere to the phonetics. A rating feature would also go well with this: people record the pronunciation and volunteers rate the recordings out of 5 (similar to the IMDb ones :)).

We could have the recordings saved in the Ogg or WAV format.

wmf.amgine3691 wrote:

(In reply to comment #9)

(In reply to comment #8)

There is also a bug requesting the reverse, Sound -> IPA

Where? I couldn't find it.

Neither can I. It was in a discussion about a mobile tool for recording spoken word and uploading to Commons, both for the Wiktionary project and Wikisource (for oral history recordings needing transcriptions).

(In reply to comment #6)

(In reply to comment #4)

The current system doesn't work because uploading individual files of individual words is a pain in the ass, I think.

I haven't tried (have you?) but it shouldn't be.

That's what is requested here, in fact. :)

Nowadays plenty of people have a decent-quality mic connected to the Internet.

Which is why exploiting this resource is a good project.

(In reply to comment #7)

So if I'm reading this right, this requires an 'IPA -> Sound' engine of some sort?

Not what comment 0 asked, but some proposed it; clarifying summary.

(In reply to comment #11)

Neither can I. It was in a discussion about a mobile tool for recording spoken word and uploading to Commons, both for the Wiktionary project and Wikisource

Indeed, see URL where there's clear interest from the communities.
It's still not clear, from a Wikimedia projects point of view, if the aim is best served by an extension or other system, but the request is legit.

Adding Lars, who proposed the voice recording tool idea at:

http://thread.gmane.org/gmane.org.wikimedia.wiktionary/1265

And some context: Rahul - see comment 10 - is interested in this project for Google Summer of Code. Having a community need declared increases points for him. If someone would volunteer as mentor then his chances would increase even more (hint, hint).

rahul14m93 wrote:

(In reply to comment #13)

I would like someone to reply to comment #10.

Quim Gil, thanks! I had to think a lot, as I told you. I'd be glad to work on this project.

(In reply to comment #13)

Adding Lars, who proposed the voice recording tool idea at:

http://thread.gmane.org/gmane.org.wikimedia.wiktionary/1265

And some context: Rahul - see comment 10 - is interested in this project for Google Summer of Code. Having a community need declared increases points for him. If someone would volunteer as mentor then his chances would increase even more (hint, hint).

As an aside, I think that should be discussed in a separate bug. Getting humans to record sound and getting auto TTS of IPA (what comment 0 is asking for) are rather different.

(In reply to comment #15)

As an aside, I think that should be discussed in a separate bug. Getting humans to record sound and getting auto TTS of IPA (what comment 0 is asking for) are rather different.

Agreed, the original report requested text-to-speech. Easier pronunciation recording is a whole different thing, so I've filed bug 46610 and retargeted this to the original request.

As a side note, I agree computer pronunciation is inferior to a human recording. The question is whether it is sufficiently better than nothing to be worth implementing.

rahul14m93 wrote:

I have prepared a rough project proposal. Please give me your feedback and suggestions so that I can improve it: https://www.mediawiki.org/wiki/User:Rahul21/Gsoc

Rahul, your proposal is related to

Bug 46610 - Pronunciation recording tool

Please announce it there. Thank you!

rahul14m93 wrote:

I am sorry, I had 2 tabs open at the same time, causing some confusion!

One year of silence. Setting to Lowest only to reflect the current reality, which is that nobody we are aware of is working or planning to work on this.

• brooke subscribed. Jul 21 2015, 1:09 AM

Restricted Application added a subscriber: Aklapper. Jul 21 2015, 1:09 AM

Ainali subscribed. Oct 29 2015, 2:26 PM

Is this a duplicate of T2224? They seem to want to achieve the same thing, but are expressed in different ways.

Qgil added projects: All-and-every-Wiktionary, Multimedia. Oct 29 2015, 5:52 PM

Restricted Application added a subscriber: Matanya. Oct 29 2015, 5:52 PM

Jdforrester-WMF moved this task from Untriaged to Backlog on the Multimedia board. Nov 5 2015, 6:25 PM

Aklapper merged a task: T2224: IPA or SAMPA module. Nov 29 2015, 6:09 PM

Aklapper added subscribers: Meno25, Amire80, mxn and 3 others.

Tacsipacsi mentioned this in T48610: [DO NOT USE] Pronunciation recording tool (tracking) [superseded by #PronunciationRecording]. Feb 6 2016, 10:48 PM

Tacsipacsi updated the task description.

Tacsipacsi set Security to None.

Meno25 unsubscribed. Feb 8 2016, 7:45 PM

Qgil unsubscribed. Feb 11 2016, 12:29 PM

Qgil mentioned this in T97761: Participation of WMF Reading at Wikimania 2016 and Wikimedia Hackathon 2016. May 12 2016, 10:09 AM

Krinkle mentioned this in T28207: Split magic linking out of core; create new magic linking extension. May 26 2016, 6:33 PM

A) Awesome! I requested this in 2010! :D
B) There were some concerns raised at the time, about issues with dialect variance. See a brief explanation in https://en.wikipedia.org/wiki/Help_talk:IPA/Archive_2#Embedded_IPA_pronunciation_soundfiles
I imagine the rabbit hole goes deeper though, but I have no expertise in this area.

Sidenote, not already mentioned above, so see also https://en.wikipedia.org/wiki/IPA_pulmonic_consonant_chart_with_audio and https://en.wikipedia.org/wiki/IPA_vowel_chart_with_audio

• Mattflaschen-WMF edited subscribers, added: Mattflaschen-Personal; removed: • Mattflaschen-WMF. Jun 7 2016, 9:44 PM

Nemo_bis unsubscribed. Jun 8 2016, 12:34 PM

jrbs subscribed. Jun 23 2016, 8:16 AM

• Niedzielski subscribed. Jun 23 2016, 9:26 AM

Converting IPA to speech using lexconvert's correspondences to eSpeak phonemes and eSpeak seems to yield reasonable results (at least for the sampled English Wikipedia entries that have IPA).

To try, paste the IPA into the text box at https://itinerarium.github.io/phoneme-synthesis/.

Perhaps some scripted process could run

python lexconvert.py --try unicode-ipa "/mʊmˈbaɪ/"

and upload/attach the output with some mechanism to flag low-quality output for review?

Alternatively, humans could QC the pronunciation using the demo site or some custom tool to streamline the copying/pasting and only save/upload those entries that sound reasonable?
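The scripted process suggested above could look roughly like the Python sketch below. Only the `--try unicode-ipa` invocation is taken from the comment; the function names, the `entries` format, and the use of the exit status as a review signal are assumptions. The thread does not say how to capture an audio file from lexconvert, so the sketch records only whether each run exited cleanly, which is a crude stand-in for a real quality check.

```python
import subprocess
import sys


def build_command(ipa, script="lexconvert.py"):
    """Build the argument list for the invocation quoted in the thread."""
    return [sys.executable, script, "--try", "unicode-ipa", ipa]


def process_entries(entries, runner=subprocess.run):
    """Run each (title, ipa) pair and return the titles that need review.

    `runner` defaults to subprocess.run and can be swapped for a stub.
    A non-zero exit status or a failure to start the process flags the entry.
    """
    needs_review = []
    for title, ipa in entries:
        try:
            result = runner(build_command(ipa), check=False)
            ok = result.returncode == 0
        except OSError:
            ok = False
        if not ok:
            needs_review.append(title)
    return needs_review
```

A list such as `[("Mumbai", "/mʊmˈbaɪ/")]` would be passed to `process_entries`, and the returned titles could feed the human QC step described in the last paragraph of the comment.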

waldyrious subscribed. Feb 28 2017, 4:36 PM

TheDragonFire subscribed. Apr 14 2017, 2:06 PM

• Phabricator_maintenance removed a subscriber: yuvipanda. Jun 7 2017, 7:05 PM

Encyclopedias arbitrate disputes. Pronunciation disputes are rampant. Now, our favorite encyclopedia is not capable of resolving these disputes. Please enhance our favorite encyclopedia with this capability.

Many people don't know how to read IPA and also want to know how to pronounce words using IPA transcriptions as a guide.
An individual could increase his/her expertise by using a tool like this to confirm pronunciation.
This could be a tool for teaching how to read IPA.
Cost-benefit analysis: https://itinerarium.github.io/phoneme-synthesis/ exists. So, the cost of the feature is the cost of integrating it into MediaWiki. My personal assessment of the benefit is that it is big.

Per https://www.mediawiki.org/wiki/Phabricator/Help#Setting_task_priority

Harej moved this task from Incoming to Confirmed Extension Requests on the MediaWiki-extension-requests board. Jan 30 2018, 11:22 PM

I can see if I can hack something up if there's consensus that this is worth trying, and has a chance to be added to Wikipedia as a default feature. Is there agreement that automated IPA pronunciation, based on lexconvert + eSpeak as demonstrated above, is beneficial (i.e. good enough that it's better to have it than not to have it)? If so, what's the best way to deliver a proof of concept?

The implementation options are:

1. Use the existing JavaScript code (linked in T33221#2915772), which requires minimal effort but requires the client to load 2 MB to play it back (can be cached client-side).
2. Use a bot that generates and uploads individual pronunciation files.
3. Provide an API that generates and caches pronunciations for IPA strings.
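To make the API option in the list above a little more concrete, here is a minimal in-memory sketch in Python of "generate once, then serve from cache". Everything in it is hypothetical: the class and function names, the `language` parameter, and the choice of a SHA-256 key over the NFC-normalized string are assumptions for illustration, and `synthesize` stands in for whatever engine would actually produce the audio.

```python
import hashlib
import unicodedata


def normalize_ipa(ipa):
    """Trim whitespace and apply NFC so equivalent strings compare equal."""
    return unicodedata.normalize("NFC", ipa.strip())


def cache_key(ipa, language="en"):
    """Derive a stable key from the language and the normalized IPA."""
    payload = f"{language}\n{normalize_ipa(ipa)}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class PronunciationCache:
    """Calls `synthesize(ipa, language)` only on a cache miss."""

    def __init__(self, synthesize):
        self._synthesize = synthesize
        self._store = {}

    def get(self, ipa, language="en"):
        key = cache_key(ipa, language)
        if key not in self._store:
            self._store[key] = self._synthesize(normalize_ipa(ipa), language)
        return self._store[key]
```

Normalizing before hashing matters for IPA because the same transcription can be typed with precomposed or combining characters; without it, visually identical strings would be synthesized and stored twice.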

Quiddity updated the task description. Oct 2 2018, 6:35 PM

Quiddity removed a subscriber: • wikibugs-l-list.

I've created a quick hack using the client-side JavaScript (first option listed above). gzip compression brings the JavaScript down to under 500 kB. Using WebAssembly could likely improve it further but that would require more effort.

If you want to try it, add the user script to your user script page, visit a page containing IPA and click the small blue "play" arrow that should show up next to it.

AfroThundr3007730 subscribed. Oct 5 2018, 2:27 AM

This is really cool!

There's some room for improvement with some of the pronunciation — I tried it out on the classic example of "Nevada", where the middle 'a' didn't quite match the expected 'a as in bad', and I tried "Barack Hussein Obama", where it kind of garbles the end of Hussein — but even as is, it would be a pretty big usability improvement for most users who can't make sense of the IPA markings.

This definitely seems like it would be worth polishing up and integrating into the default experience, and iterating on.

This would be a useful tool for sanity-checking listed IPA pronunciations (e.g. if the examples @Ragesoss tried out above were garbled because of an incorrect transcription into IPA, having a machine reading would let that issue be identified and corrected). However, this could also serve to motivate more people to make and upload actual pronunciation recordings, if they are dissatisfied with the machine pronunciations for some reason (again, Ragesoss's above examples are relevant).

Speaking more speculatively, this could also ultimately serve to inform work on a hypothetical revision or successor to IPA, by exposing and highlighting shortcomings of the extant system (of course, this is where I betray my ignorance of IPA in general).

I'm happy to address the technical part of this, but not the organizational part (finding decision makers, getting a yes/no, deciding in which shape this should be implemented).

The village pump discussion mentioned that another text-to-speech project (Wikispeech) exists, and the relevant decisions have already been made long ago. They're aiming for a server-side version (which will be more work but may also result in a better/faster experience for the user) and a much bigger scope (actually read articles vs. just IPA).

I've built a slightly improved version that runs from a bookmarklet. This solves my personal problem. And while I'd love to make it available for everyone and I'm happy to volunteer my time to do technical work to polish it, unfortunately I don't find the organizational work pleasant or rewarding, and can't bring myself to spend my free time on it (especially since I don't want to create a conflict with the Wikispeech team).

If someone is able to get an "official" decision that client-side IPA TTS is something that should be included in Wikipedia by default, and some way for me to get answers to questions regarding the desired implementation specifics (e.g. whether this should be an extension/standalone code/..., level of sandboxing, level of compatibility with legacy browsers, ...), I'm happy to do the technical part.

• iamjessklein awarded a token. Jun 4 2019, 1:47 PM

This seems like basic and expected functionality.
I wonder how @alexhollender thinks this might impact the reading experience?

Related: @DLynch shared this IPA conversion tool with me: https://itinerarium.github.io/phoneme-synthesis/

Quiddity mentioned this in T298950: Audio pronunciation of IPA transcriptions. Feb 10 2022, 8:44 AM

Nardog subscribed. Feb 10 2022, 9:33 AM

Pigsonthewing subscribed. Feb 10 2022, 10:41 AM

Pigsonthewing unsubscribed.

Pigsonthewing subscribed.

Possible merge in from T298950

Aklapper merged a task: T298950: Audio pronunciation of IPA transcriptions. Feb 15 2022, 10:14 PM

Aklapper added subscribers: Yug, Sdkb.

https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2022/Reading/IPA_audio_renderer

Zblace subscribed. Feb 19 2022, 9:33 AM

Also, it would be really nice if you could play the audio without being redirected to a different page : )

@alexhollender_WMF That would be T229169: Support inline links to audio with ability to play it in situ.

MarkAHershberger unsubscribed. Feb 23 2022, 9:13 PM

• NRodriguez subscribed. Apr 12 2022, 9:32 PM

• NRodriguez unsubscribed.

• NRodriguez subscribed.

Aklapper edited projects, added MediaWiki-extensions-Phonos; removed Multimedia. May 4 2022, 11:10 PM

Aklapper removed a subscriber: • bzimport.

TheresNoTime subscribed. May 27 2022, 9:00 AM

TheresNoTime changed the task status from Open to In Progress. Jun 16 2022, 11:53 PM

TheresNoTime moved this task from Backlog to 🌟Top Priority on the Community-Wishlist-Survey-2022 board. Aug 30 2022, 4:20 PM

Aklapper moved this task from 🌟Top Priority to Backlog on the Community-Wishlist-Survey-2022 board. Aug 30 2022, 6:45 PM

• NRodriguez moved this task from Backlog to Tracking 🌱 on the MediaWiki-extensions-Phonos board. Sep 8 2022, 5:36 PM

CX_Zoom subscribed. Dec 4 2022, 9:32 PM

R4356th subscribed. May 21 2023, 6:30 PM

Restricted Application added a project: Community-Tech. May 21 2023, 6:30 PM

TheresNoTime moved this task from New & TBD Tickets to Following on the Community-Tech board. Jun 13 2023, 1:05 PM

Pigsonthewing mentioned this in T330659: Officially support languages the TTS does not support IPA for. Jun 28 2023, 6:53 PM

KSiebert removed projects: Community-Tech, Community-Wishlist-Survey-2022. Jul 5 2023, 2:30 PM

	F8292: speaker.png
	Feb 6 2016, 10:49 PM
