
Use free software implementation for Phonos on Wikimedia sites
Open, Needs Triage, Public

Description

The current plan for Phonos is to use a proprietary Google API when it is deployed to Wikimedia sites. There are some free software options for the same functionality, but they are supposedly not up to par. This task outlines and tracks those deficiencies so we can file upstream bugs and switch once a free option can replace Google.

An analysis was done in T307624: [16 hours] Investigate: Options of TTS engines, with some notes at https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2022/Reading/IPA_audio_renderer/TTS_investigation

Event Timeline

[offtopic] Is there a separate task where we can track the reliance on a proprietary service instead of using a free software solution?

T307624: [16 hours] Investigate: Options of TTS engines. Some of us (myself included) were passionate about pushing for FOSS, but of the options we found, Google performed extraordinarily better and supports many more languages. The Language team, who will eventually take over ownership of Phonos, also seemed more keen on using Google than on having to maintain/update third-party code. At any rate, Phonos was designed in such a way that we can add more engines if we find one better than Larynx and eSpeak, which we already support. For now, I believe the decision has been made that we're going with Google.
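(Phonos itself is a PHP MediaWiki extension, so the Python below is only a hypothetical sketch, but the pluggable-engine design mentioned above roughly amounts to a shared interface that each backend implements; all names here are invented for illustration.)

```python
from abc import ABC, abstractmethod


class TtsEngine(ABC):
    """Hypothetical engine interface: each backend turns IPA (or plain
    text) into audio bytes. Phonos's real interface is PHP; these names
    are illustrative only."""

    @abstractmethod
    def synthesize(self, ipa: str, lang: str) -> bytes:
        """Return rendered audio (e.g. MP3 bytes) for the input."""


class EspeakEngine(TtsEngine):
    def synthesize(self, ipa: str, lang: str) -> bytes:
        ...  # shell out to espeak-ng and return the audio it writes


class GoogleEngine(TtsEngine):
    def synthesize(self, ipa: str, lang: str) -> bytes:
        ...  # call the proprietary Cloud Text-to-Speech API


# Adding a new free engine later means adding one class here,
# not changing any callers.
ENGINES = {"espeak": EspeakEngine, "google": GoogleEngine}
```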

To be clear, I'm not asking that the decision be reconsidered now; rather, I would like actionable feedback that we can file upstream, so we can figure out what resources could be used to help build an acceptable free software replacement.

I reviewed that task and the Meta-Wiki page (linked in the main task description) but didn't feel they had enough detail for me to point an upstream at them and ask what it would take to get those issues fixed.

On the issue of language support, for ContentTranslation AIUI we use the free apertium implementation where it's high quality, and only use proprietary services when it's not usable. Depending on what wikis this is initially targeting, maybe that's an option?
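A minimal sketch of that per-language routing idea, assuming a hand-curated allow-list of languages where the free engine has been judged good enough; the language codes and engine names below are invented for illustration, not real project data:

```python
# Hypothetical: languages where the free engine's output is acceptable.
FREE_ENGINE_OK = {"en", "de", "nl"}


def pick_engine(lang: str) -> str:
    """Prefer the free backend where its quality is acceptable and fall
    back to the proprietary service elsewhere, mirroring how apertium
    is used for ContentTranslation only where it is high quality."""
    return "larynx" if lang in FREE_ENGINE_OK else "google"
```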

Remaining [offtopic] here, but I'd quite like to see if we can keep https://larynx-tts.wmcloud.org/openapi running. Perhaps I'll request a WMCS project for it, throw a load of compute power at a VM, and see how the maximum settings sound (generation time at that level tends to be fairly long on even moderately spec'd VMs).
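If anyone wants numbers on that generation time, a probe like the one below would do. The /api/tts endpoint and its text/voice/vocoder parameters are an assumption from memory of Larynx's published OpenAPI spec, so check them against the /openapi document linked above before relying on this:

```python
import time

import requests

BASE = "https://larynx-tts.wmcloud.org"


def time_synthesis(text: str, voice: str, vocoder: str) -> float:
    """Time one synthesis request; endpoint and parameter names are
    assumed from Larynx's OpenAPI spec, not verified here."""
    start = time.monotonic()
    resp = requests.get(
        f"{BASE}/api/tts",
        params={"text": text, "voice": voice, "vocoder": vocoder},
        timeout=120,
    )
    resp.raise_for_status()
    return time.monotonic() - start


# Compare a fast vocoder against the slower high-quality one to see
# what the "maximum settings" actually cost on a given VM.
```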

Hello there, thanks for writing this and for asking for more details! I agree with you that our preferred route would definitely have been open source, and we tried hard to make that the case, as @MusikAnimal and @TheresNoTime noted above!

> I reviewed that task and the Meta-Wiki page (linked in the main task description) but didn't feel they had enough detail for me to point an upstream at them and ask what it would take to get those issues fixed.

Mind sharing what kind of details would be most helpful? Here is what I see as the functionality an open source engine would most imperatively need to reach parity, if we were to go the Larynx/other open source route:

  • Supports as many languages as the option we selected
  • Has an accessible learning curve and a low barrier to contribution for open source devs
  • Is as reliable as the option we selected
  • Sounds as close to human as the option we selected

> Depending on what wikis this is initially targeting, maybe that's an option?

The scope of the project is to benefit all public wikis that use IPA.

Happy to answer more! Will move this to Tracking instead of Backlog, since I know CommTech engineers (myself included) are still passionate about considering this route.

For the people developing this at Language-Team... please do not let it be just about IPA.

For languages like Georgian, Hausa, Polish, and Spanish (and even Mandarin, which is written with thousands of characters but each one has a consistent pronunciation), the pronunciation is completely predictable from orthography, so speakers don't need IPA to tell how a word is pronounced—they can just look at the spelling and know. (Or in cases like Russian and Slovene, they have an auxiliary system to fill in the details about the pronunciation the orthography does not normally cover.)

Forcing users to input IPA just to get text spoken when computers are just as capable of deriving the same audio from the orthographic form would privilege those who can afford to learn it and would be highly inequitable. Please avoid it at all costs. Orthography-to-audio has more demand anyway.
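For what it's worth, espeak-ng (already one of the engines Phonos supports) can derive IPA, or audio directly, from plain orthography today. A minimal sketch, assuming espeak-ng is installed and ships a voice for the target language:

```python
import subprocess


def ipa_from_orthography(text: str, lang: str) -> str:
    """Ask espeak-ng to phonemize plain spelling into IPA, so editors
    of regularly spelled languages never have to type IPA themselves."""
    out = subprocess.run(
        ["espeak-ng", "-q", "--ipa", "-v", lang, text],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


# e.g. ipa_from_orthography("cześć", "pl"); using -w out.wav instead
# of -q/--ipa would render the audio directly from the orthography.
```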