User Story
We want to understand the capabilities of different text-to-speech (TTS) engines so that we can design a solution for an IPA audio renderer.
Open source:
- https://github.com/itinerarium/phoneme-synthesis/
- https://github.com/rhasspy/larynx
- https://github.com/espeak-ng/espeak-ng
Closed source:
- Google - https://cloud.google.com/text-to-speech
- IBM - https://www.ibm.com/demos/live/tts-demo/self-service/home
- Microsoft - https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/
Acceptance Criteria
- Understand the input and output of the engine.
- Determine whether we need to write integration code or whether the engine is plug-and-play.
- Determine whether the engine supports Speech Synthesis Markup Language (SSML).
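For context on the SSML criterion: SSML defines a `<phoneme>` element whose `alphabet` and `ph` attributes carry a phonetic transcription, which is the most likely route for passing IPA to an engine. The following is a minimal sketch of building such a request body, not a tested integration. The helper name `ipa_to_ssml` is hypothetical, and the bare `<speak>` root is an assumption: some engines require extra attributes (such as `version` or `xml:lang`) or a voice wrapper, which should be checked per engine.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def ipa_to_ssml(ipa: str, fallback_text: str = "") -> str:
    """Wrap an IPA transcription in an SSML <phoneme> element.

    Hypothetical helper for this investigation. `fallback_text` is the
    plain spelling an engine may speak if it ignores the `ph` attribute.
    """
    # quoteattr() escapes the value and adds the surrounding quotes.
    phoneme = (
        f"<phoneme alphabet=\"ipa\" ph={quoteattr(ipa)}>"
        f"{escape(fallback_text)}</phoneme>"
    )
    # Assumption: a bare <speak> root is accepted by the target engine.
    return f"<speak>{phoneme}</speak>"


# Usage: build the markup and confirm it is well-formed XML.
ssml = ipa_to_ssml("həˈloʊ", "hello")
root = ET.fromstring(ssml)
print(ssml)
```

If an engine rejects or ignores this markup, that is a useful data point for the SSML criterion: it indicates the engine either lacks `<phoneme>` support or expects a different phonetic alphabet.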
Outcome of this ticket
Create a table that records the following for each of our options:
- How many languages does it support and which languages?
- If it is closed source, how much does it cost?
- Run the corpus that we have created through the engine and record the audio output.
- How many voices does this library have? Is it only one?
Results
| Status | Task | Link |
| Done | Create table with above information | Community Wishlist Survey 2022/Reading/IPA audio renderer/TTS investigation on Meta |
| Done | Collate all TTS engines' output for the corpus that we have created | https://tnt-dev.toolforge.org/projects/tts |
| WIP | Choose one open source and one closed source TTS engine for further comparison | |