It would be desirable if any request that has been synthesized were also cached, so that subsequent identical requests don't need to be resynthesized.
Description
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | | None | T152430 Run Wikispeech offline |
| Resolved | | HaraldBerthelsen | T143644 Multiple requests to TTS server should not cause delay |
| Resolved | | Jopparn | T151786 Publically accessible demo (player) [Stage 1+2] |
| Declined | | None | T151880 Caching on TTS server |
Event Timeline
Caching currently works as expected in the browser: if the page isn't reloaded, the audio elements are still there and play without resynthesising. Fair enough. But this issue refers more to caching either the generated sound files for an entire Wikipedia article, or the sound file for a unique sentence. This could certainly be done, keyed on the id of the article or the text of the sentence, but it will be a bit problematic to know when to resynthesise if the lexicon, the markup, the text processing or the synthesis has changed in any way.
A note from our previous discussions:
One solution is that the cache could be time limited, so that resynthesis would happen regularly enough anyway.
The other is that you could somehow delete from the cache any pronunciations containing a given word when that word is changed in the lexicon (or when some other part of the server-side logic that affects it changes). New markup on the wiki side would send a different request to the server, so it should not match the cached result anyway.
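The time-limited variant above could be sketched as follows. This is a minimal in-memory sketch, not the actual Wikispeech server code; the class name and TTL value are illustrative only:

```python
import time


class TtlCache:
    """Minimal time-limited cache: entries expire after ttl seconds,
    which forces periodic resynthesis even if nothing was explicitly
    invalidated."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, stored_at)

    def put(self, key, value):
        self._store[key] = (value, time.monotonic())

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            # Expired: drop the entry so the caller resynthesises.
            del self._store[key]
            return None
        return value
```

A real deployment would pick a TTL long enough to keep the synthesis load down but short enough that lexicon changes propagate acceptably fast.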
Using a simple key-value store, we could cache responses on the server side too. The TTS result would be stored in a file and indexed by the page and text coordinates plus the utterance itself.
The parameters for the synthesis (like the ones in the HTTP requests) could be used as keys. This would avoid generating identical audio when the utterance string, language, voice etc. are the same.
Yes, but note that the "page" indicates the language but not the voice, so the key should include at least:
- language
- voice
- utterance
That could be sufficient. And maybe:
- page
- text coordinates on page
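Deriving one deterministic key from those parameters could look like this. A sketch only: the function name is hypothetical, and hashing canonical JSON with SHA-256 is just one reasonable choice:

```python
import hashlib
import json


def cache_key(language, voice, utterance, page=None, coordinates=None):
    """Derive a deterministic cache key from the synthesis parameters.

    language, voice and utterance form the minimal key; page and
    text coordinates are the optional extras discussed above.
    """
    params = {
        "language": language,
        "voice": voice,
        "utterance": utterance,
        "page": page,
        "coordinates": coordinates,
    }
    # Canonical JSON (sorted keys) so identical parameters always
    # hash to the same key regardless of argument order.
    canonical = json.dumps(params, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two requests with the same language, voice and utterance then map to the same key and reuse the same audio, while any change to the parameters yields a different key.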
A simple key-value store such as Redis or Memcached could be used to store the indexes and the paths to the files.
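The lookup-or-synthesize flow with such a store could be sketched as below. A plain dict stands in for Redis/Memcached here, and the cache directory path is made up for illustration:

```python
# A plain dict stands in for the key-value store; in production the
# same get/set calls would go to Redis or Memcached instead.
index = {}


def get_audio(key, synthesize):
    """Return the cached audio file path, calling the expensive TTS
    backend only on a cache miss."""
    path = index.get(key)                  # e.g. redis.get(key)
    if path is None:
        audio_bytes = synthesize()         # expensive TTS call
        path = "/var/cache/wikispeech/%s.ogg" % key  # hypothetical layout
        # Writing audio_bytes to path is omitted in this sketch.
        index[key] = path                  # e.g. redis.set(key, path)
    return path
```

The store only holds small strings (key to file path), so the audio itself stays on disk and the index stays cheap to keep in memory.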
I have a feeling that some kind of caching will be required when Wikispeech goes live. @HannaLindgren, @HaraldBerthelsen, @NikolajLindberg: is there any caching on the server currently?