This can be approached in different ways:
- Pre-generate audio files.
  - May require a lot of disk space; this can be mitigated by reducing sound quality.
  - Related to T122160.
- Run a TTS server locally.
  - Requires more processing power.
  - Shouldn't need much extra implementation to run; the server is accessed as usual, only on the local network.
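
The two approaches listed above can also be combined. The following is a minimal sketch of that idea, not the project's implementation: audio is pre-generated to disk under content-derived file names, and anything missing is requested from a TTS server on the local network. The function names, the `.audio` file suffix, the `/synthesize` endpoint, and its `text` and `voice` parameters are all hypothetical; the real server's API would need to be substituted.

```python
import hashlib
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Callable, Iterable

# A synthesizer takes (text, voice) and returns raw audio bytes.
Synthesizer = Callable[[str, str], bytes]


def audio_path(cache_dir: Path, text: str, voice: str) -> Path:
    """Map an utterance to a stable file name derived from its content."""
    digest = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.audio"


def pregenerate(cache_dir: Path, utterances: Iterable[str], voice: str,
                synthesize: Synthesizer) -> int:
    """Synthesize every utterance not yet on disk; return the number of new files."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    created = 0
    for text in utterances:
        path = audio_path(cache_dir, text, voice)
        if not path.exists():
            path.write_bytes(synthesize(text, voice))
            created += 1
    return created


def get_audio(cache_dir: Path, text: str, voice: str,
              synthesize: Synthesizer) -> bytes:
    """Return pre-generated audio if present, otherwise synthesize on demand."""
    path = audio_path(cache_dir, text, voice)
    if path.exists():
        return path.read_bytes()
    return synthesize(text, voice)


def make_http_synthesizer(base_url: str) -> Synthesizer:
    """Build a synthesizer that calls a TTS server over HTTP.

    ASSUMPTION: the '/synthesize' endpoint and its form fields are
    placeholders, not the API of any particular TTS server.
    """
    def synthesize(text: str, voice: str) -> bytes:
        data = urllib.parse.urlencode({"text": text, "voice": voice}).encode("utf-8")
        with urllib.request.urlopen(f"{base_url}/synthesize", data=data) as response:
            return response.read()
    return synthesize
```

Because the server address is just a parameter, switching from a remote server to a local one would only change the `base_url` passed to `make_http_synthesizer`, which matches the point that little extra implementation should be needed. Disk usage then depends on how many utterances are pre-generated and on the audio quality the server is asked to produce.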