Per T180015#3835408, deployment on WMF infrastructure will likely require audio files to be stored in Swift rather than in a local directory (see the wikispeech_mockup README).
Ideally, the TTS server should have a configurable file storage backend so that Swift can be used where it is available, with directory storage as the alternative elsewhere.
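
A rough sketch of what such a configurable backend might look like, in Python. Everything named here (`AudioStorage`, `DirectoryStorage`, `SwiftStorage`, `make_storage`, and the config keys `backend`, `directory`, `container`) is hypothetical and not taken from the existing server code. The Swift variant assumes it is handed an already-authenticated connection object exposing `put_object` and `get_object`, as the `Connection` class in python-swiftclient does; how that connection would be configured on WMF infrastructure is left open.

```python
import os
from abc import ABC, abstractmethod


class AudioStorage(ABC):
    """Hypothetical interface the TTS server would use for audio files."""

    @abstractmethod
    def save(self, name: str, data: bytes) -> None:
        """Store the audio bytes under the given name."""

    @abstractmethod
    def load(self, name: str) -> bytes:
        """Return the audio bytes stored under the given name."""


class DirectoryStorage(AudioStorage):
    """Keeps audio files in a local directory (the current behaviour)."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def save(self, name: str, data: bytes) -> None:
        # Assumes `name` is a plain file name generated by the server.
        with open(os.path.join(self.root, name), "wb") as f:
            f.write(data)

    def load(self, name: str) -> bytes:
        with open(os.path.join(self.root, name), "rb") as f:
            return f.read()


class SwiftStorage(AudioStorage):
    """Keeps audio files as objects in a Swift container.

    `connection` is assumed to behave like swiftclient's Connection.
    """

    def __init__(self, connection, container: str) -> None:
        self.connection = connection
        self.container = container

    def save(self, name: str, data: bytes) -> None:
        self.connection.put_object(self.container, name, contents=data)

    def load(self, name: str) -> bytes:
        # get_object returns a (headers, body) pair.
        _headers, body = self.connection.get_object(self.container, name)
        return body


def make_storage(config: dict, swift_connection=None) -> AudioStorage:
    """Pick a backend from config; falls back to directory storage."""
    backend = config.get("backend", "directory")
    if backend == "directory":
        return DirectoryStorage(config["directory"])
    if backend == "swift":
        if swift_connection is None:
            raise ValueError("swift backend requires a connection")
        return SwiftStorage(swift_connection, config["container"])
    raise ValueError(f"unknown storage backend: {backend!r}")
```

With this shape, the rest of the server would only call `save` and `load`, so switching between a directory and Swift would be a configuration change rather than a code change.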