It seems that the output is a Microsoft PCM WAV file?
kalle@musa:/tmp$ file foo.opus
foo.opus: RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz
Here foo.opus is the Base64-decoded audio_data from http://wikispeech-tts-dev.wmflabs.org/?lang=en&input=A+test
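For anyone who wants to repeat the check without the file utility, here is a minimal Python sketch. It assumes you already have the audio_data value as a Base64 string; how you fetch it from the service is left out. The helper names sniff_container and describe_audio are made up for illustration and are not part of Wikispeech. It only looks at the leading magic bytes (RIFF/WAVE versus OggS, the container Opus is normally wrapped in), so treat it as a rough check rather than a full format probe.

```python
import base64
import io
import wave


def sniff_container(data: bytes) -> str:
    """Guess the audio container from the leading magic bytes."""
    # A WAV file starts with "RIFF", a 4-byte size, then "WAVE".
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # Every Ogg page (the usual wrapper for Opus) starts with "OggS".
    if data[:4] == b"OggS":
        return "ogg"
    return "unknown"


def describe_audio(audio_data_b64: str) -> dict:
    """Decode a Base64 string (hypothetically the audio_data value)
    and report what the bytes look like."""
    raw = base64.b64decode(audio_data_b64)
    info = {"container": sniff_container(raw)}
    if info["container"] == "wav":
        # The standard-library wave module reads PCM WAV headers;
        # it raises wave.Error for compressed WAV variants.
        with wave.open(io.BytesIO(raw), "rb") as w:
            info["channels"] = w.getnchannels()
            info["sample_width_bits"] = w.getsampwidth() * 8
            info["sample_rate"] = w.getframerate()
    return info
```

If the data matches what file reported above, describe_audio should return "wav" as the container with 1 channel, 16 bits and 16000 Hz. Real Opus output would be expected to show up as "ogg" instead.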