Blip followed by low volume in generated audio
Open, Needs Triage (Public)

Description

Sometimes, the generated audio contains a high pitched blip and the rest has very low volume.

Example:
/wikispeech/?lang=ar&input_type=text&input=والكثير+من+أعماله+الأكثر+شهرة+كانت+من+الستينات+،+وكلمات+أغانيه+فيها+من+الحكمة+والاحتجاج+الشئ+الكثير+لانه+كان+من+الطبقة+العاملة+والمضطهدة+بأمريكا،+كما+أن+تم+استخدام+بعض+أغانية+كنشيد+لحركة+الحقوق+المدنية+للأفارقة+الأمريكيون+والحركة+المناهضة+لحرب+فيتنام،+تميز+بغناءه+العديد+من+الأنواع+الموسيقية+مثل+ريف+والغوسبل+والبلوز+والروك،+وأدرجت+كلمات+ديلان+مجموعة+متنوعة+من+التأثيرات+السياسية+والاجتماعية+والفلسفية+والأدبية.

Event Timeline

This is apparently not as deterministic as I first thought (it tends to happen when I have the page read aloud on the demo wiki). I have now tried a few times with the same input, and the issue occurred only some of the time.

Try running it multiple times and see if you can reproduce it.
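
Since the problem is intermittent and the example is long, listening to every attempt is slow. Below is a minimal sketch of an automated check, assuming the generated audio has already been decoded to a list of float samples in the range -1.0 to 1.0 (the decoding step is left out). The function names and the threshold of 20 are hypothetical choices for illustration, not part of Wikispeech or MaryTTS, and the threshold would need tuning against real output.

```python
import math


def peak_and_rms(samples):
    """Return (peak absolute value, RMS level) of a non-empty sample list."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return peak, rms


def looks_like_blip(samples, crest_threshold=20.0):
    """Heuristic: flag audio whose peak is far above its overall level.

    A single loud spike followed by very quiet speech gives a high
    peak-to-RMS ratio (crest factor). The default threshold is an
    arbitrary assumption, not a measured value.
    """
    if not samples:
        return False
    peak, rms = peak_and_rms(samples)
    if rms == 0:
        return False  # pure silence, nothing to compare against
    return peak / rms > crest_threshold
```

Running this over the output of several requests with the same input would show how often the issue occurs without having to listen to each file.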

HannaLindgren added a comment. Edited Oct 4 2017, 10:53 AM

OK. It's just that the example is quite long, so there is a lot of waiting to hear whether the beep and low volume occur.
If you happen to run into something shorter with the same issue, please add a link to it as a second example. Something in English or Swedish would probably be easier to use for testing.
You don't have to actively look for it.

HannaLindgren added a comment. Edited Oct 6 2017, 10:38 AM

I found another one with varying volume:

  1. go to the wikispeech test page: https://wikispeech.morf.se/test.html
  2. click "Speak" to the right of the text "En mening. En mening till i samma stycke. Vissa meningar låter bara "klonk"."

https://wikispeech.morf.se/?lang=sv&input=En%20mening.%20En%20mening%20till%20i%20samma%20stycke.%20Vissa%20meningar%20l%C3%A5ter%20bara%20%22klonk%22

I stumbled on a less severe case of this in English:

?lang=en&input=Demo, en

There is some kind of artifact(?) at the beginning, and the rest of the audio is a bit quieter. Compare with:

?lang=en&input=not a demo, en

which has a normal volume throughout.

Also this one:

?lang=en&input=Please note that the words need to be added manually and it may take some time before the lexicon is updated.

HannaLindgren added a comment. Edited Dec 11 2017, 2:14 PM

This seems to be related to the HTS implementation in MaryTTS:

  1. there is indeed a spike in some generated audio files
  2. this causes MaryTTS to lower the volume (which we should be happy about), but the volume is not increased again after the spike
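
To illustrate the second point, here is a toy model of a normaliser that lowers its gain when a sample would exceed the limit and never raises it again. This is only a sketch of the behaviour described above; it is not MaryTTS code, and the function name and the "sticky gain" logic are assumptions made for illustration.

```python
def normalise_with_sticky_gain(samples, limit=1.0):
    """Toy normaliser: reduce gain on overshoot, never restore it.

    Hypothetical model only. It shows how one early spike can leave
    everything after it at a much lower volume.
    """
    gain = 1.0
    out = []
    for s in samples:
        if abs(s) * gain > limit:
            # Lower the gain just enough to bring this sample to the limit.
            gain = limit / abs(s)
        out.append(s * gain)
    return out


# A spike of 4.0 early in the signal quarters the gain for the rest.
print(normalise_with_sticky_gain([0.5, 4.0, 0.5, 0.5]))
# -> [0.5, 1.0, 0.125, 0.125]
```

In this model the samples after the spike come out at a quarter of their original level, which matches the "blip followed by low volume" symptom.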

It could be related to different versions being used to build the voice and to run it. We will not be able to fix this for the Dec 2017 release, but we will mention it in the release notes.

To start with, we will investigate

  1. whether this happens to all Wikispeech languages, or just some (English?)
  2. whether there are existing issue reports for MaryTTS about this problem

HannaLindgren renamed this task from Blip followed by low volume in generated audio to [SAVED FOR LATER] Blip followed by low volume in generated audio. Dec 13 2017, 1:19 PM
HannaLindgren renamed this task from [SAVED FOR LATER] Blip followed by low volume in generated audio to Blip followed by low volume in generated audio. Oct 23 2019, 9:07 AM

The reason for the "spike" is still unknown, but the low volume after it is caused by the "Amplitude Normaliser" in MaryTTS.

The English example seems to work right now, so possibly there is an issue only with the voices we have built ourselves.

A possibly related issue is discussed at https://github.com/marytts/marytts/issues/904. At least it is related in the sense that it has to do with how the MaryTTS port of HTS works.

A comment from one of the developers is:

"This is unfortunately a known issue with the Java port of the HTS Engine code... I've never been able to reliably isolate what causes it, but it seems to be more of an issue with some voices than others. In MaryTTS 6.x, we're actually moving to a wrapper library for HTS. This is still work in progress, but we expect to have a working version published very soon."

I think it makes sense not to try to fix this problem in the old MaryTTS code, but rather to wait for a new MaryTTS version or to use a different system altogether.

Lokal_Profil moved this task from Backlog to Blocked on the Wikispeech-STTS board.