☂ Deploy Wikispeech on beta cluster
Open, Needs Triage, Public

Description

Deploy the Wikispeech tool on the beta cluster for testing and evaluation. The tool consists of a TTS backend service (Speechoid) and a MediaWiki extension.

After testing on the beta cluster, the intention is to deploy the extension as a beta feature on the ar, en and sv Wikipedias. These are the languages currently supported.

Event Timeline

It’s unclear if this service needs to be deployed to production before the extension is deployed on the beta cluster or if this is only needed before the extension gets deployed to production. Should we create a deployment subtask for the service already now?

Probably. You're going to have to write Puppet code/modules to actually get it deployable on beta (it possibly falls to Release-Engineering-Team to decide/confirm for definite whether usage on beta can depend on the tool in Tool Labs), and this is definitely the case on production.

Also, I'm not sure having something like http://wikispeech-tts.wmflabs.org/ in production, externally accessible on the web, is going to happen. You do need to speak to SRE about a deployment strategy... If it's not packaged for Debian, and the current installation instructions are wget-ing various jars from the internet...

https://www.mediawiki.org/wiki/Extension:Wikispeech#Install_TTS_server

# In ''mishkal/tashkeel/tashkeel.py'', change line 385 from:
#* <code>vocalized_text = u" ".join([vocalized_text, self.display(word, format_display)])</code>
#* to:
#* <code>vocalized_text = u" ".join([vocalized_text, self.display(voc_word, format_display)])</code>

*definitely* needs fixing upstream. That https://github.com/linuxscout/mishkal/issues/17 has been open since April doesn't inspire me with much confidence.
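To illustrate why the one-line change quoted above matters, here is a minimal sketch. It is not the actual mishkal code: only the `u" ".join(...)` line comes from the instructions, while the `display` stand-in, the function names and the sample word pairs are assumptions made up for the example. The point is that joining `word` returns the unvocalized input, whereas joining `voc_word` returns the vocalized result.

```python
def display(word, format_display="text"):
    # Hypothetical stand-in for self.display(); the real method is not shown
    # in this thread, so this simply returns the word unchanged.
    return word


def join_before_fix(pairs, format_display="text"):
    # Mirrors the original line: the *input* word is appended.
    vocalized_text = u""
    for word, voc_word in pairs:
        vocalized_text = u" ".join([vocalized_text, display(word, format_display)])
    return vocalized_text.strip()


def join_after_fix(pairs, format_display="text"):
    # Mirrors the changed line: the *vocalized* word is appended.
    vocalized_text = u""
    for word, voc_word in pairs:
        vocalized_text = u" ".join([vocalized_text, display(voc_word, format_display)])
    return vocalized_text.strip()


# Made-up (input, vocalized) pairs, transliterated for readability.
sample = [(u"ktb", u"kataba"), (u"wld", u"waladun")]
print(join_before_fix(sample))
print(join_after_fix(sample))
```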

https://github.com/linuxscout/mishkal/pull/19

And looking at https://www.mediawiki.org/wiki/Extension:Wikispeech#Make_audio_files_accessible

No, we're not going to be writing a directory like this to share the files... It doesn't scale...

Depending on how it works, they should probably be going into Swift, like transcodes etc. do.
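As a rough sketch of what "going into an object store" could look like from the service's side, the snippet below derives a deterministic object key for each utterance and writes the audio through a small storage interface instead of a shared directory. Everything here is an assumption for illustration: the `AudioStore` interface, the `InMemoryStore` backend, the key layout and the function names are hypothetical, and no real Swift client API is used.

```python
import hashlib
from typing import Dict, Protocol


class AudioStore(Protocol):
    """Hypothetical minimal interface an object-store backend would offer."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in backend for local testing; this is not a Swift client."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def utterance_key(language: str, voice: str, text: str) -> str:
    # Hash language, voice and text so the same utterance always maps to the
    # same object name, without relying on a shared filesystem path.
    payload = "\x00".join([language, voice, text]).encode("utf-8")
    return f"{language}/{voice}/{hashlib.sha256(payload).hexdigest()}"


def store_utterance(store: AudioStore, language: str, voice: str,
                    text: str, audio: bytes) -> str:
    # Write the synthesized audio under its derived key and return the key.
    key = utterance_key(language, voice, text)
    store.put(key, audio)
    return key
```

With an interface like this, the development setup could keep using a local backend while a production deployment swaps in whatever backend SRE approves.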

And https://www.mediawiki.org/wiki/Extension:Wikispeech#Start_processes_in_screen isn't going to fly in production... Things aren't going to be manually started in a screen session.

Hi there!

Thanks to @Reedy for doing a quick first pass review of this. It seems this is going to need a fair amount of re-architecting to get to a place where it would be available on either Beta Cluster or production.

I see that this is associated with WMSE; does WMSE have funding for doing the needed work (re-architecture etc)? Looks like there will need to be some time for someone from the TechCom to help out?

Thanks for the feedback.

> It’s unclear if this service needs to be deployed to production before the extension is deployed on the beta cluster or if this is only needed before the extension gets deployed to production. Should we create a deployment subtask for the service already now?
>
> Probably. You're going to have to write Puppet code/modules to actually get it deployable on beta (it possibly falls to Release-Engineering-Team to decide/confirm for definite whether usage on beta can depend on the tool in Tool Labs), and this is definitely the case on production.

Work on Puppet code for the server has started in T151877. It would be good to know if this is a blocker for the review process, in which case we need to prioritize it.

> [...] and the current installation instructions are wget-ing various jars from the internet...
>
> # In ''mishkal/tashkeel/tashkeel.py'', change line 385 from:
> #* <code>vocalized_text = u" ".join([vocalized_text, self.display(word, format_display)])</code>
> #* to:
> #* <code>vocalized_text = u" ".join([vocalized_text, self.display(voc_word, format_display)])</code>
>
> *definitely* needs fixing upstream. That https://github.com/linuxscout/mishkal/issues/17 has been open since April doesn't inspire me with much confidence.

These things have been fixed, but the extension page wasn't up to date. I removed the old workarounds and added a link to the installation instructions.

> And https://www.mediawiki.org/wiki/Extension:Wikispeech#Start_processes_in_screen isn't going to fly in production... Things aren't going to be manually started in a screen session.

This was intended for running the server in a development environment. Added a comment about this in the documentation.

> Hi there!
>
> Thanks to @Reedy for doing a quick first pass review of this. It seems this is going to need a fair amount of re-architecting to get to a place where it would be available on either Beta Cluster or production.
>
> I see that this is associated with WMSE; does WMSE have funding for doing the needed work (re-architecture etc)? Looks like there will need to be some time for someone from the TechCom to help out?

Hi @greg! Yes, we will continue to work on the extension next year as well, so your feedback is much appreciated. @brion from TechCom has kindly offered us help too.

Just a general update from the WMSE team. We have only been doing minor work on Wikispeech so far this year, as we've had no funding for the project. We have now secured some funding for a continuation project, so we should be able to work on this again after summer.

Note that we are only a small team working on this (mainly me and @Sebastian_Berlin-WMSE) and that we are both also involved in other projects at WMSE which demand our time. As a result, the pace at which things get done is not always what we would wish for, and it is sensitive to deadlines in the other projects. That said, Wikispeech is something WMSE is dedicated to continuing, so despite the at times slow pace, development does continue.

Sebastian_Berlin-WMSE renamed this task from "Deploy extension Wikispeech on beta cluster" to "☂ Deploy Wikispeech on beta cluster" (Oct 6 2020, 1:45 PM).
Sebastian_Berlin-WMSE updated the task description.

Hello again. Some years ago we worked on implementing the Basque TTS in this extension. Is it included in the current version for the beta cluster? Thanks!

Hi. Basque is unfortunately not included in what we are hoping to deploy to the beta cluster right now. Please see https://www.mediawiki.org/wiki/Extension_talk:Wikispeech#Basque_language for more details.