Security review for extension Wikispeech
Open, Low, Public

Description

Project Information

Name of tool/project: Wikispeech
Project home page: https://www.mediawiki.org/wiki/Extension:Wikispeech
Name of team requesting review: WMSE
Primary contact: @Sebastian_Berlin-WMSE
Target date for deployment (on beta cluster): The sooner the better
Link to code repository / patchset: https://gerrit.wikimedia.org/g/mediawiki/extensions/Wikispeech/+/refs/heads/REW_0.1.8
Programming Language(s) Used: PHP, JavaScript

Description of the tool/project

Allows automatic reading aloud of article content, for the supported languages (ar, en, sv).

Description of how the tool will be used at WMF

Presently the goal is to push it to the beta cluster for testing and exploration, then we'll hopefully push it (as a beta feature) to ar., en., sv.wikipedia.

Dependencies

External TTS service, currently running at http://wikispeech-tts.wmflabs.org/.
Gerrit repos: mediawiki/services/wikispeech/*.

Mishkal is a Python 2.7 service. A Python 3.8 version is available, but only 3.7 is supported by the WMF base Docker images. We will probably have to wait for the release of Debian Bullseye, or build our own Docker image without using Blubber, in order to get past this.
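The version constraint above can be illustrated with a minimal sketch. It assumes that the ported service needs at least Python 3.8 and that the base images offer 3.7 at most; the constant and function names are hypothetical and not taken from the Wikispeech repositories.

```python
import sys

# Assumption: highest Python version offered by the base images (3.7, per the note above).
BASE_IMAGE_PYTHON = (3, 7)
# Assumption: minimum Python version the ported service needs (3.8, per the note above).
SERVICE_MIN_PYTHON = (3, 8)


def can_run_on(available, required):
    """Return True if an interpreter of version `available` satisfies `required`."""
    return tuple(available) >= tuple(required)


if __name__ == "__main__":
    # Compare the assumed base image version, then the running interpreter.
    print("base image OK:", can_run_on(BASE_IMAGE_PYTHON, SERVICE_MIN_PYTHON))
    print("this interpreter OK:", can_run_on(sys.version_info[:2], SERVICE_MIN_PYTHON))
```

Under these assumptions the base image check fails, which is the blocker described above.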

Has this project been reviewed before?

CR internally by @Lokal_Profil, @Sebastian_Berlin-WMSE and @kalle.

Working test environment

Demo wiki: https://wikispeech.wmflabs.org/
Installation instructions: Use vagrant puppet role wikispeech.

Post-deployment

WMSE

Related Objects

Status	Assigned	Task
Open	None	T264842 Deploy Wikispeech in production
Open	None	T180015 ☂ Deploy Wikispeech on beta cluster
Open	None	T180021 Security review for extension Wikispeech
Invalid	None	T193072 TTS server deployment strategy

Event Timeline

There are a very large number of changes, so older changes are hidden.

Bawolff moved this task from Scheduled to Waiting/Blocked on the deprecated-security-team-reviews board. Mar 13 2018, 8:42 AM

Qgil mentioned this in T183780: Elhuyar - Wikispeech - eu. Mar 14 2018, 8:55 AM

Theklan subscribed. Mar 14 2018, 1:38 PM

@Sebastian_Berlin-WMSE, I was poked by @Theklan at the Wikimedia conference about this.

What's the status of this? Have you looked at the issues?

Thanks :)

Lokal_Profil mentioned this in T192683: add user setting for disabling Wikispeech. Apr 20 2018, 7:04 PM

Lokal_Profil mentioned this in T182861: Break out lexicon tool code to a separate branch while under development. Apr 25 2018, 8:38 AM

Lokal_Profil mentioned this in T191758: Migrate to new Wikispeech wiki server. Apr 25 2018, 8:40 AM

Hi @Amire80

We had to put Wikispeech on the back burner for a while at the start of the year. The focus during that time was on finishing some partially started features and on refactoring the JavaScript directory to make things more legible and manageable. We are now picking this up properly again and focusing on the issues mentioned here.

Out of the issues above:

The placeholder special page was removed
The unfinished special page was removed from the master branch (T182861: Break out lexicon tool code to a separate branch while under development). Development will go on in its own branch until such a time as it can be squashed and submitted as a complete package.
The user setting for enabling (or disabling, if it's on by default) is tracked in T192683: add user setting for disabling Wikispeech
The TTS service is getting its own vagrant role as part of T151877: Set up puppet role for TTS server to make testing easier.
- Currently this is done using the provided docker-compose. @Reedy, if using a docker-compose (pointing to wmf-hosted images/repos etc.) is a no-no for final deployment then give me a shout, as we would then have to investigate some other deployment strategy.
The extension page will have its instructions reviewed as part of T192992: Update installation instructions on mw:Extension:Wikispeech
Swift storage for the audio files (raised in T180015) is tracked in T192990: Use Swift for audio-file storage; this would require our STTS partners to investigate, though

The fact is that @ElhuyarFundazioa has everything ready to upload, but is waiting for this security check to enable it in Basque Wikipedia. Is it possible to upload the extension and then solve other problems for your own project?

In T180021#4158798, @Theklan wrote:

The fact is that @ElhuyarFundazioa has everything ready to upload, but is waiting for this security check to enable it in Basque Wikipedia. Is it possible to upload the extension and then solve other problems for your own project?

(By "upload" @Theklan probably means "deploy".)

And I join his question: Are the issues from the security review addressed? Is it possible to deploy it and get the other things sorted out later?

(Oh, and thanks a lot to @Lokal_Profil for the very detailed reply. I'm just giving a bit of help to @Theklan to get it deployed.)

There doesn't seem much point trying to security review anything (really, this wasn't a security review as yet, more an overall look at the ecosystem, which found documentation to be vastly out of date etc.) until the code that is needed is written; things like Swift have to be there for WMF deployment. Plus, whatever the strategy for deploying the "TTS Server", SRE needs to answer on docker-compose; although we are using it for some stuff, I don't know whether they will allow it for production services like this.

The lack of a vagrant role (though it seems to be WIP?) makes testing harder, especially the setting up of dependent services, which probably need some amount of code review too, rather than just the extension.

In T180021#4158890, @Amire80 wrote:

Is it possible to deploy it and get the other things sorted out later?

Specifically, this. No, there are things (like Swift) that basically mean it cannot be deployed until that is implemented (even on beta) :)

Reedy mentioned this in T193072: TTS server deployment strategy. Apr 25 2018, 6:52 PM

Theklan changed the status of subtask T193072: TTS server deployment strategy from Open to Stalled. May 19 2018, 8:24 AM

• Vvjjkkii changed the status of subtask T193072: TTS server deployment strategy from Stalled to Open. Jul 1 2018, 1:13 AM

Mainframe98 changed the status of subtask T193072: TTS server deployment strategy from Open to Stalled. Jul 1 2018, 9:56 AM

bd808 subscribed. Aug 30 2018, 5:58 PM

• chasemp moved this task from Waiting/Blocked to Incoming on the deprecated-security-team-reviews board. Dec 17 2018, 6:38 PM

When ready for review, contact Security Team.

Sebastian_Berlin-WMSE edited projects, added Wikispeech-Text-to-Speech; removed Wikispeech. Nov 11 2019, 12:20 PM

Sebastian_Berlin-WMSE moved this task from Unsorted to Monitoring on the Wikispeech-Text-to-Speech board.

Sebastian_Berlin-WMSE mentioned this in T235844: Collect tasks related code and security review. Nov 15 2019, 4:22 PM

Addshore subscribed. Jan 20 2020, 3:52 PM

Restricted Application added a project: Wikispeech-Jobrunner. Jan 20 2020, 3:52 PM

In T180021#4158975, @Reedy wrote:

There doesn't seem much point trying to security review anything (really, this wasn't a security review as yet, more an overall look at the ecosystem, which found documentation to be vastly out of date etc.) until the code that is needed is written; things like Swift have to be there for WMF deployment. Plus, whatever the strategy for deploying the "TTS Server", SRE needs to answer on docker-compose; although we are using it for some stuff, I don't know whether they will allow it for production services like this.

The lack of a vagrant role (though it seems to be WIP?) makes testing harder, especially the setting up of dependent services, which probably need some amount of code review too, rather than just the extension.

I guess in the future these services would just use blubber and the deployment pipeline.
Would that be enough? (having built docker images)
Or would they still need to be tied into vagrant in order to make a security review of them easier?

In T180021#5822275, @Addshore wrote:

I guess in the future these services would just use blubber and the deployment pipeline.
Would that be enough? (having built docker images)
Or would they still need to be tied into vagrant in order to make a security review of them easier?

I don't mind installing a package or two, and some basic config. I do mind having to install various packages, configure them, compile some stuff, etc. Basically, to review your stuff, we shouldn't have to jump through excessive hoops to get it running.

Noting my comment isn't far off being two years old now, when, IIRC, vagrant was mostly the go-to dev environment for ease of installing complex stuff. Things have obviously changed since then, to some extent.

Many people still use vagrant for stuff (I don't primarily, but I do use it for some stuff). I'm happy enough if there's something "standardised", for example with the intention of that being basically how things get deployed to prod in the end, à la a docker-based solution or similar.

Sebastian_Berlin-WMSE moved this task from Incoming to Backlog on the Wikispeech-Jobrunner board. Jan 30 2020, 11:38 AM

In T180021#5822754, @Reedy wrote:

I don't mind installing a package or two, and some basic config. I do mind having to install various packages, configure them, compile some stuff, etc. Basically, to review your stuff, we shouldn't have to jump through excessive hoops to get it running.

Noting my comment isn't far off being two years old now, when, IIRC, vagrant was mostly the go-to dev environment for ease of installing complex stuff. Things have obviously changed since then, to some extent.

Many people still use vagrant for stuff (I don't primarily, but I do use it for some stuff). I'm happy enough if there's something "standardised", for example with the intention of that being basically how things get deployed to prod in the end, à la a docker-based solution or similar.

Thanks for the feedback (just realised I never replied here). We have switched over to creating the images using Blubber (largely done, but some things remain). I'm interpreting the above as the TTS server (now Speechoid) not needing to be implemented in Vagrant as long as it is fairly easy to set up a service using standard methods.

Lokal_Profil updated the task description. Sep 14 2020, 9:46 PM

Lokal_Profil added a subscriber: • kalle.

In T180021#4860259, @charlotteportero wrote:

When ready for review, contact Security Team.

Ready for review again after having implemented valuable feedback from @Addshore

Restricted Application added a project: secscrum. Sep 17 2020, 6:57 AM

Lokal_Profil added projects: User-LokalProfil, User-kalle, User-Sebastian_Berlin-WMSE. Sep 17 2020, 7:38 AM

• kalle moved this task from Backlog to Proposed for next sprint on the Wikispeech-Jobrunner board. Sep 17 2020, 12:10 PM

• kalle moved this task from Proposed for next sprint to Sprint on the Wikispeech-Jobrunner board. Sep 17 2020, 12:20 PM

• kalle edited projects, added Wikispeech-Jobrunner (Sprint); removed Wikispeech-Jobrunner.

DannyS712 subscribed. Sep 17 2020, 2:03 PM

Sebastian_Berlin-WMSE moved this task from Backlog to Watchin' on the User-Sebastian_Berlin-WMSE board. Sep 22 2020, 7:03 AM

Lokal_Profil moved this task from 📥 Backlog to ⏳ Waiting on the User-LokalProfil board. Sep 22 2020, 8:12 AM

• kalle updated the task description. Sep 22 2020, 10:07 PM

• kalle updated the task description.

Lokal_Profil updated the task description. Sep 23 2020, 9:58 AM

@Lokal_Profil - This still isn't ready for review because there are still too many unanswered questions in the task. Is there any maintenance plan with a group at WMF or a long term roadmap or support plan?

• Jcross triaged this task as Low priority. Sep 23 2020, 4:16 PM

• Jcross moved this task from Incoming to Back Orders on the secscrum board.

• Jcross moved this task from Back Orders to Incoming on the secscrum board.

• Jcross moved this task from Incoming to Back Orders on the secscrum board.

Lokal_Profil closed subtask T193072: TTS server deployment strategy as Invalid. Sep 28 2020, 1:47 PM

In T180021#6487976, @Jcross wrote:

@Lokal_Profil - This still isn't ready for review because there are still too many unanswered questions in the task. Is there any maintenance plan with a group at WMF or a long term roadmap or support plan?

@Jcross I'm not exactly sure which questions you are referring to (note that comments in this task from before 2019 don't really apply anymore due to re-architecting etc.).

The plan is for WMSE to go on developing and supporting the extension (assuming it gets deployed). In the long run we might be looking for WMF to adopt it but that is not a discussion which has been had yet, and it makes little sense to have before beta deployment and community feedback on the feature.

Note that the underlying service will get a separate Security review ticket (not created yet).

@Lokal_Profil There are a few processes we are unable to find evidence of having been followed, and we'd be happy to provide those if you'd like - but it's largely the issue of there being no path to production that will necessitate us prioritizing this review as "Low". We simply do not have the resources to spend on reviews that do not have support plans already in place. This does not mean that we are declining, but that reviews with a support plan in place will always be worked on first. Please let us know if anything changes and we will reconsider priority.

Resetting old status

@Lokal_Profil Thank you, sorry we missed that.

In T180021#6513072, @Jcross wrote:

@Lokal_Profil There are a few processes we are unable to find evidence of having been followed, and we'd be happy to provide those if you'd like - but it's largely the issue of there being no path to production that will necessitate us prioritizing this review as "Low". We simply do not have the resources to spend on reviews that do not have support plans already in place. This does not mean that we are declining, but that reviews with a support plan in place will always be worked on first. Please let us know if anything changes and we will reconsider priority.

Please highlight any processes you believe that we have missed, any help on that front is welcome.

Re: Path to production. Do you mean a path to production not existing on the WMF side or on our side? If the latter, I believe this is largely due to us not having captured that path here on Phabricator. This should largely be rectified via T180015.

Re: Support plan. WMSE has an active grant to develop this extension lasting until 2021-03-31, after this we have a slightly smaller grant allowing us to go on supporting it until 2022-09-30. If it gets deployed and is well received we also have a vested interest in including support of Wikispeech in our APG budget and look for other grants to support it.

• kalle moved this task from 🥴 Backlog to 🤠 This week on the User-kalle board. Oct 8 2020, 9:17 AM

As said above, a link to these processes would be great.
The "path to production" for this extension and all involved parts is covered fairly comprehensively at T264842: Deploy Wikispeech in production.
I think the support plan was also covered in the above comment; perhaps that should be added to T264842 too, as it covers all services here, not just the extension.
If there is some set of tasks within this tree that further block being able to get a prioritized security review, it would be good to highlight which ones.
And if the above comment doesn't cover the required "support plan", then having a spec of what is expected there would also be great!

In T180021#6532147, @Addshore wrote:

I think the support plan was also covered in the above comment; perhaps that should be added to T264842 too, as it covers all services here, not just the extension.

Good suggestion. Added it there and rephrased it to clarify that the support is for Wikispeech as a whole.

Lokal_Profil moved this task from Backlog to Blocked on the Wikispeech-Jobrunner (Sprint) board. Oct 29 2020, 8:55 AM

Lokal_Profil moved this task from ⏳ Waiting to 📆 This week on the User-LokalProfil board. Nov 2 2020, 9:32 AM

Lokal_Profil mentioned this in T264842: Deploy Wikispeech in production. Nov 9 2020, 12:46 PM

Lokal_Profil mentioned this in T267918: Performance review of Wikispeech. Nov 17 2020, 9:55 AM

Sorry for the delay, but here are some responses on this:

In T180021#6526856, @Lokal_Profil wrote:

Re: Support plan. WMSE has an active grant to develop this extension lasting until 2021-03-31, after this we have a slightly smaller grant allowing us to go on supporting it until 2022-09-30. If it gets deployed and is well received we also have a vested interest in including support of Wikispeech in our APG budget and look for other grants to support it.

I would like to avoid being particularly curt here, but the above does not sound very reassuring for long-term support. We would likely have to incorporate this fact into our security review and rate this issue with at least a medium risk. The likely mitigation here would be having a Wikimedia Foundation development team co-sponsor this project and agree to some form of indefinite backup support/maintenance. Given the extensive issues of technical debt and code stewardship for many Wikimedia code bases, a complex new service and extension is a lot to support in production without a more well-defined, long-term support model.

In T180021#6532147, @Addshore wrote:

The "path to production" for this extension and all involved parts is covered fairly comprehensively at T264842: Deploy Wikispeech in production.

I'm not certain I'd agree that it is comprehensively covered here. All I am seeing within that task is a basic description of the project and a brief list of the technical steps necessary to put the service into production. By "path to production", we are also referring to such processes as a TechCom critique or architectural review, as noted within T180015#3835943, but seemingly unconfirmed as to it ever happening. I can't seem to find any recent #RFC related to wikispeech/speechoid. If there is one, please let us know where it is. Otherwise I'd state that it's fairly common for new services like this to go through that process, for example the recent effort for the new push notifications service: T249065: RFC: Wikimedia Push Notification Service. Some other recent-ish, similar examples: T206010: RfC: Session storage service interface, T213318: RFC: Wikibase Front-End Architecture, T201963: RFC: Modern Event Platform: Stream Intake Service. I do not believe that the confirmation of a WMF grant is an adequate substitute for this process.

In T180021#6532147, @Addshore wrote:

If there is some set of tasks within this tree that further block being able to get a prioritized security review, it would be good to highlight which ones.
And if the above comment doesn't cover the required "support plan", then having a spec of what is expected there would also be great!

Given the significant time commitment this security review will require, we likely won't be able to prioritize this work until next quarter (Q3, January 2021 - March 2021), just given current resources and other commitments of the Security-Team. And that would be if the concerns above were adequately addressed regarding a more long-term project roadmap and support plan (we do not have any standard templates for these, but something beyond "we have a grant through date x" would be expected) and some kind of larger technical architecture review, be that an official #rfc or similar.

We are untagging as there is currently no path to production that we are aware of. Should this change, please feel free to tag us back in and we will triage.

• kalle moved this task from 🤠 This week to 🤕 Watching on the User-kalle board. Feb 16 2021, 10:00 AM

Lokal_Profil moved this task from 📆 This week to 👁 Watching on the User-LokalProfil board. Mar 16 2021, 10:14 AM

Lokal_Profil updated the task description. Apr 1 2021, 12:45 PM

Reedy added a project: secscrum. Apr 6 2021, 3:42 PM

Reedy moved this task from Back Orders to Q1:2021 Review Queue on the secscrum board. Apr 6 2021, 3:47 PM

• Jcross assigned this task to Reedy. Apr 13 2021, 5:23 PM

Can I ask about WikispeechSpeechDataCollector?

Is that to be included for "day 1", or for something down the line?

WikispeechSpeechDataCollector is a separate extension that will be used to collect and annotate speech recordings. It's not a requirement for the Wikispeech extension (or vice versa), though the intention is that the speech data will be used to create new voices for it, among other things.

Fair enough, just wanted to double check. It wouldn't be the first time a request has come in for one thing, and then there's been surprise that other related (and sometimes very much needed) things aren't being reviewed... because they've not been asked for.

• kalle moved this task from Blocked to In progress on the Wikispeech-Jobrunner (Sprint) board. Apr 22 2021, 8:11 AM

sbassett moved this task from Q1:2021 Review Queue to In Progress on the secscrum board. May 4 2021, 3:56 PM

Just checking in. Has there been any progress with this task, or is something planned? Is there anything we (WMSE) can do to help?

• Jcross unsubscribed. Jul 14 2021, 7:43 PM

@kalle - Due to some scheduling issues, the continued evaluation of this service and the fact that many of the Wikispeech code repositories remain fairly volatile, moving targets (gerrit, particularly patch sets like this), this review will need to extend into the current quarter, which will conclude at the end of September 2021.

sbassett removed a subscriber: • charlotteportero. Aug 12 2021, 4:20 PM

@kalle @Sebastian_Berlin-WMSE et al - having a look at T281662 and T283152, is the plan to deploy the MediaWiki frontend piece as a gadget now, as opposed to the extension? Or is the extension also still a part of the plan? And will wikispeech.wikimedia.se and/or wikispeech.wmflabs.org be more permanent homes for the service pieces?

The gadget solution has been deployed and is currently running on SVWP as a first test. The plan is still to have the extension deployed on WP. The gadget solution was a thing we had to do to actually have something working for WP when a project ended.

wikispeech.wikimedia.se, or a server like it, will need to run the backend, both Speechoid (the service) and MW with the extension, as long as we want to support the gadget. Once we can move over to extension on WP we should be able to retire this server.

wikispeech.wmflabs.org is only for demoing. It's not used in production and could probably be retired once the extension runs on WP too. We may want to keep it for testing and development.

sbassett removed Reedy as the assignee of this task. May 11 2022, 7:52 PM

sbassett moved this task from Unsorted to Needs research on the Technical-Debt board.

sbassett moved this task from In Progress to Back Orders on the secscrum board.

Sebastian_Berlin-WMSE removed a project: User-kalle. Jul 29 2024, 9:08 AM

	Sebastian_Berlin-WMSE
	Nov 8 2017, 11:37 AM
