Page MenuHomePhabricator

Security review for extension Wikispeech
Open, LowPublic

Description

Project Information

Description of the tool/project

Allows automatic reading aloud of article content, for the supported languages (ar, en, sv).

Description of how the tool will be used at WMF

Presently the goal is to push it to the beta cluster for testing and exploration, then we'll hopefully push it (as a beta feature) to ar., en., sv.wikipedia.

Dependencies

External TTS service, currently running at http://wikispeech-tts.wmflabs.org/.
Gerrit repos: mediawiki/services/wikispeech/*.

  • Mishkal is a Python 2.7-service. Python 3.8 version is available, but only 3.7 is supported by the WMF base Docker images. Will probably have to wait for the release of Debian Bullseye or build our own Docker image without using Blubber in order to get passed this.

Has this project been reviewed before?

CR internally by @Lokal_Profil, @Sebastian_Berlin-WMSE and @kalle .

Working test environment

Demo wiki: https://wikispeech.wmflabs.org/
Installation instructions: Use vagrant puppet role wikispeech.

Post-deployment

WMSE

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Thank you for your feedback, @Reedy. We'll look at the issues that you mentioned and hope that someone can take a look at the code to find any other things that needs to be fixed before the security review.

@Sebastian_Berlin-WMSE, I was poked by @Theklan at the Wikimedia conference about this.

What's the status of this? Have you looked at the issues?

Thanks :)

Hi @Amire80

We had to put Wikispeech on the slow-burner for a while during the start of the year. The focus during that time was on finishing some partially started features and on re-factoring the javascript directory to make things more legible and manageable. We are now picking this up properly again and focusing on the issues mentioned here.

Out of the issues above:

The fact is that @ElhuyarFundazioa has all ready to upload, but is waiting to this security check to enable it in Basque Wikipedia. Is it possible to upload the extension and then solve other problems for your own project?

The fact is that @ElhuyarFundazioa has all ready to upload, but is waiting to this security check to enable it in Basque Wikipedia. Is it possible to upload the extension and then solve other problems for your own project?

(By "upload" @Theklan probably means "deploy".)

And I join his question: Are the issues from the security review addressed? Is it possible to deploy it and get the other things sorted out later?

(Oh, and thanks a lot to @Lokal_Profil for the very detailed reply. I'm just giving a bit of help to @Theklan to get it deployed.)

There doesn't seem much point trying to security review anything (really, this wasn't a security review as yet, more a overall look at the ecosystem... Which found documentation to be vastly out of date etc) until the code that is needed is written; things like Swift has to be there for WMF deployment. Plus whatever strategy for deploying the "TTS Server", SRE needs to answer on docker-compose; although we are using it for some stuff, I don't know whether they will allow it for production seervices like this

The lack of a vagrant role (though it seems to be WIP?) makes testing harder, especially setting up of dependant services, which probably need some amount of code review too, rather than just the extension

Is it possible to deploy it and get the other things sorted out later?

Specifically, this. No, there are things (like Swift) that basically mean it cannot be deployed until that is implemented (even on beta) :)

There doesn't seem much point trying to security review anything (really, this wasn't a security review as yet, more a overall look at the ecosystem... Which found documentation to be vastly out of date etc) until the code that is needed is written; things like Swift has to be there for WMF deployment. Plus whatever strategy for deploying the "TTS Server", SRE needs to answer on docker-compose; although we are using it for some stuff, I don't know whether they will allow it for production seervices like this

The lack of a vagrant role (though it seems to be WIP?) makes testing harder, especially setting up of dependant services, which probably need some amount of code review too, rather than just the extension

I guess in the future these services would just use blubber and the deployment pipeline.
Would that be enough? (having built docker images)
Or would they still need to be tied into vagrant in order to make a security review of them easier?

There doesn't seem much point trying to security review anything (really, this wasn't a security review as yet, more a overall look at the ecosystem... Which found documentation to be vastly out of date etc) until the code that is needed is written; things like Swift has to be there for WMF deployment. Plus whatever strategy for deploying the "TTS Server", SRE needs to answer on docker-compose; although we are using it for some stuff, I don't know whether they will allow it for production seervices like this

The lack of a vagrant role (though it seems to be WIP?) makes testing harder, especially setting up of dependant services, which probably need some amount of code review too, rather than just the extension

I guess in the future these services would just use blubber and the deployment pipeline.
Would that be enough? (having built docker images)
Or would they still need to be tied into vagrant in order to make a security review of them easier?

I don't mind installing a package or two, and some basic config. I do mind having to install various packages, configure them, compiling some stuff etc etc etc. Basically, to your review your stuff, we shouldn't have to jump through excessive hoops to get it running

Noting my comment isn't far off being two years old now, when IIRC, vagrant was mostly the goto dev environment for ease of installing complex stuff. Things have obviously changed since then, to some extent

Many people still use vagrant for stuff (I don't primarily, but I do use it for some stuff). I'm happy enough if there's something "standardised", for example with the intention of that being basically how things get deployed to prod in the end, ala a docker based solution or similar

There doesn't seem much point trying to security review anything (really, this wasn't a security review as yet, more a overall look at the ecosystem... Which found documentation to be vastly out of date etc) until the code that is needed is written; things like Swift has to be there for WMF deployment. Plus whatever strategy for deploying the "TTS Server", SRE needs to answer on docker-compose; although we are using it for some stuff, I don't know whether they will allow it for production seervices like this

The lack of a vagrant role (though it seems to be WIP?) makes testing harder, especially setting up of dependant services, which probably need some amount of code review too, rather than just the extension

I guess in the future these services would just use blubber and the deployment pipeline.
Would that be enough? (having built docker images)
Or would they still need to be tied into vagrant in order to make a security review of them easier?

I don't mind installing a package or two, and some basic config. I do mind having to install various packages, configure them, compiling some stuff etc etc etc. Basically, to your review your stuff, we shouldn't have to jump through excessive hoops to get it running

Noting my comment isn't far off being two years old now, when IIRC, vagrant was mostly the goto dev environment for ease of installing complex stuff. Things have obviously changed since then, to some extent

Many people still use vagrant for stuff (I don't primarily, but I do use it for some stuff). I'm happy enough if there's something "standardised", for example with the intention of that being basically how things get deployed to prod in the end, ala a docker based solution or similar

Thanks for the feedback (just realised I never replied here). We have switched over to creating the images using Blubber (largely done but a some things remaining). I'm interpreting the above as the TTS-server (now Speechoid) not needing to be implemented in Vagrant as long as it is fairly easy to set up a service using standard methods.

When ready for review, contact Security Team.

Ready for review again after having implemented valuable feedback from @Addshore

kalle updated the task description. (Show Details)

@Lokal_Profil - This still isn't ready for review because there are still too many unanswered questions in the task. Is there any maintenance plan with a group at WMF or a long term roadmap or support plan?

Jcross moved this task from Incoming to Back Orders on the secscrum board.
Jcross moved this task from Back Orders to Incoming on the secscrum board.
Jcross moved this task from Incoming to Back Orders on the secscrum board.

@Lokal_Profil - This still isn't ready for review because there are still too many unanswered questions in the task. Is there any maintenance plan with a group at WMF or a long term roadmap or support plan?

@Jcross I'm not exactly sure which questions you are referring to (note that comments in this task from before 2019 don't really apply anymore due to re-architecturing etc.).

The plan is for WMSE to go on developing and supporting the extension (assuming it gets deployed). In the long run we might be looking for WMF to adopt it but that is not a discussion which has been had yet, and it makes little sense to have before beta deployment and community feedback on the feature.

Note that the underlying service will get a separate Security review ticket (not created yet).

@Lokal_Profil There are a few processes we are unable to find evidence of having been followed, and we'd be happy to provide those if you'd like - but it's largely the issue of there being no path to production that will necessitate us prioritizing this review as "Low". We simply do not have the resources to spend on reviews that do not have support plans already in place. This does not mean that we are declining, but that reviews with a support plan in place will always be worked on first. Please let us know if anything changes and we will reconsider priority.

Lokal_Profil changed the task status from Stalled to Open.Oct 7 2020, 7:28 AM

Resetting old status

@Lokal_Profil There are a few processes we are unable to find evidence of having been followed, and we'd be happy to provide those if you'd like - but it's largely the issue of there being no path to production that will necessitate us prioritizing this review as "Low". We simply do not have the resources to spend on reviews that do not have support plans already in place. This does not mean that we are declining, but that reviews with a support plan in place will always be worked on first. Please let us know if anything changes and we will reconsider priority.

Please highlight any processes you believe that we have missed, any help on that front is welcome.

Re: Path to production. Do you mean a path to production not existing on the WMF side or on out side? If the latter I believe this is largely due to us not having captured that path here on Phabricator. Should largely be rectified via T180015.

Re: Support plan. WMSE has an active grant to develop this extension lasting until 2021-03-31, after this we have a slightly smaller grant allowing us to go on supporting it until 2022-09-30. If it gets deployed and is well received we also have a vested interest in including support of Wikispeech in our APG budget and look for other grants to support it.

As said above a link to these processes would be great.
The "path to production" for this extension and all involved parts is covered fairly comprehensively at T264842: Deploy Wikispeech in production.
I think the support plan was also covered in the above comment, perhaps that should be added to T264842 to as it covers all services here not just the extension.
If there is some set of tasks within this tree that further block being able to get a prioritized security review it would be good to highlight which ones.
And if the above comment doesn't cover the required "support plan" then having spec of what is expected there would also be great!

I think the support plan was also covered in the above comment, perhaps that should be added to T264842 to as it covers all services here not just the extension.

Good suggestion. Added it there and rephrased it to clarify that the support is for Wikispeech as a whole.

Sorry for the delay, but here are some responses on this:

Re: Support plan. WMSE has an active grant to develop this extension lasting until 2021-03-31, after this we have a slightly smaller grant allowing us to go on supporting it until 2022-09-30. If it gets deployed and is well received we also have a vested interest in including support of Wikispeech in our APG budget and look for other grants to support it.

I would like to avoid being particularly curt here, but the above does not sound very reassuring for long-term support. We would likely have to incorporate this fact into our security review and rate this issue with at least a medium risk. The likely mitigation here would be having a Wikimedia Foundation development team co-sponsor this project and agree to some form of indefinite backup support/maintenance. Given the extensive issues of technical debt and code stewardship for many Wikimedia code bases, a complex new service and extension is a lot to support in production without a more well-defined, long-term support model.

The "path to production" for this extension and all involved parts is covered fairly comprehensively at T264842: Deploy Wikispeech in production.

I'm not certain I'd agree that it is comprehensively covered here. All I am seeing within that task is a basic description of the project and a brief list of the technical steps necessary to put the service into production. By "path to production", we are also referring to such processes as a TechCom critique or architectural review, as noted within T180015#3835943, but seemingly unconfirmed as to it ever happening. I can't seem to find any recent #RFC related to wikispeech/speechoid. If there is one, please let us know where it is. Otherwise I'd state that it's fairly common for new services like this to go through that process, for example the recent effort for the new push notifications service: T249065: RFC: Wikimedia Push Notification Service. Some other recent-ish, similar examples: T206010: RfC: Session storage service interface, T213318: RFC: Wikibase Front-End Architecture, T201963: RFC: Modern Event Platform: Stream Intake Service. I do not believe that the confirmation of a WMF grant is an adequate substitute for this process.

If there is some set of tasks within this tree that further block being able to get a prioritized security review it would be good to highlight which ones.
And if the above comment doesn't cover the required "support plan" then having spec of what is expected there would also be great!

Given the significant time commitment this security review will require, we likely won't be able to prioritize this work until next quarter (Q3, January 2021 - March 2021), just given current resources and other commitments of the Security-Team. And that would be if the concerns above were adequately addressed regarding a more long-term project roadmap and support plan (we do not have any standard templates for these, but something beyond "we have a grant through date x" would be expected) and some kind of larger technical architecture review, be that an official #rfc or similar.

We are untagging as there is currently no path to production that we are aware of. Should this change, please feel free to tag us back in and we will triage.

Can I ask about WikispeechSpeechDataCollector?

Is that to be included for "day 1", or for something down the line?

WikispeechSpeechDataCollector is a separate extension that will be used to collect and annotate speech recordings. It's not a requirement for the Wikispeech extension (or vice versa), though the intention is that the speech data will be used to create new voices for it, among other things.

Fair enough, just wanted to double check. It wouldn't be the first time a request has come in for one thing, and then there's been surprise that other related (and sometimes very much needed) things aren't being reviewed... because they've not been asked for.

Just checking in. Has there been any progress with this task, or if something is planned? Anything we (WMSE) can do to help?

@kalle - Due to some scheduling issues, the continued evaluation of this service and the fact that many of the Wikispeech code repositories remain fairly volatile, moving targets (gerrit, particularly patch sets like this), this review will need to extend into the current quarter, which will conclude at the end of September 2021.

@kalle @Sebastian_Berlin-WMSE et al - having a look at T281662 and T283152, is the plan to deploy the MediaWiki frontend piece as a gadget now, as opposed to the extension? Or is the extension also still a part of the plan? And will wikispeech.wikimedia.se and/or wikispeech.wmflabs.org be more permanent homes for the service pieces?

The gadget solution has been deployed and is currently running on SVWP as a first test. The plan is still to have the extension deployed on WP. The gadget solution was a thing we had to do to actually have something working for WP when a project ended.

wikispeech.wikimedia.se, or a server like it, will need to run the backend, both Speechoid (the service) and MW with the extension, as long as we want to support the gadget. Once we can move over to extension on WP we should be able to retire this server.

wikispeech.wmflabs.org is only for demoing. It's not used in production and could probably be retired once the extension runs on WP too. We may want to keep it for testing and development.

sbassett moved this task from Unsorted to Needs research on the Technical-Debt board.
sbassett moved this task from In Progress to Back Orders on the secscrum board.