Page MenuHomePhabricator

Security review for extension Wikispeech
Open, LowPublic

Description

Project Information

Description of the tool/project

Allows automatic reading aloud of article content, for the supported languages (ar, en, sv).

Description of how the tool will be used at WMF

Presently the goal is to push it to the beta cluster for testing and exploration, then we'll hopefully push it (as a beta feature) to ar., en., sv.wikipedia.

Dependencies

External TTS service, currently running at http://wikispeech-tts.wmflabs.org/.
Gerrit repos: mediawiki/services/wikispeech/*.

  • Mishkal is a Python 2.7-service. Python 3.8 version is available, but only 3.7 is supported by the WMF base Docker images. Will probably have to wait for the release of Debian Bullseye or build our own Docker image without using Blubber in order to get passed this.

Has this project been reviewed before?

CR internally by @Lokal_Profil, @Sebastian_Berlin-WMSE and @kalle .

Working test environment

Demo wiki: https://wikispeech.wmflabs.org/
Installation instructions: Use vagrant puppet role wikispeech.

Post-deployment

WMSE

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Reedy added a subscriber: Reedy.Dec 13 2017, 6:50 PM

Are the 2 special pages actually used?When you've got 2 different special pages just to display a different string on each

This being in both looks kinda suspiciously wrong..

		$out->addHelpLink( 'How to become a MediaWiki hacker' );

Then https://wikispeech.wmflabs.org/wiki/Special:Wikispeech and https://wikispeech.wmflabs.org/wiki/Special:WikispeechLexiconTool aren't inspiring me with very much confidence...

Reedy added a comment.Dec 13 2017, 6:52 PM

And we need to do something about security reviewing the many and various dependancies...

Reedy changed the task status from Open to Stalled.Dec 13 2017, 9:30 PM

I'm putting a hold on this review... There extension is definitely still work in progress. The two special pages are just place holders. The second one shouldn't have been added, it should've been done completely in https://gerrit.wikimedia.org/r/#/c/381189/

The other one needs to go away still

https://gerrit.wikimedia.org/r/#/c/391006/ seems kinda necessary rather than on for everyone (not everyone will want it)

There's also a lot of work for this to be anywhere near production ready, both the extension, and the "server" side. It's too early for a security review.

I've left a few comments on T180015 with some other stuff I've noticed while looking around at the infrastructure in general...

Are the 2 special pages actually used?When you've got 2 different special pages just to display a different string on each

This being in both looks kinda suspiciously wrong..

		$out->addHelpLink( 'How to become a MediaWiki hacker' );

Then https://wikispeech.wmflabs.org/wiki/Special:Wikispeech and https://wikispeech.wmflabs.org/wiki/Special:WikispeechLexiconTool aren't inspiring me with very much confidence...

In the current state, they aren't used, no. Special:Wikispeech is a remnant from the extension boiler plate, if I'm not mistaken. There hasn't been much need for a special page yet, but one will be added by T181269.

Special:WikispeechLexiconTool is part of a more stand alone functionality that should really have been worked on in a separate branch. I'll break that out at some point soon (just need to figure out how to do that with Gerrit).

[...] the many and various dependancies...

Do you mean the external TTS server or are there dependencies in the extension code too?

https://gerrit.wikimedia.org/r/#/c/391006/ seems kinda necessary rather than on for everyone (not everyone will want it)

I'm not quite following. Do you mean that the preferences shouldn't show up for everyone, even if the wiki has the extension installed?

Reedy added a comment.Dec 14 2017, 5:14 PM

https://gerrit.wikimedia.org/r/#/c/391006/ seems kinda necessary rather than on for everyone (not everyone will want it)

I'm not quite following. Do you mean that the preferences shouldn't show up for everyone, even if the wiki has the extension installed?

The extension shouldn't be default enabled for everyone. The mediawiki preference to enable (or disable it) is very necessary (and should've been done before sending it for review. The extension is clearly not production ready with features like this still being developed

In the current state, they aren't used, no. Special:Wikispeech is a remnant from the extension boiler plate, if I'm not mistaken. There hasn't been much need for a special page yet, but one will be added by T181269.

Special:WikispeechLexiconTool is part of a more stand alone functionality that should really have been worked on in a separate branch. I'll break that out at some point soon (just need to figure out how to do that with Gerrit).

So remove them? If it was a boilerplate one, fine. But if it's not going to be used, remove it.

Similarly, https://github.com/wikimedia/mediawiki-extensions-Wikispeech/commit/01f254d951c77f7dc75129e800b1e4c40551ec31 shouldn't have been merged, especially when developing via Gerrit. It's clearly WIP. It should have been introduced, in it's entirety, as a working/complete page in https://gerrit.wikimedia.org/r/#/c/381189/

[...] the many and various dependancies...

Do you mean the external TTS server or are there dependencies in the extension code too?

Yes, the external TTS server. The extension itself has none other than the bundled fontawesome.

Again, the documentation should've been cleaned up first before sending for review.

Even then, in it's current form, that's really not going to work for production. Things can't be just run in a random screen session. They will likely need packaging, and set to run as services

Thank you for your feedback, @Reedy. We'll look at the issues that you mentioned and hope that someone can take a look at the code to find any other things that needs to be fixed before the security review.

@Sebastian_Berlin-WMSE, I was poked by @Theklan at the Wikimedia conference about this.

What's the status of this? Have you looked at the issues?

Thanks :)

Hi @Amire80

We had to put Wikispeech on the slow-burner for a while during the start of the year. The focus during that time was on finishing some partially started features and on re-factoring the javascript directory to make things more legible and manageable. We are now picking this up properly again and focusing on the issues mentioned here.

Out of the issues above:

The fact is that @ElhuyarFundazioa has all ready to upload, but is waiting to this security check to enable it in Basque Wikipedia. Is it possible to upload the extension and then solve other problems for your own project?

The fact is that @ElhuyarFundazioa has all ready to upload, but is waiting to this security check to enable it in Basque Wikipedia. Is it possible to upload the extension and then solve other problems for your own project?

(By "upload" @Theklan probably means "deploy".)

And I join his question: Are the issues from the security review addressed? Is it possible to deploy it and get the other things sorted out later?

(Oh, and thanks a lot to @Lokal_Profil for the very detailed reply. I'm just giving a bit of help to @Theklan to get it deployed.)

Reedy added a comment.Apr 25 2018, 6:34 PM

There doesn't seem much point trying to security review anything (really, this wasn't a security review as yet, more a overall look at the ecosystem... Which found documentation to be vastly out of date etc) until the code that is needed is written; things like Swift has to be there for WMF deployment. Plus whatever strategy for deploying the "TTS Server", SRE needs to answer on docker-compose; although we are using it for some stuff, I don't know whether they will allow it for production seervices like this

The lack of a vagrant role (though it seems to be WIP?) makes testing harder, especially setting up of dependant services, which probably need some amount of code review too, rather than just the extension

Is it possible to deploy it and get the other things sorted out later?

Specifically, this. No, there are things (like Swift) that basically mean it cannot be deployed until that is implemented (even on beta) :)

Theklan changed the status of subtask T193072: TTS server deployment strategy from Open to Stalled.May 19 2018, 8:24 AM
Vvjjkkii changed the status of subtask T193072: TTS server deployment strategy from Stalled to Open.Jul 1 2018, 1:13 AM
Mainframe98 changed the status of subtask T193072: TTS server deployment strategy from Open to Stalled.Jul 1 2018, 9:56 AM
bd808 added a subscriber: bd808.Aug 30 2018, 5:58 PM

When ready for review, contact Security Team.

Restricted Application added a project: Wikispeech-Jobrunner. · View Herald TranscriptJan 20 2020, 3:52 PM

There doesn't seem much point trying to security review anything (really, this wasn't a security review as yet, more a overall look at the ecosystem... Which found documentation to be vastly out of date etc) until the code that is needed is written; things like Swift has to be there for WMF deployment. Plus whatever strategy for deploying the "TTS Server", SRE needs to answer on docker-compose; although we are using it for some stuff, I don't know whether they will allow it for production seervices like this

The lack of a vagrant role (though it seems to be WIP?) makes testing harder, especially setting up of dependant services, which probably need some amount of code review too, rather than just the extension

I guess in the future these services would just use blubber and the deployment pipeline.
Would that be enough? (having built docker images)
Or would they still need to be tied into vagrant in order to make a security review of them easier?

There doesn't seem much point trying to security review anything (really, this wasn't a security review as yet, more a overall look at the ecosystem... Which found documentation to be vastly out of date etc) until the code that is needed is written; things like Swift has to be there for WMF deployment. Plus whatever strategy for deploying the "TTS Server", SRE needs to answer on docker-compose; although we are using it for some stuff, I don't know whether they will allow it for production seervices like this

The lack of a vagrant role (though it seems to be WIP?) makes testing harder, especially setting up of dependant services, which probably need some amount of code review too, rather than just the extension

I guess in the future these services would just use blubber and the deployment pipeline.
Would that be enough? (having built docker images)
Or would they still need to be tied into vagrant in order to make a security review of them easier?

I don't mind installing a package or two, and some basic config. I do mind having to install various packages, configure them, compiling some stuff etc etc etc. Basically, to your review your stuff, we shouldn't have to jump through excessive hoops to get it running

Noting my comment isn't far off being two years old now, when IIRC, vagrant was mostly the goto dev environment for ease of installing complex stuff. Things have obviously changed since then, to some extent

Many people still use vagrant for stuff (I don't primarily, but I do use it for some stuff). I'm happy enough if there's something "standardised", for example with the intention of that being basically how things get deployed to prod in the end, ala a docker based solution or similar

There doesn't seem much point trying to security review anything (really, this wasn't a security review as yet, more a overall look at the ecosystem... Which found documentation to be vastly out of date etc) until the code that is needed is written; things like Swift has to be there for WMF deployment. Plus whatever strategy for deploying the "TTS Server", SRE needs to answer on docker-compose; although we are using it for some stuff, I don't know whether they will allow it for production seervices like this

The lack of a vagrant role (though it seems to be WIP?) makes testing harder, especially setting up of dependant services, which probably need some amount of code review too, rather than just the extension

I guess in the future these services would just use blubber and the deployment pipeline.
Would that be enough? (having built docker images)
Or would they still need to be tied into vagrant in order to make a security review of them easier?

I don't mind installing a package or two, and some basic config. I do mind having to install various packages, configure them, compiling some stuff etc etc etc. Basically, to your review your stuff, we shouldn't have to jump through excessive hoops to get it running

Noting my comment isn't far off being two years old now, when IIRC, vagrant was mostly the goto dev environment for ease of installing complex stuff. Things have obviously changed since then, to some extent

Many people still use vagrant for stuff (I don't primarily, but I do use it for some stuff). I'm happy enough if there's something "standardised", for example with the intention of that being basically how things get deployed to prod in the end, ala a docker based solution or similar

Thanks for the feedback (just realised I never replied here). We have switched over to creating the images using Blubber (largely done but a some things remaining). I'm interpreting the above as the TTS-server (now Speechoid) not needing to be implemented in Vagrant as long as it is fairly easy to set up a service using standard methods.

Lokal_Profil updated the task description. (Show Details)Sep 14 2020, 9:46 PM
Lokal_Profil added a subscriber: kalle.

When ready for review, contact Security Team.

Ready for review again after having implemented valuable feedback from @Addshore

Restricted Application added a project: secscrum. · View Herald TranscriptSep 17 2020, 6:57 AM
kalle updated the task description. (Show Details)Sep 22 2020, 10:07 PM
kalle updated the task description. (Show Details)
Lokal_Profil updated the task description. (Show Details)Sep 23 2020, 9:58 AM
Jcross added a subscriber: Jcross.EditedSep 23 2020, 4:08 PM

@Lokal_Profil - This still isn't ready for review because there are still too many unanswered questions in the task. Is there any maintenance plan with a group at WMF or a long term roadmap or support plan?

Jcross triaged this task as Low priority.Sep 23 2020, 4:16 PM
Jcross moved this task from Incoming to Back Orders on the secscrum board.
Jcross moved this task from Back Orders to Incoming on the secscrum board.
Jcross moved this task from Incoming to Back Orders on the secscrum board.

@Lokal_Profil - This still isn't ready for review because there are still too many unanswered questions in the task. Is there any maintenance plan with a group at WMF or a long term roadmap or support plan?

@Jcross I'm not exactly sure which questions you are referring to (note that comments in this task from before 2019 don't really apply anymore due to re-architecturing etc.).

The plan is for WMSE to go on developing and supporting the extension (assuming it gets deployed). In the long run we might be looking for WMF to adopt it but that is not a discussion which has been had yet, and it makes little sense to have before beta deployment and community feedback on the feature.

Note that the underlying service will get a separate Security review ticket (not created yet).

Jcross added a comment.EditedOct 2 2020, 4:09 PM

@Lokal_Profil There are a few processes we are unable to find evidence of having been followed, and we'd be happy to provide those if you'd like - but it's largely the issue of there being no path to production that will necessitate us prioritizing this review as "Low". We simply do not have the resources to spend on reviews that do not have support plans already in place. This does not mean that we are declining, but that reviews with a support plan in place will always be worked on first. Please let us know if anything changes and we will reconsider priority.

Lokal_Profil changed the task status from Stalled to Open.Oct 7 2020, 7:28 AM

Resetting old status

Jcross added a comment.Oct 7 2020, 4:02 PM

@Lokal_Profil Thank you, sorry we missed that.

@Lokal_Profil There are a few processes we are unable to find evidence of having been followed, and we'd be happy to provide those if you'd like - but it's largely the issue of there being no path to production that will necessitate us prioritizing this review as "Low". We simply do not have the resources to spend on reviews that do not have support plans already in place. This does not mean that we are declining, but that reviews with a support plan in place will always be worked on first. Please let us know if anything changes and we will reconsider priority.

Please highlight any processes you believe that we have missed, any help on that front is welcome.

Re: Path to production. Do you mean a path to production not existing on the WMF side or on out side? If the latter I believe this is largely due to us not having captured that path here on Phabricator. Should largely be rectified via T180015.

Re: Support plan. WMSE has an active grant to develop this extension lasting until 2021-03-31, after this we have a slightly smaller grant allowing us to go on supporting it until 2022-09-30. If it gets deployed and is well received we also have a vested interest in including support of Wikispeech in our APG budget and look for other grants to support it.

As said above a link to these processes would be great.
The "path to production" for this extension and all involved parts is covered fairly comprehensively at T264842: Deploy Wikispeech in production.
I think the support plan was also covered in the above comment, perhaps that should be added to T264842 to as it covers all services here not just the extension.
If there is some set of tasks within this tree that further block being able to get a prioritized security review it would be good to highlight which ones.
And if the above comment doesn't cover the required "support plan" then having spec of what is expected there would also be great!

I think the support plan was also covered in the above comment, perhaps that should be added to T264842 to as it covers all services here not just the extension.

Good suggestion. Added it there and rephrased it to clarify that the support is for Wikispeech as a whole.

sbassett added a subscriber: sbassett.EditedNov 20 2020, 10:15 PM

Sorry for the delay, but here are some responses on this:

Re: Support plan. WMSE has an active grant to develop this extension lasting until 2021-03-31, after this we have a slightly smaller grant allowing us to go on supporting it until 2022-09-30. If it gets deployed and is well received we also have a vested interest in including support of Wikispeech in our APG budget and look for other grants to support it.

I would like to avoid being particularly curt here, but the above does not sound very reassuring for long-term support. We would likely have to incorporate this fact into our security review and rate this issue with at least a medium risk. The likely mitigation here would be having a Wikimedia Foundation development team co-sponsor this project and agree to some form of indefinite backup support/maintenance. Given the extensive issues of technical debt and code stewardship for many Wikimedia code bases, a complex new service and extension is a lot to support in production without a more well-defined, long-term support model.

The "path to production" for this extension and all involved parts is covered fairly comprehensively at T264842: Deploy Wikispeech in production.

I'm not certain I'd agree that it is comprehensively covered here. All I am seeing within that task is a basic description of the project and a brief list of the technical steps necessary to put the service into production. By "path to production", we are also referring to such processes as a TechCom critique or architectural review, as noted within T180015#3835943, but seemingly unconfirmed as to it ever happening. I can't seem to find any recent TechCom-RFC related to wikispeech/speechoid. If there is one, please let us know where it is. Otherwise I'd state that it's fairly common for new services like this to go through that process, for example the recent effort for the new push notifications service: T249065: RFC: Wikimedia Push Notification Service. Some other recent-ish, similar examples: T206010: RfC: Session storage service interface, T213318: RFC: Wikibase Front-End Architecture, T201963: RFC: Modern Event Platform: Stream Intake Service. I do not believe that the confirmation of a WMF grant is an adequate substitute for this process.

If there is some set of tasks within this tree that further block being able to get a prioritized security review it would be good to highlight which ones.
And if the above comment doesn't cover the required "support plan" then having spec of what is expected there would also be great!

Given the significant time commitment this security review will require, we likely won't be able to prioritize this work until next quarter (Q3, January 2021 - March 2021), just given current resources and other commitments of the Security-Team. And that would be if the concerns above were adequately addressed regarding a more long-term project roadmap and support plan (we do not have any standard templates for these, but something beyond "we have a grant through date x" would be expected) and some kind of larger technical architecture review, be that an official TechCom-RFC or similar.

We are untagging as there is currently no path to production that we are aware of. Should this change, please feel free to tag us back in and we will triage.