Page MenuHomePhabricator

Security review - Wikibase Termbox Front End
Closed, ResolvedPublic

Description

Project Information

Description of the tool/project

Disclaimer: Description here focuses on the “termbox” as defined below, we like to stress though that “termbox” is the first part of Wikibase UI where the approach, hence the code, in question is to be applied.
The component we intend to deploy is responsible for the part of Wikibase UI, namely a part of Item and Property pages where their respective labels, descriptions and aliases are displayed, and can be edited.
The component provides both the code to be run on client side (in user’s browser), as well as a service to be run server side, generating parts of the page sent to the client.

Description of how the tool will be used at WMF

Server-side component (node service) is intended to be run on WMF service infrastructure. Service will be called from MediaWiki code. Service does not need to publicly exposed apart from being reachable from MediaWiki process.
Client-side component is going to be sent to client with other parts of Wikibase and MediaWiki UI via Resource Loader.

Dependencies

Has this project been reviewed before?

  • vue js library has gone through security review in late 2017: T168264
    • To be noted here: Review has concerned the version 2.3 of the library, which is also the version currently used in Wikibase. We do intend to update the version used to 2.6 for the Termbox work, which is subject to review here.
  • General Wikibase Front End Architecture review/request for comments pending approval from Wikimedia Technical Committee T213318. Request for comments last call ending 2019-02-20

Working test environment

Post-deployment

Wikimedia Deutschland. Primary contact: @WMDE-leszek

Related Phabricator tickets

T213318: RFC ticket for the general Wikibase Front End Architecture. Tool being subject to the request is the first step towards the presented architecture.
T212189: Request for the service running the server-side component of the discussed code. Request on hold until the RFC above gets accepted.

Event Timeline

Krenair renamed this task from Security review Action Items to Security review Action Items - Wikibase Termbox Front End.Feb 18 2019, 4:58 PM
WMDE-leszek renamed this task from Security review Action Items - Wikibase Termbox Front End to Security review - Wikibase Termbox Front End.Feb 18 2019, 5:04 PM

Update: This review is still in progress. Very hopeful to have some initial reporting by next week, though this is a fairly substantial codebase. At this point in time I don't think it should block any planned deployments.

sbassett triaged this task as Medium priority.Mar 12 2019, 3:28 PM

Note: the deployment for the SSR service (and related components) seems to be delayed until next quarter: T212189#5017076

@WMDE-leszek Sorry for the lack of updates on this. Do we have an updated deployment date for this? I know that date had changed since this task was created. I've had a basic look at the code, but haven't completed a full review yet. If we can assign an updated deployment date for this, the Security-Team can establish a more accurate due date for the security review deliverable. Thanks.

Thanks for the update @sbassett. I do understand that you folks have been busy with more important things recently, so I appreciate reporting back here. Indeed, the deployment has to change, given it is April already. At WMDE we'd be happy to have this extension deployed this month (so the date to consider could be 2019-04-30), although I don't put it in the main ticket description yet, as basically it depends on when the SSR service is ready. I am reaching out to SRE, and will come back here as soon as I have some more details from them, and April deployment date is going to be too early.

Great, thanks for the update @WMDE-leszek. I'll target the completion of this review to happen before the 2019-04-30 date. And yes, any updates here as to when SRE can handle the deploy would be great.

just a heads up @sbassett: with recent information from SRE in T212189#5094393 we would stick to the planned deployment date of 2019-04-30.

@WMDE-leszek - just wanted to check and see if there's any working local development environment for this (Vagrant roles, Docker, wikibase-docker w/ additional config instructions, etc.) It's probably not critical for this review, but it would be more helpful than playing around with the wikidata.beta.wmflabs.org example. Thanks.

@sbassett yes, absolutely. We're locally using https://github.com/addshore/mediawiki-docker-dev as the MediaWiki local environment, and have Wikibase, MobileFrontend, etc installed there. Then the SSR service can be set up with Docker as described in https://github.com/wikimedia/wikibase-termbox/blob/master/README.md, in particular setting the vars described in the "Production level environment variables" section, and paying attention to the Docker network setting (MEDIAWIKI_NETWORK_TO_JOIN), and then pairing up the Wikibase instance ("Configuring Wikibase" section).

Hopefully you'll get this set up using this information. In case of questions or issues with getting the local set up ready, please report here.
We of course any input on how the description of the set up, and the general documentation could be improved. Setting the service in the local environment for the first time is always a good evaluation of the docs.

My developer colleague has just provided a possibly useful advice for the security review/testing of the node service itself: to avoid setting up MediaWiki and Wikibase, one could point the service, using SSR_PORT and WIKIBASE_REPO, to e.g. beta wikidata. The service is not opinionated in this regard that it expects Wikibase instance to be there and await its results, so if the MediaWiki integration etc is not the concern, or not the concern of particular part of the review, this could save quite some set up effort.
In this case my above note about MEDIAWIKI_NETWORK_TO_JOIN is also of little relevance.

Update: I still plan to have a review completed by the end of this week (April 26th). After performing some basic analysis, I'm seeing several issues surrounding dependent packages. Many of these are dev dependencies, but there are some high prod vulnerabilities around older versions of js-yaml and some additional deprecation warnings (via both npm audit and npm outdated). I plan to attach these findings to my final report, but they can be easily enumerated by a developer right now via the relevant npm commands.

Also - and I'm certainly not trying to cast blame here - but I'm seeing 1,202 npm dependencies when I run an npm install. Sure, a big chunk of those are dev, but that's quite a lot especially given npm's known supply chain issues and other means of exploitation. At some point this is probably going to be classified as a high risk with appropriate risk ownership needing to be signed off by a given development team and/or director.

Thanks for the update @sbassett, and for the note on issues in npm dependencies. We're aware of them (in general). there is also a task were we try track these: T218980. Some of high vulnerabilities (e.g. js-yaml) still need fixing (i.e. updating the package version), and we're going to do this.
We would happily hear about ideas how our process of tracking security vulnerabilities could be improved in the long run, and in more systematic way. The current "process" of manual running npm audit and/or npm audit --fix is not sustainable, and it only somewhat works as long as there is active development phase going on.

We have had some unspecific ideas here at WMDE about e.g. having a daily/night job that would run npm audit and/or --fix, and possibly alert about issues discovered, and/or possibly automatically fix those that could be fixed without manual interaction. We'd be curious to hear if there is already infrastructure we could use, or whether you have thoughts on this topic in general. We're certainly not the first node project in WMF realm.

@WMDE-leszek - for automated checks of what I'd call low-hanging fruit (reported CVEs, etc.) I would advise that, at a minimum, we get some CI tooling in place for various WMDE Node repos, if that currently isn't the case (sounds like it isn't.) Within a few different contexts, such tasks have had various starts/stops in the recent past (T179381, T174767, T96078, T104164) but T200717#4474050 leads me to believe that this is ready to go via the npm6 dockers as there are also instructions provided on how to get this set up (guessing it's adding the right jobs here.) This should reduce the manual effort on any developer's part and I'd guess these could be made non-voting (maybe even just the npm audit piece) if that were an issue. AIUI, these can also be set up to run when patches get pushed to gerrit or on a daily schedule (if you don't mind the spam.)

Some additional best practices:

  • Being conscious of how many node packages are used during development. It's easy to get carried away with the usage of hundreds if not thousands of node packages and this can greatly increase the attack surface of an application.
  • Pinning specific, verified packages. I noticed that package.json uses caret notation for most package versions as opposed to a tilde or even just a specific release, which has been audited and verified in some capacity. This is great for pulling in fresh code with new features and maybe even some security patches, but it also opens an application up to various supply-chain attacks which have been a problem for npm.
  • Using a private npm registry of vetted packages. This obviously doesn't exist and requires a heavy maintenance effort, but there's precedent with apt and (kinda) composer, and we probably need to try to get here with npm, or at least wrap some official (enforced?) best practices around its usage.
  • Automate ways of running npm audit / outdated (and other tools) via test or fix scripts, etc. Still places the onus on developers, but it shouldn't be any worse than running unit tests.

Security Review Summary - T216419 - April 2019
Overall, this looks ok, and I'm basing a lot of that opinion on the RFC/new service tasks where this was heavily discussed (T213318, T212189). Given that the SSR isn't a public-facing service (and the client-side fallback is just transpiled JS), this review is a little different in that several standard web and mobile application attack surfaces are non-existent. As mentioned previously, this is using a large amount of external dependencies (412,875 lines!), and various mitigations and best practices should be explored via the recommendations at T216419#5141086 to address this issue. I would also like to spend a little more time examining potential issues around the HTML rendering, which I'm hopeful to get to next week. Though that shouldnt delay this review process any further. The Security-Team would also like to start framing these reviews within the context of risk and risk ownership - see our draft policy here - as opposed to a yes/no response. With the unmitigated issues mentioned on this task and within the report below, I would currently classify the risk of this application to be at least Medium/Moderate.

Vulnerable Packages

  1. npm audit: F28795800
  2. npm outdated: F28795796
  3. retirejs: F28796300
  4. snyk: F28795797

Static Analysis Findings

  1. tslint-security: F28796299

DoS Vectors

  1. ReDoS in src/server/route-handler/termbox/QueryValidator.ts, line 34: pattern: new RegExp( ^${languagePattern}(\\|${languagePattern})*$, 'i' ) - I'm able to get this to take quite a long time by increasing the length of a test subject. For example, try running this code: time node -e '/^[a-z]{2}[a-z-]*(\\|[a-z]{2}[a-z-]*)*$/.test("abcaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaz1")'

General Security Issues

  1. The demo at https://wikidata.beta.wmflabs.org/wiki/Q11?useformat=mobile seems clickjackable. A similar production page at https://www.wikidata.org/wiki/Q1111111?useformat=mobile also seems vulnerable, so I'm guessing this is just an issue with wikidata.org (and other wikis - enwikipedia's front page also suffers from this).

@WMDE-leszek - I just wanted to check in and see if you or anyone else had any questions re: my responses above and if we have a better sense around the deployment timeline for this service. Note: I will be on leave from approximately May 13th to June 10th and will only be minimally checking in on work-related items.

Thanks a lot @sbassett, I was in the middle of writing the answer, when your comment popped in :)

Thanks for the review. I find the idea of different framing of reviews, and talking about the risk level and its ownership interesting (i.e. sounds good). For the sake of readers who read this ticket in the context of the service deployment, I will spare any further remarks on this in this very place.

We have not yet made much progress regarding automated ways to reduce the number and impact of vulnerabilities in the dependency libraries we use. We do appreciate the recommendations you've made. So far we are improving the situation with manual changes, and are planning out for more sustainable processes. We see ourselves (our service) as part of wider ecosystem of the JS code maintained and running in WMF environment, so we hope we'll not be solving this riddle on our own.

DoS Vectors

  1. ReDoS in src/server/route-handler/termbox/QueryValidator.ts, line 34: pattern: new RegExp( ^${languagePattern}(\\|${languagePattern})*$, 'i' ) - I'm able to get this to take quite a long time by increasing the length of a test subject. For example, try running this code: time node -e '/^[a-z]{2}[a-z-]*(\\|[a-z]{2}[a-z-]*)*$/.test("abcaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaz1")'

The code in question is being removed now, as we're moving the validation to the OpenAPI validator library. We expect the library we'll do a similar regex check, hence it wouldn't be fair to claim the problem goes away by us doing such change. We'll also try to add some extra checks to prevent performing regex operations on the input which is too long to be by any chance a valid input, to avoid the too long running checks, which might potentially become an issue, as you've pointed out.

Static Analysis Findings

  1. tslint-security: F28796299

We've skipped the analysis results on the test code, which left three issues. Two related to the regex will go away given the code in questions is being removed, as I said above.
The "Potential timing attack on the right side of expression" report seem to be a false positive reported by the tool.

General Security Issues

  1. The demo at https://wikidata.beta.wmflabs.org/wiki/Q11?useformat=mobile seems clickjackable. A similar production page at https://www.wikidata.org/wiki/Q1111111?useformat=mobile also seems vulnerable, so I'm guessing this is just an issue with wikidata.org (and other wikis - enwikipedia's front page also suffers from this).

Thanks, this reminded me that clickjacking has been reported as an issue in Wikidata UI in T189491 (reported via T186726). It does not give us much reason to be proud of that the overall Wikidata code has not improved with this regard since then, but it does seem like a broader issue. If there has been improvements in MediaWiki in this department recently that Wikidata has fallen behind, I'd appreciate pointing us to these, so we could adapt Wikidata too. This seems to be a broader topic than this service and the code using it, though.

Thanks for the information about your availability @sbassett. I expect we'll come back here with report on improvement to the dependency vulnerabilities and the DoS-regex code later this week. I hope we'll be able to get the green light for the moving the service to production before you're gone.

I would also like to spend a little more time examining potential issues around the HTML rendering, which I'm hopeful to get to next week. Though that shouldnt delay this review process any further.

Should you had a chance to dig into these, we'd certainly appreciate your thoughts, even if, as you say, it might be going outside of the scope of this very review.

Hi again @sbassett. Hoping this would be useful update for you, I'd like to report on some improvements and changes that have happened in the last couple of days.

Vulnerable Packages

  1. new result of npm audit: F28984665 - if I scan the report right, the only two and low vulnerabilities are lodash, being dependency of service-runner
  2. new result of npm outdated: F28999869 - all dependencies listed are dev dependencies if I read it correctly
  3. new result of retirejs: F29000126
  4. new result of snyk test: F28999766

Result of npm outated hasn't changed significantly , apart from new patch version of vuex released yesterday we haven't picked up yet.

Static Analysis
new result of tslint-config-security: F29000507 - the only non-test issue reported is in my understanding a false positive, caused by accessing .name property of the object. The code has nothing to do with passwords, authentication, so the timing attack risk is not really present here.

I haven't got around to run snyk, retirejs and tslint-security but I'll update this comment with new report files as soon as I have them. As mentioned already before, the issues reported by tslint security should all be taken care of already.

DoS Vector - long running regex check
The code referred to earlier, i.e. src/server/route-handler/termbox/QueryValidator.ts has been removed.
Input validation now is handled by https://www.npmjs.com/package/openapi-request-validator, using the specs defined in: https://github.com/wikimedia/wikibase-termbox/blob/master/openapi.json#L90-L92
We've verified that the length check happens prior the the regex pattern matching, i.e. long input should not result in excessively long checks.

Finally, as you've mentioned the deployment date: SRE are working hard on the Kubernetes front, which as far as we know is the remaining building block: T220402. I couldn't tell for SRE people, but at WMDE we intend to have the service deployed in May 2019.

@WMDE-leszek - sounds good and thanks for the update including the updated deployment timeline! I'll await the updated reports to review. And the openapi-request-validator length check looks good after cursory review.

@sbassett updated the comment above with the full set of new reports.

@WMDE-leszek - Ack, it looks like all of these reports have gotten a bit worse? I see more potential and confirmed vulnerabilities in all of them. It seems some additional packages have been added (to fix the ReDoS and I imagine additional, non-security-related issues) which has increased the attack surface. For now I would still assign a risk of at least Medium/Moderate to this application if it were to be deployed within this state.

@sbassett I'd agree that situation is not perfect even after our recent changes.
I believe the additional dependencies didn't add known vulnerabilities, but they're additional packages indeed. Also, some new issues have been discovered in "old" dependencies since the time you've checked.

We'll continue striving towards 0 vulnerability dependency state, but this will not be an immediate change, given most of vulnerable dependencies are "indirect" ones.

In the context of T212189/T220402 I was wondering how the result of this security inspection, and our software's Medium/Moderate risk puts us in the context of the deployment of the service to WMF production environment. While I understand and share the idea that reasoning in terms of risk makes more sense that binary secure/insecure in case of the software that both changes itself and also does not exist in isolation, I imagine that for the deployment process the involved engineers might be expecting more of a "go"/"no go" answer with regards to the deployment of the code in its current state.

As @sbassett is likely already on leave, I dare to ping @JBennett for advice how does this work with Security Team's new Security Readiness Review process.

Hey @WMDE-leszek - I am indeed on leave until June 10th. I believe the rest of the Security-Team is aware of this review, but if you don't hear any response this week, I might write to security-team@wikimedia.org and request an update on how a deployment might be affected. Our team meetings are typically held Tuesday mornings where items like this are evaluated as a team.

@WMDE-leszek, @RazShuty - update: @JBennett is currently reviewing the risk ownership and risk register entry for this project. We should have an update on how the Security-Team would like to proceed with this shortly. In the meantime, if this is blocking deployment or anything else, please let me know on-task. Thanks for your patience.

Thanks for the update @sbassett. I've also received similar information via email from @Reedy, which is much appreciated. Given this information we've been confident we're on a way, and so we were not bugging you on this review for last few weeks.

This week we've managed to get other deployment blockers solved, and with T220402 ready we're actually pretty close to the deployment. We've been considering rolling the service out for the use on test.wikidata.org next week. What would be your thoughts on this @sbassett, @JBennett ?
I admit this is a bit of short notice, but I dare to ask hoping that maybe you've been able to make some progress with assessing the security risk of our service, and what would the further process. We'll of course not move on with deployment until we've heard from you. Thanks!

@WMDE-leszek limited deployment on test.wikidata.org of the Termbox service should be fine. Just out of curiosity, is there a more longterm deployment strategy and timeline for rolling out the Termbox service on anything outside of test/beta wikis and wikidata.org?

Finally - just to reiterate on the concept of risk ownership - the Security-Team performs security and risk reviews and assessments, determines different issues and levels of risk and then assigns ownership of said risk based upon severity and other factors to individuals and teams across the Foundation and movement. In theory, the results of these reviews and assessments should never block deployments (outside of extreme cases) from the Security-Team's perspective though we do expect engineers, managers and directors to understand the potential risks of their actions and make appropriate decisions based upon such considerations.

sbassett changed the task status from Open to Stalled.Jun 25 2019, 5:29 PM

Update: @JBennett and I worked on a risk assessment and risk registry entry for SSR Termbox this week. We need to review it with some WMF folks (CTO, etc.) and should be able to provide a more official update and resolve this task soon.

@sbassett minor update: we haven't deployed the service to use it on test.wikidata.org. There have been some additional technical issues on the way (with the deployment infrastructure, not with our service), and it is a middle of summer holiday season here at WMDE, so things move slow. We should do the test deployment to test.wikidata.org in a week or two. For the completeness, we'll update on this here when it has happened.

And apologies for answering your question so late: Termbox service is really internal to Wikidata, so we do intend to use with other wiki than wikidata.org (and related test and beta sites).

I will be offline until July, 22nd. My WMDE colleagues follow updates on this ticket, so in case if there are any updates, questions, etc from your side, please just leave a comment here.

Thanks for the update, @WMDE-leszek. This sounds good. @JBennett and I have finished up the risk assessment and are in the process of scheduling some time to review with @RazShuty and WMF's CTO. This will cover 1) a review of the risk assessment 2) establishment of risk ownership 3) setting a timeline for a mitigation plan.

Update: still keeping this task open and stalled pending action from @JBennett's July 19th email to @RazShuty, @EBjune and myself.

During Wikidamania @EBjune and @RazShuty have agreed on accepting the risk as assesed by the Security Team. I take that due to its nature, the communication about the further steps will be handled by email mostly (as has happened so far).
I am letting unstalling and other changes of the status of this ticket on the discretion of WMF Security team.

Based on agreement on the risk we'll be also proceeding with deploying the service in question for use in production Wikidata site.

@WMDE-leszek @RazShuty - Just talked with @JBennett. Looks like everything is official re: risk ownership, so I'm going to resolve this task for now. Thanks everyone for all of the patience on working through this review and risk ownership assessment.