Page MenuHomePhabricator

Security Readiness Review For the MediaSearch extension
Closed, ResolvedPublic

Description

Project Information

Description of the tool/project:

MediaSearch is a new interface for searching media on Wikimedia Commons with a Vue JS UI.

Description of how the tool will be used at WMF:

This code has been reviewed and merged into the WikibaseMediaInfo (WBMI) extension (see the commit message for the patch linked above for a list of the changes, e.g. swapping out CSS class name prefixes). The Structured Data team has chosen to pull this code into a separate extension for various reasons (increased maintainability and extensibility, lack of dependence on WBMI, clean slate so we can add a modern JS testing framework).

Dependencies

Wikibase, Elastica, CirrusSearch, WikibaseCirrusSearch

Has this project been reviewed before?

The WBMI extension was reviewed years ago, but this specific code has not been reviewed by the security team.

Working test environment

This code as it stands in WBMI already exists on production and beta Commons, but we do not have a working test environment for the new extension.

Post-deployment

Structured Data team: manager Mark Holmquist, tech lead Cormac Parle

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
sbassett moved this task from Incoming to Back Orders on the secscrum board.

Hi @sbassett, we're hoping to get some information on how thorough a review this new extension will need given that it's comprised of existing production code from WBMI. This will help us set a timeline for deployment. Could you let me know your thoughts on this?

Hey @AnneT -

If the code comprising the new extension is verbatim what existed within the WikibaseMediaInfo extension then I don't believe we'd need to complete much of a security review and would likely rate this as a low risk. WikibaseMediaInfo appears to have been around for about 5 years now, seems stable in production and has even been security-reviewed a couple of times (T159709, T200279). So again, if the code is mostly identical to how it existed in late 2018, then we don't need to perform a security review. If significant new code has been added or there have been major refactors of existing code since then, we should perform a security review. If this is indeed the case, it'd also be helpful to call out specific gerrit change sets/git commits where these additions or changes have been made.

Thanks @sbassett. All of the code in this new extension was added to WBMI post-2018, so it seems a security review is in order and it would make sense for that to cover all of the code in the MediaSearch extension.

Follow-up question: Since this code is already in production as part of WBMI, does the security review for MediaSearch need to block deployment of MediaSearch since there will be no additional risk?

Thanks @sbassett. All of the code in this new extension was added to WBMI post-2018, so it seems a security review is in order and it would make sense for that to cover all of the code in the MediaSearch extension.

Ok, we can get this scheduled during our weekly appsec triage meeting. Is there a desired deployment date or time period at the moment? We can't always guarantee that we an accommodate such dates, but it does help us with scheduling.

Follow-up question: Since this code is already in production as part of WBMI, does the security review for MediaSearch need to block deployment of MediaSearch since there will be no additional risk?

Well, the risk would be indeterminate for now, since we haven't performed a recent security review. Typically reviews with critical or high risks would need to be owned by a c-level+ with a mitigation plan and an entry within our risk registry. Please see T249039#6309061 for more information and let me know if you have any additional questions.

@sbassett My apologies, I was speaking of risk more generally—that this code already exists on production, so moving it to a new extension doesn't alter the existing level of risk the code presents—not specific to the security review ranking framework.

As far as timeline, I'd like to draw a distinction between our ideal timeline for deploying the new extension and formally releasing the MediaSearch feature. Though the code is already live on Commons as part of WBMI, the MediaSearch feature has not been "released"—it is not yet the default search on Commons, nor is it widely advertised or linked to. We would like to roll it out on a more widespread basis sometime next quarter after adding more features and further refining the search backend.

That said, we would prefer to deploy the new extension as soon as possible because of the benefits it would pose to our workflow (the most pressing of which is the ability to add a modern JavaScript testing framework for the Vue front-end, which is prohibitive in the current WBMI setup), and because we don't believe it adds risk, although we defer to your team on that matter, of course. Could we move forward with deployment, but not release of the feature, in this circumstance? We don't want to presume to be exempt from the usual review process, and if we need to go through the normal process, we are committed to doing so. We just want to point out the distinctions here in case it's possible for us to deploy the new extension before the full review process is completed.

@AnneT -

So the official guidelines for these sorts of things (which, to be fair, were actually just reworked a bit by me) don't list any review as being a hard blocker for deployment to the beta cluster. Production would be a bit different. According to your comment in T266513#6589831 - "All of the code in this new extension was added to WBMI post-2018" - we would consider this to be unreviewed code, even if it has existed in production within a different code base for a while. So the risk in our mind would still be undetermined, and if we were going to assign a default risk without first conducting a review, it would likely be high for now, given all of the potential unknowns.

Hi @AnneT - we're in the process of finalizing our queue for Q3 and we're wondering if you can let us know how close this is to production ready/what your timeline is? Thank you!

Hi @Jcross (and @sbassett—thanks for your helpful comments on this process earlier, and I'm sorry for my lack of response!), we have been deliberating how to move forward on this as a team. I'm working on getting you a more concrete timeline on when we plan to formally release this feature, but I can tell you that the extension itself will be ready for review within a few weeks. I'll get back to you with more info as soon as possible (hopefully tomorrow Monday next week).

Hi @Jcross, here's an outline of our timeline:

  • The new MediaSearch extension will be ready for security review by the end of January, possibly sooner
  • The MediaSearch feature, which is currently live on Commons (with the code currently in the WikibaseMediaInfo extension), will be rolled out more publicly (e.g. announced, publicly linked to) over the course of this quarter
  • The MediaSearch feature will become the default search on Commons around the end of the quarter, possibly later pending community discussions

I've been chatting with @CBogen, our program manager, and we were hoping to have a brief chat with you about this timeline to make sure we're on the same page. If that's alright with you, Carly will get something on the calendar. Thanks!

Hi @AnneT -as we mentioned in our call, we're revamping our scheduling process a bit to provide more clarity and transparency around our workload management and prioritization process. You'll see that we've placed you into queue for this quarter: https://phabricator.wikimedia.org/tag/secscrum/ and @sbassett or @Reedy will be in touch with any questions or concerns. Thanks for your patience, and please feel free to reach out at any time!

Update: I still plan to complete this security readiness review by the end of this quarter, but it might not be completed until quite literally the end of this quarter.

Thanks for the update, @sbassett! If you can ping me here when you start your review, I'll make sure we pause merging work into the extension.

You mentioned when we met a while back that you prefer using mediawiki-docker for testing. I have some notes here on setting up MediaSearch with Docker, although I have been unable to get elasticsearch working properly in the Docker setup for a while now. That said, with this setup, you can use a config var to point API calls to production Commons and Wikidata, so you can at least see some real results locally.

We also have notes on our more robust Vagrant configuration here.

@AnneT - I'm about 75% complete with the review and should have it completed by tomorrow or this Friday. Apologies for the delay - lots of end-of-quarter items needing attention.

@sbassett Thanks for letting me know! Totally understand—we'll look for more later this week.

Security Review Summary - T266513 - 2021-03-30
Last commit reviewed: 8aeaef361d

Summary

Overall, the current code base looks pretty good. I would currently rate this as a medium risk, though once the various medium-risk issues below are confirmed as either false positives or are mitigated, the overall risk rating would drop to low.

Vulnerable Packages - Development

VulnerabilityPackageNotesServiceRemediationRisk
Prototype pollutioneslint-config-wikimedia@0.19.0, grunt@1.3.0, stylelint-config-wikimedia@0.10.3https://snyk.io/vuln/SNYK-JS-LODASH-590103retirejsPossibly bumping the versions of the affected dev packages, though these made need to just be accepted as low risk low

Outdated Packages
As reported via npm outdated:
(no explicit vulnerabilities reported, simply noting for completeness' sake.)

PackageCurrentWantedLatest
wikimedia-ui-base0.15.00.15.00.18.1

As reported via composer outdated:
(no explicit vulnerabilities reported, simply noting for completeness' sake.)

PackageCurrentLatest
netresearch/jsonmapper2.1.04.0.0
phan/phan3.2.64.0.3
php-parallel-lint/php-console-color0.31.0
psr/container1.1.12.0.1

Build/Test steps

  1. This extension appears to rely upon Vector's new WVUI/Vue support, with dependent libraries included client-side via ResourceLoader, hence no implicit build steps or dependencies like webpack, rollup, etc. No risk here, I just wanted to confirm this assessment is accurate and that Vector is at least an implicit dependency of the MediaSearch extension.

General Security Issues

FileLine(s)IssueRisk
src/Special/SpecialMediaSearch.php631Please confirm that $filters, as it is assembled via getSearchKeywords(), can never be constructed via arbitrary user-input. If it can, it may need to be escaped with preg_quote() or similar. medium
resources/components/QuickView.vue95, 100, 110, 112, 138Please confirm that these v-html sinks either output static content or trusted api/parser content. medium
resources/components/DidYouMean.vue5Please confirm that these v-html sinks either output static content or trusted api/parser content. medium
resources/components/results/OtherResult.vue32Please confirm that these v-html sinks either output static content or trusted api/parser content. medium
resources/components/results/PageResult.vue17, 21, 34Please confirm that these v-html sinks either output static content or trusted api/parser content. medium
resources/components/EmptyState.vue6Please confirm that these v-html sinks either output static content or trusted api/parser content. medium
src/SearchOptions.php144, 148, 153, 157, 187, 223, 251, 271, 310, 314, 359, 367, 374, 400, 405, 410, 415These MediaWiki messages are being generated via the text() output method. While none of these messages contains html within the current i18n files, they may eventually be translated into other languages where potentially dangerous html might be included, or they could be changed via an interface-admin on-wiki. So if these are being sent to the browser, as they appear to be, it is likely preferred to use parse() or escaped(). medium
src/Special/SpecialMediaSearch.php155, 156, 190, 196, 202, 208, 214, 231, 234, 235, 236, 237, 238, 240, 241, 242, 245, 251, 705, 713These MediaWiki messages are being generated via the text() output method. While none of these messages contains html within the current i18n files (except "mediasearch-did-you-mean") they may eventually be translated into other languages where potentially dangerous html might be included, or they could be changed via an interface-admin on-wiki. So if these are being sent to the browser, as they appear to be, it is likely preferred to use parse() or escaped(). medium
src/Special/SpecialMediaSearch.php245Please confirm that $didYouMean and $didYouMeanLink are created from static or similarly-trusted data (e.g. not user-input). I would guess that the mediasearch-did-you-mean message should also be listed as a RawHtmlMessage, e.g. like this? medium

Miscellanous Issues/Points of Discussion/Nits

  1. It might be nice to abstract some of the hard-coded localdev api values (commons, wikidata) into config variables.
  2. Some not-really-security finds from deepscan.io:
FileLine(s)IssueRisk
resources/store/action.js133Condition 'filters' is always true because it evaluates to a non-empty string. The string value is originated from the return value of 'getMediaFilters()' defined at line 15. low
resources/store/action.js191Expression 'response.query' is null checked here, but its property is accessed without null check afterwards at line 217. low
components/SearchFilters.vue234Variable 'value' is null checked here, but its property is accessed without null check prior at line 232. low
components/OtherResult.vue89Expression 'this.imageinfo' is null checked here, but its property is accessed without null check prior at line 86. low
components/QuickView.vue489Expression 'this.imageinfo' is null checked here, but its property is accessed without null check prior at line 86. low
components/App.vue559Condition 'this.term !== ''' is always true at this point because the true branch of the condition 'this.term' at line 559 has been taken. low
sbassett moved this task from In Progress to Waiting on the user-sbassett board.

Change 676638 had a related patch set uploaded (by Anne Tomasevich; author: Anne Tomasevich):

[mediawiki/extensions/MediaSearch@master] Update wikimedia-ui-base to 0.18.0

https://gerrit.wikimedia.org/r/676638

Change 677030 had a related patch set uploaded (by Anne Tomasevich; author: Anne Tomasevich):

[mediawiki/extensions/MediaSearch@master] Use msg->parse() instead of msg->text()

https://gerrit.wikimedia.org/r/677030

Change 677037 had a related patch set uploaded (by Anne Tomasevich; author: Anne Tomasevich):

[mediawiki/extensions/MediaSearch@master] Abstract out API urls for local testing

https://gerrit.wikimedia.org/r/677037

Change 677039 had a related patch set uploaded (by Anne Tomasevich; author: Anne Tomasevich):

[mediawiki/extensions/MediaSearch@master] Add did you mean message to RawHtmlMessages list

https://gerrit.wikimedia.org/r/677039

Change 677289 had a related patch set uploaded (by Anne Tomasevich; author: Anne Tomasevich):

[mediawiki/extensions/MediaSearch@master] Miscellaneous JavaScript improvements

https://gerrit.wikimedia.org/r/677289

@sbassett Thanks very much for fitting this in, and for the helpful feedback! I've added responses below more for my own organization and ability to communicate with the team, but I'll update with a separate comment once all of the medium risk items are mitigated.


Vulnerable Packages - Development

I haven't made any updates here since the versions identified are the latest versions of each package.

Outdated Packages

I updated wikimedia-ui-base.

Build/Test steps

This extension appears to rely upon Vector's new WVUI/Vue support, with dependent libraries included client-side via ResourceLoader, hence no implicit build steps or dependencies like webpack, rollup, etc. No risk here, I just wanted to confirm this assessment is accurate and that Vector is at least an implicit dependency of the MediaSearch extension.

This extension doesn't actually use WVUI; we're using our own component library (in resources/components/base) and relying on Vue support in MediaWiki core/Resource Loader (so no build step).

General Security Issues

FileLine(s)IssueComments
src/Special/SpecialMediaSearch.php631Please confirm that $filters, as it is assembled via getSearchKeywords(), can never be constructed via arbitrary user-input. If it can, it may need to be escaped with preg_quote() or similar.$filters consists of static content
resources/components/QuickView.vue95, 100, 110, 112, 138Please confirm that these v-html sinks either output static content or trusted api/parser content.All consist of trusted API/parser content
resources/components/DidYouMean.vue5Please confirm that these v-html sinks either output static content or trusted api/parser content.This is a message with an arg that comes from trusted API content
resources/components/results/OtherResult.vue32Please confirm that these v-html sinks either output static content or trusted api/parser content.This is a message with an arg that comes from trusted API content
resources/components/results/PageResult.vue17, 21, 34Please confirm that these v-html sinks either output static content or trusted api/parser content.All content and message args come from trusted API/parser content
resources/components/EmptyState.vue6Please confirm that these v-html sinks either output static content or trusted api/parser content.This is a message with an arg that comes from trusted API content
src/SearchOptions.php144, 148, 153, 157, 187, 223, 251, 271, 310, 314, 359, 367, 374, 400, 405, 410, 415These MediaWiki messages are being generated via the text() output method. While none of these messages contains html within the current i18n files, they may eventually be translated into other languages where potentially dangerous html might be included, or they could be changed via an interface-admin on-wiki. So if these are being sent to the browser, as they appear to be, it is likely preferred to use parse() or escaped().Fixed
src/Special/SpecialMediaSearch.php155, 156, 190, 196, 202, 208, 214, 231, 234, 235, 236, 237, 238, 240, 241, 242, 245, 251, 705, 713These MediaWiki messages are being generated via the text() output method. While none of these messages contains html within the current i18n files (except "mediasearch-did-you-mean") they may eventually be translated into other languages where potentially dangerous html might be included, or they could be changed via an interface-admin on-wiki. So if these are being sent to the browser, as they appear to be, it is likely preferred to use parse() or escaped().Fixed
src/Special/SpecialMediaSearch.php245Please confirm that $didYouMean and $didYouMeanLink are created from static or similarly-trusted data (e.g. not user-input). I would guess that the mediasearch-did-you-mean message should also be listed as a RawHtmlMessage, e.g. like this?The params for this message consist of a search suggestion that comes from the API response, plus a link constructed internally based on that suggestion. I've added it to the list of RawHtmlMessages.

Miscellanous Issues/Points of Discussion/Nits

  1. It might be nice to abstract some of the hard-coded localdev api values (commons, wikidata) into config variables.
    1. Done
  2. Some not-really-security finds from deepscan.io:
FileLine(s)IssueComments
resources/store/action.js133Condition 'filters' is always true because it evaluates to a non-empty string. The string value is originated from the return value of 'getMediaFilters()' defined at line 15.Fixed
resources/store/action.js191Expression 'response.query' is null checked here, but its property is accessed without null check afterwards at line 217.Fixed
components/SearchFilters.vue234Variable 'value' is null checked here, but its property is accessed without null check prior at line 232.False positive
components/OtherResult.vue89Expression 'this.imageinfo' is null checked here, but its property is accessed without null check prior at line 86.Fixed
components/QuickView.vue489Expression 'this.imageinfo' is null checked here, but its property is accessed without null check prior at line 86.Fixed
components/App.vue559Condition 'this.term !== ''' is always true at this point because the true branch of the condition 'this.term' at line 559 has been taken.Fixed

Change 676638 merged by jenkins-bot:

[mediawiki/extensions/MediaSearch@master] Update wikimedia-ui-base to 0.18.0

https://gerrit.wikimedia.org/r/676638

Change 677030 merged by jenkins-bot:

[mediawiki/extensions/MediaSearch@master] Use msg->parse() instead of msg->text()

https://gerrit.wikimedia.org/r/677030

@AnneT -

This all sounds good and thanks for the above summary! Looking at the related change sets, I see:

  1. https://gerrit.wikimedia.org/r/676638 (merged)
  2. https://gerrit.wikimedia.org/r/677030 (merged)
  3. https://gerrit.wikimedia.org/r/677039 (+2'd, should be merged soon)
  4. https://gerrit.wikimedia.org/r/677289 (+2'd, should be merged soon)
  5. https://gerrit.wikimedia.org/r/677037 (active)

Once the remaining active patches land, they should address all remaining issues surfaced within the review and reduce the overall risk rating to low. And I'll be happy to resolve this task at that point.

Change 677037 merged by jenkins-bot:

[mediawiki/extensions/MediaSearch@master] Abstract out API urls for local testing

https://gerrit.wikimedia.org/r/677037

Change 677039 merged by jenkins-bot:

[mediawiki/extensions/MediaSearch@master] Add did you mean message to RawHtmlMessages list

https://gerrit.wikimedia.org/r/677039

Change 677289 merged by jenkins-bot:

[mediawiki/extensions/MediaSearch@master] Miscellaneous JavaScript improvements

https://gerrit.wikimedia.org/r/677289

Hey @sbassett, all patches are now merged and I've updated my comment to reflect this. Thanks again for the useful feedback!

sbassett moved this task from Waiting to Our Part Is Done on the secscrum board.
sbassett moved this task from Waiting to Done on the user-sbassett board.