Page MenuHomePhabricator

Security readiness review for the MachineVision extension
Closed, ResolvedPublic

Description

Project Information

Description of the project and how it will be used

This is a project to support the incorporation of machine vision (MV) generated metadata into Foundation products. Specifically, the project will support:

  • Requesting MV-generated image metadata from machine vision providers (internal and/or external) and storing results
  • Serving MV data to Commons users for verification and promotion to Structured Data on Commons

For full discussion, see the project page at https://www.mediawiki.org/wiki/Wikimedia_Product/Machine_vision_middleware.

Description of any sensitive data to be collected or exposed

None

Technologies employed

MediaWiki, Wikibase, PHP, MySQL

Dependencies and vendor code

Working test environment

Post-Deployment

The extension will be maintained primarily by Product Infrastructure, with secondary support from the Structured Data team, particularly for Wikibase and frontend JS issues.

Details

Related Gerrit Patches:
mediawiki/extensions/MachineVision : masterStop (double-)escaping Wikidata IDs when constructing Wikidata URLs
mediawiki/extensions/MachineVision : masterHygiene: Remove unnecesarry Phan suppression

Related Objects

StatusSubtypeAssignedTask
ResolvedRamsey-WMF
ResolvedMholloway
ResolvedMholloway
ResolvedMholloway
Resolvedsbassett
ResolvedMholloway
ResolvedMarostegui
ResolvedMholloway
ResolvedKrinkle
ResolvedMholloway
OpenNone
OpenNone
ResolvedMholloway
ResolvedMholloway
ResolvedMholloway
StalledNone
StalledReedy

Event Timeline

Mholloway created this task.Jul 5 2019, 4:40 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 5 2019, 4:40 PM
Mholloway renamed this task from Request a security review for the machine vision middleware to Request a security readiness review for the machine vision middleware.Jul 5 2019, 5:16 PM
Mholloway renamed this task from Request a security readiness review for the machine vision middleware to Security readiness review for the MachineVision extension.Aug 20 2019, 3:11 PM
Mholloway updated the task description. (Show Details)
Mholloway updated the task description. (Show Details)Sep 10 2019, 1:00 AM
Mholloway updated the task description. (Show Details)
Mholloway updated the task description. (Show Details)Sep 10 2019, 1:05 AM
Mholloway updated the task description. (Show Details)
Mholloway moved this task from Backlog to Tracking on the MachineVision board.
Mholloway updated the task description. (Show Details)Sep 10 2019, 4:58 AM
Jcross added a subscriber: Jcross.Sep 10 2019, 5:17 PM

Hi @Mholloway -

The team has taken a look at this feels we can provide a basic review by your target deployment date, but that a more in-depth look may need to happen post-launch. We hope that this will work for you and your team - please let us know if you have any questions or concerns and we'll be in touch as we move forward.

Cheers! Jennifer

Hi @Jcross, thanks for the heads-up. That sounds OK to me, but I've started a convo on the parent ticket to confirm that'll suffice for production deployment.

sbassett triaged this task as Medium priority.Sep 17 2019, 4:53 PM
sbassett claimed this task.Oct 10 2019, 9:40 PM
sbassett moved this task from Incoming to In Progress on the deprecated-security-team-reviews board.
sbassett added a project: user-sbassett.
sbassett moved this task from Backlog to In Progress on the user-sbassett board.Oct 15 2019, 4:21 PM

Hi @sbassett, for planning purposes, do you have some idea of when this might be complete?

Mholloway updated the task description. (Show Details)Oct 24 2019, 7:25 PM

@Mholloway - Hey, sorry for the delay on this - I just emailed Amanda this morning about it. I think I can have a general review by the end of next week, so 2019-11-01. Let me know if that blocks deploys or quarterly goals or anything.

Thanks for the update, @sbassett. I think that will work, provided we're speedy with any needed remediations (which I plan to be!). Our main concern is to deploy by the week of Nov. 4 at the latest, since the following week many engineers will be out for Wikimedia Technical Conference, and following that various teams have offsites.

Security Review Summary - T227346 - 2019-11-01
Overall, the MachineVision extension looks pretty good. There are a handful of issues below, but nothing I would categorize as above low for right now.

Vulnerable Packages
As reported by npm audit and retirejs:

  1. jquery 3.3.1, prototype pollution
    • dependency of wikimedia/mw-node-qunit
    • https://nodesecurity.io/advisories/796
    • Affected files (post npm install):
      • node_modules/jquery/dist/jquery.js
      • node_modules/jquery/dist/jquery.min.js
      • node_modules/jquery/dist/jquery.slim.js
      • node_modules/jquery/dist/jquery.slim.min.js
      • node_modules/jquery/package.json

Outdated Packages
As reported via npm outdated:

No explicit vulnerabilities reported, simply noting for completeness' sake.

PackageCurrentWantedLatest
@ wikimedia/mw-node-qunit6.0.0gitgit
eslint-config-wikimedia0.12.00.12.00.15.0
grunt-banana-checker0.7.00.7.00.8.1
grunt-eslint21.0.021.0.022.0.0
grunt-stylelint0.11.10.11.10.12.0
oojs2.2.22.2.23.0.0
stylelint9.10.19.10.111.1.1
stylelint-config-wikimedia0.6.00.6.00.7.0

Other General Security Issues

  1. While it's fairly common for MediaWiki maintenance scripts (e.g. populateFreebaseMapping.php, fetchSuggestions.php, createFileListFromCategoriesAndTemplates.php, createFileListFromGlobalImageLinks.php) include/require files with dynamically-created paths via variables like $basePath and RUN_MAINTENANCE_IF_MAIN, these get picked up by various linters/static analysis tools as potential means of remote code injection. While such vulnerabilities should be difficult to exploit in the case of MediaWiki maintenance scripts, fortifying such includes/requires with checks for valid/expected paths is still probably a worthwhile effort in defensive coding. The same also applies for any file names/paths passed to any of PHP's file system functions (e.g. file_put_contents() within createFileListFromCategoriesAndTemplates.php and createFileListFromGlobalImageLinks.php.)
  2. While no i18n messages came back with any potentially dangerous html, it is an attack vector we've seen attempted before (happy to provide more details off-Phab). So while using ->text() in SpecialSuggestedTags.php (line 31) and MachineVisionEntitySaveException.php (line 19) should be fine, I think we'd recommend using ->parse() or ->escaped() instead, if possible. This would also be true for instances of mw.message().text() within various JavaScript widget files.
  3. In src/Handler/WikidataIdHandler.php on line 68 - did the SecCheckPlugin return this as a SecurityCheck-DoubleEscaped when you were testing locally or in CI? This shouldn't be the case since only the $contents argument should be sanitized within Html::element.
  4. Per this gerrit comment from Brian, to leverage MediaWiki's support of CSP, you do not want to explicitly call it on any of your pages. So you'd probably want to revert r542121 for now.

Policy Compliance

  1. Per the MachineVision Concept Review (T227591), can we confirm that WMF-Legal's review of the extension/service was completed? Is there any corroborating documentation or a new privacy policy/ToU we could reference and review?

Note: the above is a fairly basic review. I didn't focus on any performance-related issues as they pertain to security, as I'm hoping that's happening in T230813. I might also play around with the ext from a black box perspective next week with @Mholloway's docker environment. At a cursory review, r547555 and r547688 both look fine to me, but I might also review those a bit more next week. But none of this should block any deployments, etc.

sbassett moved this task from In Progress to Waiting on the user-sbassett board.

Change 548761 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/extensions/MachineVision@master] Hygiene: Remove unnecesarry Phan suppression

https://gerrit.wikimedia.org/r/548761

Mholloway closed this task as Resolved.Nov 5 2019, 3:15 PM
Mholloway added a subscriber: Ramsey-WMF.

Thanks very much for reviewing, @sbassett!

Security Review Summary - T227346 - 2019-11-01
Overall, the MachineVision extension looks pretty good. There are a handful of issues below, but nothing I would categorize as above low for right now.
Vulnerable Packages
[...]
Outdated Packages
[...]

Created follow-up task: T237415: Vulnerable and outdated npm packages used by MachineVision

Other General Security Issues

  1. While it's fairly common for MediaWiki maintenance scripts (e.g. populateFreebaseMapping.php, fetchSuggestions.php, createFileListFromCategoriesAndTemplates.php, createFileListFromGlobalImageLinks.php) include/require files with dynamically-created paths via variables like $basePath and RUN_MAINTENANCE_IF_MAIN, these get picked up by various linters/static analysis tools as potential means of remote code injection. While such vulnerabilities should be difficult to exploit in the case of MediaWiki maintenance scripts, fortifying such includes/requires with checks for valid/expected paths is still probably a worthwhile effort in defensive coding. The same also applies for any file names/paths passed to any of PHP's file system functions (e.g. file_put_contents() within createFileListFromCategoriesAndTemplates.php and createFileListFromGlobalImageLinks.php.)

Created follow-up task: T237419: Treat dynamically-generated paths more defensively in maintenance scripts

  1. While no i18n messages came back with any potentially dangerous html, it is an attack vector we've seen attempted before (happy to provide more details off-Phab). So while using ->text() in SpecialSuggestedTags.php (line 31) and MachineVisionEntitySaveException.php (line 19) should be fine, I think we'd recommend using ->parse() or ->escaped() instead, if possible. This would also be true for instances of mw.message().text() within various JavaScript widget files.

Created follow-up task: T237420: Prefer Message::parse() or ::escape() over Message::text()

  1. In src/Handler/WikidataIdHandler.php on line 68 - did the SecCheckPlugin return this as a SecurityCheck-DoubleEscaped when you were testing locally or in CI? This shouldn't be the case since only the $contents argument should be sanitized within Html::element.

I believe this was flagged locally, but that's no longer the case, so I've submitted a patch to remove the suppression. https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/MachineVision/+/548761/

  1. Per this gerrit comment from Brian, to leverage MediaWiki's support of CSP, you do not want to explicitly call it on any of your pages. So you'd probably want to revert r542121 for now.

Already done: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/MachineVision/+/546385/

Policy Compliance

  1. Per the MachineVision Concept Review (T227591), can we confirm that WMF-Legal's review of the extension/service was completed? Is there any corroborating documentation or a new privacy policy/ToU we could reference and review?

See the comment from @Ramsey-WMF at T227591#5628157.

Since I don't believe any blockers to rollout were identified, I'll resolve this task.

About that SecurityCheck-DoubleEscaped suppression, the rule is in fact triggered in CI if the suppression is removed:

10:07:01 <checkstyle version="6.5">
10:07:01   <file name="src/Handler/WikidataIdHandler.php">
10:07:01     <error line="68" severity="warning" message="Calling method \Html::element() in \closure_aa88a63f8555 that outputs using tainted argument $[arg #2]. (Caused by: Builtin-\Html::element) (Caused by: src/Handler/WikidataIdHandler.php +69; src/Handler/WikidataIdHandler.php +68)" source="SecurityCheck-DoubleEscaped"/>
10:07:01   </file>
10:07:01 </checkstyle>

https://integration.wikimedia.org/ci/job/mwext-php72-phan-seccheck-docker/18657/console

Change 548761 abandoned by Mholloway:
Hygiene: Remove unnecesarry Phan suppression

Reason:
Needed for CI

https://gerrit.wikimedia.org/r/548761

sbassett added a comment.EditedNov 5 2019, 3:50 PM

About that SecurityCheck-DoubleEscaped suppression, the rule is in fact triggered in CI if the suppression is removed:

Interesting. I think that may be a bug - I'll file one for the phan-taint-check-plugin project. But yes, it obviously needs to be suppressed for now.

sbassett added a comment.EditedNov 5 2019, 9:13 PM

About that SecurityCheck-DoubleEscaped suppression, the rule is in fact triggered in CI if the suppression is removed:

Interesting. I think that may be a bug - I'll file one for the phan-taint-check-plugin project. But yes, it obviously needs to be suppressed for now.

Wait, it looks like this actually is a legitimate double-escape, given that attribute values within Html::element and Html::rawElement [[ https://gerrit.wikimedia.org/g/mediawiki/core/+/93bda3e94056742fc1f3a84a7ae9a229b073d1ed/includes/Html.php#553 | are sanitized within expandAttributes() ]]. Playing around in maintenance/shell.php:

>>> $id = '<script>alert()</script>';
=> "<script>alert()</script>"
>>> $x = Html::element( 'a', [                                                                                                                            'href' => 'https://www.wikidata.org/wiki/' . htmlentities( $id ),                                                                                 ], 'hello' );
=> "<a href="https://www.wikidata.org/wiki/&amp;lt;script&amp;gt;alert()&amp;lt;/script&amp;gt;">hello</a>"

and without htmlentities():

=> "<script>alert()</script>"
>>> $x = Html::element( 'a', [                                                                                                                            'href' => 'https://www.wikidata.org/wiki/' . $id,                                                                                                 ], 'hello' );
=> "<a href="https://www.wikidata.org/wiki/&lt;script&gt;alert()&lt;/script&gt;">hello</a>"

Change 548920 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/extensions/MachineVision@master] Stop escaping Wikidata IDs when constructing Wikidata URLs

https://gerrit.wikimedia.org/r/548920

Thanks for digging in on that, @sbassett. I imagine the htmlentities call was originally included out of habit when constructing a URL, but in this case the $id values will only be valid Wikidata IDs (i.e., 'Q' followed by a positive integer) from the DB and thus would need no escaping (even if they weren't escaped anyway within Html::element). I've submitted a new patch to remove the htmlentities call and the Phan suppression.

Change 548920 merged by jenkins-bot:
[mediawiki/extensions/MachineVision@master] Stop (double-)escaping Wikidata IDs when constructing Wikidata URLs

https://gerrit.wikimedia.org/r/548920

sbassett added a comment.EditedNov 6 2019, 7:52 PM

Thanks for digging in on that, @sbassett. I imagine the htmlentities call was originally included out of habit when constructing a URL, but in this case the $id values will only be valid Wikidata IDs (i.e., 'Q' followed by a positive integer) from the DB and thus would need no escaping (even if they weren't escaped anyway within Html::element). I've submitted a new patch to remove the htmlentities call and the Phan suppression.

Sure, thanks. Generally speaking, I personally believe it fine to be incredibly distrusting of any dynamic data, even data which we know will always be safe like Wikidata ids. If not to guard against currently unknown or zero-day attack vectors, then to guard against random users (especially privileged users) going rogue or having their accounts compromised.

sbassett reopened this task as Open.Nov 6 2019, 8:06 PM
sbassett closed this task as Resolved.Nov 26 2019, 4:41 PM
sbassett moved this task from Waiting to Done on the deprecated-security-team-reviews board.

I think everything from T227346#5636348 is now resolved and merged, with the possible exception being T237415, which seems mostly resolved, or at least what can be. So, once again, resolving this task.

sbassett moved this task from Waiting to Done on the user-sbassett board.Nov 26 2019, 4:41 PM