Security Concept Review for the machine vision middleware project
Closed, Resolved · Public

Description

Project Information

Name of project: MachineVision
Project home page: https://www.mediawiki.org/wiki/Wikimedia_Product/Machine_vision_middleware
Name of team which owns the project: Reading Infrastructure
Primary contact for the project: Michael Holloway
Target date for deployment: August 30, 2019
Link to code repository: https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/MachineVision
Is this a brand-new project: Yes
Has this project ever been reviewed before (Phab tasks, etc.): No
Has any risk assessment (STRIDE, etc.) been performed: No
Is there an existing RFC or has this been presented to the community: No
Is this project tied to a team quarterly goal: Yes
Does this project require its own privacy policy: No

Description of the project and how it will be used

This is a project to support the incorporation of machine vision (MV) generated metadata into Foundation products. Specifically, the project will support:

Requesting MV-generated image metadata from machine vision providers (both internal and third-party external)
Providing temporary storage for MV results pending human editor verification
Serving MV data to Commons users for verification and promotion to Structured Data on Commons
Providing the results of human editor verification back to third-party MV providers for model refinement

For full discussion, see the project page at https://www.mediawiki.org/wiki/Wikimedia_Product/Machine_vision_middleware (including linked Phab tickets).
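
As a rough illustration of the four steps listed above, here is a minimal in-memory sketch of the request, store, verify, and report-back flow. It is not the MachineVision extension's implementation: the class names, fields, and the provider and label values are all hypothetical, and the real project stores results in MySQL rather than in a Python list.

```python
from dataclasses import dataclass, field
from enum import Enum


class ReviewState(Enum):
    PENDING = "pending"    # returned by a provider, not yet reviewed
    ACCEPTED = "accepted"  # confirmed by a human editor
    REJECTED = "rejected"  # dismissed by a human editor


@dataclass
class LabelSuggestion:
    file_title: str        # e.g. "File:Foo.jpg"
    provider: str          # hypothetical provider name
    label: str             # hypothetical label text
    state: ReviewState = ReviewState.PENDING


@dataclass
class SuggestionStore:
    """In-memory stand-in for the temporary storage step (assumption)."""
    suggestions: list = field(default_factory=list)

    def add(self, suggestion):
        # Step 1 and 2: keep a provider result until someone reviews it.
        self.suggestions.append(suggestion)

    def pending_for(self, file_title):
        # Step 3: what would be served to an editor for verification.
        return [s for s in self.suggestions
                if s.file_title == file_title
                and s.state is ReviewState.PENDING]

    def review(self, suggestion, accepted):
        # Record the human editor's decision.
        suggestion.state = (ReviewState.ACCEPTED if accepted
                            else ReviewState.REJECTED)

    def feedback_for(self, provider):
        # Step 4: reviewed results that could be reported to a provider.
        return [(s.label, s.state.value) for s in self.suggestions
                if s.provider == provider
                and s.state is not ReviewState.PENDING]
```

Keeping the review state on each suggestion means unverified results never need to be treated as Structured Data; only accepted ones would be promoted.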

Description of any sensitive data to be collected or exposed

None

Technologies employed

MediaWiki, MySQL, possibly (<50% likelihood) a Node.js service similar to cxserver

Dependencies and vendor code

https://github.com/rahiel/open_nsfw--
Third-party machine vision providers (e.g., Google Cloud Vision, Clarifai)

Working test environment

This project is in an early stage, but a couple of dev/testing APIs are set up in WMCS:

An instance of the proposed NSFW image scoring service is running at https://nsfw.wmflabs.org. (See build and usage instructions at https://github.com/rahiel/open_nsfw--.)
A dev version of an API providing an image labeling response, illustrating the kinds of data we'll be working with, is available at https://visionoid.wmflabs.org.
- Usage: https://visionoid.wmflabs.org/labels?title=File:Foo.jpg
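
For anyone scripting against the dev labeling API, the request URL in the usage line above can be built as follows. This is only a sketch: the helper name `labels_url` is hypothetical, the dev host may no longer be available, and the response format is not documented in this task, so the example stops at constructing the URL.

```python
from urllib.parse import urlencode

# Dev endpoint taken from the usage line above.
BASE_URL = "https://visionoid.wmflabs.org"


def labels_url(file_title):
    """Build the /labels request URL for a file title such as 'File:Foo.jpg'."""
    # Keep the colon readable, as in the usage line; other reserved
    # characters in the title are percent-encoded.
    query = urlencode({"title": file_title}, safe=":")
    return f"{BASE_URL}/labels?{query}"


print(labels_url("File:Foo.jpg"))
```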

Related Objects

Status	Assigned	Task
Resolved	• Ramsey-WMF	T225964 [SDC] Build a depicts tag suggestion tool that is powered by machine vision platforms
Resolved	• Mholloway	T226119 Build middleware to utilize machine vision API for structured data on commons depicts tag suggestion tool
Resolved	sbassett	T227591 Security Concept Review for the machine vision middleware project

Event Timeline

• Mholloway created this task. Jul 9 2019, 3:11 PM

Restricted Application added a subscriber: Aklapper. Jul 9 2019, 3:11 PM

• Mholloway mentioned this in T227350: Request a security concept review for the machine vision middleware. Jul 9 2019, 3:12 PM

sbassett triaged this task as Medium priority. Jul 9 2019, 6:32 PM

@Mholloway - thanks for submitting this request. The Security-Team will plan to review this soon and probably post some follow-up questions here. After that, we can provide some guidance on potential risks.

• Jcross assigned this task to sbassett. Jul 16 2019, 5:35 PM

• Jcross moved this task from Incoming to In Progress on the deprecated-security-team-reviews board.

LGoto moved this task from Needs triage to Tracking on the Product-Infrastructure-Team-Backlog-Deprecated board. Jul 17 2019, 3:38 PM

• Mholloway added a parent task: T226119: Build middleware to utilize machine vision API for structured data on commons depicts tag suggestion tool. Jul 17 2019, 3:47 PM

sbassett added a project: user-sbassett. Jul 31 2019, 3:49 PM

sbassett moved this task from Backlog to In Progress on the user-sbassett board.

• Mholloway moved this task from Backlog to Tracking on the MachineVision board. Aug 1 2019, 9:31 PM

@Mholloway - from a high-level perspective, I think the Security-Team is fine with this and would assign a low risk for now, especially given the precedent of things like CX-cxserver. Some considerations:

Has this been reviewed by WMF-Legal yet? I believe cx server went through a similar process; see Acceptance Criteria in T76185. This should probably happen prior to deployment.
Some previous security reviews of the cx service (and related components) might be helpful to peruse at your leisure, as it's a close-ish parallel: T85686, T144467, T143185. Particularly the conversation starting at T143185#2632101, T144467#4795794 and T85686#958661. Mainly just as an idea of what we would look for during a more formal code review. Of course, the attack surface for this service would seem a bit smaller since we're talking about media metadata in a standard format that should not be easily manipulated by potential attackers, unlike the wikitext used by cx server.
Though it wasn't formally noted, I assume every component of this will be using TLS.
Even though some headers can be less critical for these types of services, I would strongly advise setting appropriate security headers, including a robust CSP for each component of this service.
I'd also recommend getting this on the radar of the Performance-Team to see if they have any additional concerns or recommendations for best practices. It may also make sense to reach out to the Language-Team to see if they have run into any performance issues with their usage of the various 3rd party MTs for the cx server.
Finally, once more code has been developed for this service (closer to production-ready), we can definitely perform a more formal security review if you'd like.
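
To make the security-header recommendation in the list above concrete, here is a minimal sketch of a header set for a JSON-only service. The header names are standard, but the specific values and the helper names `security_headers` and `apply_headers` are assumptions for illustration; they are not what was ultimately chosen for MachineVision, and a component serving HTML would need a more permissive CSP.

```python
def security_headers(csp_sources=("'none'",)):
    """Return an illustrative set of security headers (values are assumptions)."""
    sources = " ".join(csp_sources)
    return {
        # Restrict where content may be loaded from; 'none' suits an API
        # that only returns JSON.
        "Content-Security-Policy": f"default-src {sources}",
        # Ask browsers to use HTTPS only for the next year.
        "Strict-Transport-Security": "max-age=31536000",
        # Prevent MIME-type sniffing of responses.
        "X-Content-Type-Options": "nosniff",
        # Disallow embedding responses in frames.
        "X-Frame-Options": "DENY",
    }


def apply_headers(response_headers):
    """Merge the security headers into a copy of existing response headers."""
    merged = dict(response_headers)
    merged.update(security_headers())
    return merged
```

Returning a merged copy rather than mutating the input keeps the sketch independent of any particular web framework.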

sbassett moved this task from In Progress to Waiting on the deprecated-security-team-reviews board. Aug 2 2019, 5:06 PM

sbassett moved this task from In Progress to Waiting on the user-sbassett board. Aug 2 2019, 10:17 PM

• Mholloway mentioned this in T230811: Add appropriate security headers to MachineVision. Aug 20 2019, 3:06 PM

Hi @Mholloway , Do you have any additional questions or issues related to this ticket that we can help with? Please let us know! Otherwise we'll plan on closing this one out.

Cheers,

Jennifer

Thanks @sbassett for the review and @Jcross for checking in!

In T227591#5388300, @sbassett wrote:

Has this been reviewed by WMF-Legal yet? I believe cx server went through a similar process; see Acceptance Criteria in T76185. This should probably happen prior to deployment.

We're talking with them about what we're doing, and will follow up with them when we're code-complete.

Though it wasn't formally noted, I assume every component of this will be using TLS.

Yes.

Even though some headers can be less critical for these types of services, I would strongly advise setting appropriate security headers, including a robust CSP for each component of this service.

I'd also recommend getting this on the radar of the Performance-Team to see if they have any additional concerns or recommendations for best practices.

Filed follow-up tasks for these.

Finally, once more code has been developed for this service (closer to production-ready), we can definitely perform a more formal security review if you'd like.

I will plan on doing that; there's a placeholder task for it as well.

I'm currently in the process of adding support for Google Cloud Vision as a prospective third-party labeling provider. I plan for it to interact with the API through the official PHP client library. Is that something you would want to review specifically? I haven't looked at the implementation details yet, but I would expect that it follows recommended security practices.

That's all I've got, so feel free to close if we're all set for now. Thanks again.

In T227591#5425595, @Mholloway wrote:

We're talking with them about what we're doing, and will follow up with them when we're code-complete.

Great.

Even though some headers can be less critical for these types of services, I would strongly advise setting appropriate security headers, including a robust CSP for each component of this service.

I'd also recommend getting this on the radar of the Performance-Team to see if they have any additional concerns or recommendations for best practices.

Filed follow-up tasks for these.

Great, do you have the task numbers? I'd like to reference them here if possible.

I'm currently in the process of adding support for Google Cloud Vision as a prospective third-party labeling provider. I plan for it to interact with the API through the official PHP client library. Is that something you would want to review specifically? I haven't looked at the implementation details yet, but I would expect that it follows recommended security practices.

Ideally, for a security readiness review, we'd like to review any relevant source we have access to, including any and all third-party libraries. While we may not perform an exhaustive review of all third-party libraries (e.g. we aren't going to review 5,000 Node packages for some app) we would at least like to understand what is being used and why.

That's all I've got, so feel free to close if we're all set for now. Thanks again.

Sounds good. I'll resolve for now, but we can always post any relevant follow-up here (legal, related tasks, etc.)

sbassett moved this task from Waiting to Done on the deprecated-security-team-reviews board. Aug 21 2019, 2:12 PM

In T227591#5428645, @sbassett wrote:

Great, do you have the task numbers? I'd like to reference them here if possible.

Sure. Here are the follow-up tasks:
T230811: Add appropriate security headers to MachineVision
T230813: Performance review for the MachineVision extension
T227346: Security readiness review for the MachineVision extension

I expect to have code ready for the final round of reviews this week or next. Also, I should mention that the NSFW classification stuff is on hold for the time being; as of now the extension will only deal with image labeling.

Thanks again!

Thank you for the quick reply @Mholloway - please just let us know when we can be of further assistance. Cheers!

• Mholloway mentioned this in T227346: Security readiness review for the MachineVision extension. Sep 10 2019, 1:00 AM

sbassett moved this task from Waiting to Done on the user-sbassett board. Oct 29 2019, 3:53 PM

Per the MachineVision Concept Review (T227591), can we confirm that WMF-Legal's review of the extension/service was completed? Is there any corroborating documentation or a new privacy policy/ToU we could reference and review?

Hello! We have continued to keep Legal in the loop after T227591 was originally filed. Since then, the targeted Machine Vision API provider changed (from Clarifai to Google) and with that change came a slight change in requirements that meant we no longer needed to send usage data directly back to the provider (and data dumps contain no user information). As such, the only actual action the user is taking is making a structured data edit, just via a different means than usual. So there's no new policy/ToU other than the prominent notifications about the CC0 license of the structured data contribution. We still intend to review with legal once the tool is up for testing, before release.

I defer to @Slaporte for any additional input he may have.

• chasemp removed a project: deprecated-security-team-reviews. Jan 8 2020, 4:15 PM

• chasemp added a project: Application Security Reviews. Jan 8 2020, 4:39 PM

• chasemp moved this task from Incoming to Our Part Is Done on the Application Security Reviews board. Jan 8 2020, 4:43 PM

• chasemp edited projects, added Security Preview; removed Application Security Reviews. Jan 8 2020, 5:11 PM

• chasemp added a project: Security-Team. Jan 8 2020, 5:14 PM

• chasemp moved this task from Incoming to Our Part Is Done on the Security-Team board. Jan 8 2020, 5:15 PM

• chasemp added a project: secscrum. Mar 10 2020, 8:16 PM

Restricted Application added a project: Structured-Data-Backlog. Mar 10 2020, 8:16 PM

• chasemp moved this task from Incoming to Our Part Is Done on the secscrum board. Mar 10 2020, 8:18 PM