Page MenuHomePhabricator

Security Concept Review for the machine vision middleware project
Closed, ResolvedPublic

Description

Project Information

Description of the project and how it will be used

This is a project to support the incorporation of machine vision (MV) generated metadata into Foundation products. Specifically, the project will support:

  • Requesting MV-generated image metadata from machine vision providers (both internal and third-party external)
  • Providing temporary storage for MV results pending human editor verification
  • Serving MV data to Commons users for verification and promotion to Structured Data on Commons
  • Providing the results of human editor verification back to third-party MV providers for model refinement`

For full discussion, see the project page at https://www.mediawiki.org/wiki/Wikimedia_Product/Machine_vision_middleware (including linked Phab tickets).

Description of any sensitive data to be collected or exposed

None

Technologies employed

MediaWiki, MySQL, possibly (<50% likelihood) a Node.js service similar to cxserver

Dependencies and vendor code

Working test environment

This project in an early stage, but a couple of dev/testing APIs are set up in WMCS:

Event Timeline

Mholloway created this task.Jul 9 2019, 3:11 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 9 2019, 3:11 PM
sbassett triaged this task as Normal priority.Jul 9 2019, 6:32 PM

@Mholloway - thanks for submitting this request. The Security-Team will plan to review this soon and probably post some follow-up questions here. After that, we can provide some guidance on potential risks.

Jcross assigned this task to sbassett.Jul 16 2019, 5:35 PM
Jcross moved this task from Backlog to In Progress on the Security-Team-Reviews board.
sbassett added a project: Restricted Project.Jul 31 2019, 3:49 PM
sbassett moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.
Mholloway moved this task from Backlog to Tracking on the Machine vision board.Aug 1 2019, 9:31 PM

@Mholloway - from a high-level perspective, I think the Security-Team is fine with this and would assign a low risk for now, especially given the precedent of things like CX-cxserver. Some considerations:

  1. Has this been reviewed by WMF-Legal yet? I believe cx server went through a similar process; see Acceptance Criteria in T76185. This should probably happen prior to deployment.
  2. Some previous security reviews of the cx service (and related components) might be helpful to peruse at your leisure, as it's a close-ish parallel: T85686, T144467, T143185. Particularly the conversation starting at T143185#2632101, T144467#4795794 and T85686#958661. Mainly just as an idea of what we would look for during a more formal code review. Of course, the attack surface for this service would seem a bit smaller since we're talking about media metadata in a standard format that should not be easily manipulated by potential attackers, unlike the wikitext used by cx server.
  3. Though it wasn't formally noted, I assume every component of this will be using TLS.
  4. Even though some headers can be less critical for these types of services, I would strongly advise setting appropriate security headers, including a robust CSP for each component of this service.
  5. I'd also recommend getting this on the radar of the Performance-Team to see if they have any additional concerns or recommendations for best practices. It may also make sense to reach out to the Language-Team to see if they have run into any performance issues with their usage of the various 3rd party MTs for the cx server.
  6. Finally, once more code has been developed for this service (closer to production-ready), we can definitely perform a more formal security review if you'd like.
sbassett moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Aug 2 2019, 10:17 PM
Jcross added a subscriber: Jcross.Aug 20 2019, 5:27 PM

Hi @Mholloway , Do you have any additional questions or issues related to this ticket that we can help with? Please let us know! Otherwise we'll plan on closing this one out.

Cheers,

Jennifer

Thanks @sbassett for the review and @Jcross for checking in!

  1. Has this been reviewed by WMF-Legal yet? I believe cx server went through a similar process; see Acceptance Criteria in T76185. This should probably happen prior to deployment.

We're talking with them about what we're doing, and will follow up with them when we're code-complete.

  1. Though it wasn't formally noted, I assume every component of this will be using TLS.

Yes.

  1. Even though some headers can be less critical for these types of services, I would strongly advise setting appropriate security headers, including a robust CSP for each component of this service.
  2. I'd also recommend getting this on the radar of the Performance-Team to see if they have any additional concerns or recommendations for best practices.

Filed follow-up tasks for these.

  1. Finally, once more code has been developed for this service (closer to production-ready), we can definitely perform a more formal security review if you'd like.

I will plan on doing that; there's a placeholder task for it as well.

I'm currently in the process of adding support for Google Cloud Vision as a prospective third-party labeling provider. I plan for it to interact with the API through the official PHP client library. Is that something you would want to review specifically? I haven't looked at the implementation details yet, but I would expect that it follows recommended security practices.

That's all I've got, so feel free to close if we're all set for now. Thanks again.

sbassett closed this task as Resolved.Aug 21 2019, 2:11 PM

We're talking with them about what we're doing, and will follow up with them when we're code-complete.

Great.

  1. Even though some headers can be less critical for these types of services, I would strongly advise setting appropriate security headers, including a robust CSP for each component of this service.
  2. I'd also recommend getting this on the radar of the Performance-Team to see if they have any additional concerns or recommendations for best practices.

Filed follow-up tasks for these.

Great, do you have the task numbers? I'd like to reference them here if possible.

I'm currently in the process of adding support for Google Cloud Vision as a prospective third-party labeling provider. I plan for it to interact with the API through the official PHP client library. Is that something you would want to review specifically? I haven't looked at the implementation details yet, but I would expect that it follows recommended security practices.

Ideally, for a security readiness review, we'd like to review any relevant source we have access to, including any and all third-party libraries. While we may not perform an exhaustive review of all third-party libraries (e.g. we aren't going to review 5,000 Node packages for some app) we would at least like to understand what is being used and why.

That's all I've got, so feel free to close if we're all set for now. Thanks again.

Sounds good. I'll resolve for now, but we can always post any relevant follow-up here (legal, related tasks, etc.)

Great, do you have the task numbers? I'd like to reference them here if possible.

Sure. Here are the follow-up tasks:
T230811: Add appropriate security headers to MachineVision
T230813: Performance review for the MachineVision extension
T227346: Security readiness review for the MachineVision extension

I expect to have code ready for the final round of reviews this week or next. Also, I should mention that the NSFW classification stuff is on hold for the time being; as of now the extension will only deal with image labeling.

Thanks again!

Thank you for the quick reply @Mholloway - please just let us know when we can be of further assistance. Cheers!