Create wiki replica views for MachineVision extension tables
Closed, Resolved (Public)

Description

Please create the views necessary to expose the MachineVision extension tables to the wiki replicas. Thanks!

Event Timeline

bd808 subscribed.

@Mholloway is all the data in these tables public on the wikis? If not, we will need a sign off by the Security team on exposing them.

None of the data in them is private. There is some data (who voted on the label suggestions, and when and what the votes were) that isn't actually exposed yet, but should be in principle, and is stored in anticipation of being exposed somehow. We're also storing confidence scores for the labels, and SafeSearch scores which we use to omit some images from the main "popular" tab on Special:SuggestedTags. I don't think there's any plan to expose these directly on-wiki, but I don't think they're really private, either. Confidence scores will likely end up getting exposed through the API at some point.

In any event, Ariel has asked Reedy to sign off on making the table contents public in T236431: Data dumps for the MachineVision extension, so it probably makes sense to wait for that to happen before proceeding here, too.

bd808 changed the task status from Open to Stalled. Mar 10 2020, 3:03 PM

Waiting on related T236431: Data dumps for the MachineVision extension for security review.

Reedy triaged this task as Medium priority. Mar 18 2020, 4:14 PM
Reedy moved this task from Incoming to Back Orders on the Security-Team board.
Reedy subscribed.

This can go ahead, but there is an open question about whether machine_vision_safe_search can be included.

It's not listed in T236431: Data dumps for the MachineVision extension among the tables to be dumped to disk, but technically, this task says:

Please create the views necessary to expose the MachineVision extension tables to the wiki replicas. Thanks!

And that page does indeed (also) list machine_vision_safe_search, so there is a discrepancy that needs sorting out. Is it fine to be dumped in both? Neither?

None of the data is personal or anything, but it could potentially be seen as Google secrets, since it shows how they're classifying said images.

Reedy changed the task status from Stalled to Open. Apr 6 2020, 1:57 PM

Clarified in the subtask. All tables are indeed fine to be exposed to Cloud (and dumped too).

I ran into these missing tables. Can the views be added please?

Change 623775 had a related patch set uploaded (by Cparle; owner: Cparle):
[operations/puppet@production] Create wiki replica views for MachineVision extension tables

https://gerrit.wikimedia.org/r/623775

Reminder that there's a patch waiting for review for this ...

Reminder that there's a patch waiting for review on this ...

Change 623775 merged by Bstorm:
[operations/puppet@production] Create wiki replica views for MachineVision extension tables

https://gerrit.wikimedia.org/r/623775

The views have been created on the labsdb10{09,10,11,12} hosts. Note that these tables are only present on wikis with the MachineVision extension installed (testcommonswiki & commonswiki).

$ sql commons
(u3518@commonswiki.analytics.db.svc.eqiad.wmflabs) [commonswiki_p]> show create view machine_vision_freebase_mapping\G
*************************** 1. row ***************************
                View: machine_vision_freebase_mapping
         Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`viewmaster`@`%` SQL SECURITY DEFINER VIEW `machine_vision_freebase_mapping` AS select `commonswiki`.`machine_vision_freebase_mapping`.`mvfm_freebase_id` AS `mvfm_freebase_id`,`commonswiki`.`machine_vision_freebase_mapping`.`mvfm_wikidata_id` AS `mvfm_wikidata_id` from `commonswiki`.`machine_vision_freebase_mapping`
character_set_client: utf8
collation_connection: utf8_general_ci
1 row in set (0.00 sec)

(u3518@commonswiki.analytics.db.svc.eqiad.wmflabs) [commonswiki_p]> select count(*) from machine_vision_freebase_mapping;
+----------+
| count(*) |
+----------+
|  2099582 |
+----------+
1 row in set (2.24 sec)
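
For anyone who wants to try the pattern locally, here is a minimal sketch of the same idea: a view that explicitly selects each column of the underlying table, followed by the same count check. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the MariaDB replica. The column types, the sample rows, and the `_p` suffix on the view name are assumptions made for illustration (on the replicas the view keeps the table's name and lives in the separate commonswiki_p database); SQLite also has no DEFINER or SQL SECURITY clauses, so those are omitted.

```python
import sqlite3

# In-memory SQLite database standing in for the replica host (assumption:
# the real views live on MariaDB, with a DEFINER and SQL SECURITY DEFINER).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Stand-in for the underlying table. The column names come from the
# SHOW CREATE VIEW output above; the TEXT types are assumptions.
cur.execute(
    "CREATE TABLE machine_vision_freebase_mapping ("
    " mvfm_freebase_id TEXT NOT NULL,"
    " mvfm_wikidata_id TEXT NOT NULL)"
)

# Made-up sample rows, purely for illustration.
cur.executemany(
    "INSERT INTO machine_vision_freebase_mapping VALUES (?, ?)",
    [("/m/example1", "Q1"), ("/m/example2", "Q2")],
)

# The view names each exposed column explicitly instead of using SELECT *,
# so a column added to the table later is not exposed automatically.
# Hypothetical "_p" suffix: SQLite cannot give a view the same name as a
# table in the same schema.
cur.execute(
    "CREATE VIEW machine_vision_freebase_mapping_p AS"
    " SELECT mvfm_freebase_id, mvfm_wikidata_id"
    " FROM machine_vision_freebase_mapping"
)

# Same sanity check as in the session above: count rows through the view.
row_count = cur.execute(
    "SELECT count(*) FROM machine_vision_freebase_mapping_p"
).fetchone()[0]

# List the columns the view actually exposes.
view_columns = [
    col[0]
    for col in cur.execute(
        "SELECT * FROM machine_vision_freebase_mapping_p"
    ).description
]

print(row_count, view_columns)
```

Against the real replicas, the equivalent check is the `select count(*)` shown above, run from `sql commons`.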