Please create the views necessary to expose the MachineVision extension tables to the wiki replicas. Thanks!
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
Create wiki replica views for MachineVision extension tables | operations/puppet | production | +6 -0
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Cparle | T238574 Create wiki replica views for MachineVision extension tables
Resolved | | ArielGlenn | T236431 Data dumps for the MachineVision extension
Event Timeline
@Mholloway is all the data in these tables public on the wikis? If not, we will need a sign off by the Security team on exposing them.
None of the data in them is private. There is some data (who voted on the label suggestions, and when and what the votes were) that isn't actually exposed yet, but should be in principle, and is stored in anticipation of being exposed somehow. We're also storing confidence scores for the labels, and SafeSearch scores which we use to omit some images from the main "popular" tab on Special:SuggestedTags. I don't think there's any plan to expose these directly on-wiki, but I don't think they're really private, either. Confidence scores will likely end up getting exposed through the API at some point.
In any event, Ariel has asked Reedy to sign off on making the table contents public in T236431: Data dumps for the MachineVision extension, so it probably makes sense to wait for that to happen before proceeding here, too.
Waiting on related T236431: Data dumps for the MachineVision extension for security review.
This can go ahead, but there is a question about machine_vision_safe_search, and whether that can go ahead.
It's not listed in T236431: Data dumps for the MachineVision extension as a table to be dumped to disk, but technically, as this task's description says:
"Please create the views necessary to expose the MachineVision extension tables to the wiki replicas. Thanks!"
And that page does indeed (also) list machine_vision_safe_search, so there is a discrepancy that needs sorting out. Is it fine to be dumped in both? Neither?
None of the data is personal or anything, but it could potentially be seen as Google Secrets, as it shows how they're classifying said images.
Clarified in sub task. All tables are indeed fine to be exposed into cloud (and dumped too)
Change 623775 had a related patch set uploaded (by Cparle; owner: Cparle):
[operations/puppet@production] Create wiki replica views for MachineVision extension tables
Change 623775 merged by Bstorm:
[operations/puppet@production] Create wiki replica views for MachineVision extension tables
The views have been created on the labsdb10{09,10,11,12} hosts. Note that these tables are only present on wikis with the MachineVision extension installed (testcommonswiki & commonswiki).
$ sql commons
(u3518@commonswiki.analytics.db.svc.eqiad.wmflabs) [commonswiki_p]> show create view machine_vision_freebase_mapping\G
*************************** 1. row ***************************
                View: machine_vision_freebase_mapping
         Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`viewmaster`@`%` SQL SECURITY DEFINER VIEW `machine_vision_freebase_mapping` AS select `commonswiki`.`machine_vision_freebase_mapping`.`mvfm_freebase_id` AS `mvfm_freebase_id`,`commonswiki`.`machine_vision_freebase_mapping`.`mvfm_wikidata_id` AS `mvfm_wikidata_id` from `commonswiki`.`machine_vision_freebase_mapping`
character_set_client: utf8
collation_connection: utf8_general_ci
1 row in set (0.00 sec)

(u3518@commonswiki.analytics.db.svc.eqiad.wmflabs) [commonswiki_p]> select count(*) from machine_vision_freebase_mapping;
+----------+
| count(*) |
+----------+
|  2099582 |
+----------+
1 row in set (2.24 sec)
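For anyone wondering what a "full" view like the one above amounts to, here is a minimal Python sketch that rebuilds the same `CREATE VIEW` text from a table name and its column list. The function `build_view_sql` and its parameters are hypothetical and exist only for illustration; this is not the tooling that actually created the views, and it only assembles a string without connecting to any database.

```python
def build_view_sql(source_db, table, columns, definer="viewmaster"):
    """Return a CREATE VIEW statement exposing `columns` of source_db.table.

    Hypothetical helper: mirrors the shape of the SHOW CREATE VIEW output
    quoted in this task, nothing more.
    """
    def q(name):
        # Backtick-quote an identifier, doubling any embedded backticks.
        return "`" + name.replace("`", "``") + "`"

    # Each column is selected by its fully qualified name and aliased
    # back to its own name, so the view looks like the original table.
    select_list = ",".join(
        f"{q(source_db)}.{q(table)}.{q(col)} AS {q(col)}" for col in columns
    )
    return (
        f"CREATE ALGORITHM=UNDEFINED DEFINER={q(definer)}@{q('%')} "
        f"SQL SECURITY DEFINER VIEW {q(table)} AS "
        f"select {select_list} from {q(source_db)}.{q(table)}"
    )


sql = build_view_sql(
    "commonswiki",
    "machine_vision_freebase_mapping",
    ["mvfm_freebase_id", "mvfm_wikidata_id"],
)
print(sql)
```

Because the view selects every column with no filtering, querying it from `commonswiki_p` returns the same rows as the underlying table, which is why the sign-off on the table contents discussed earlier mattered before the views were created.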