Open view for term_full_entity_id in wb_terms table in labs
Closed, ResolvedPublic

Description

Today, I wasn't able to see the term_full_entity_id column in labs even though it's there. It would be great if you let labs have it as it will replace two other columns. The data is 100% public.

Change 357369 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[operations/software/labsdb-auditor@master] Whitelist term_full_entity_id in wb_terms table

https://gerrit.wikimedia.org/r/357369

Ladsgroup claimed this task.
Restricted Application added a project: User-Ladsgroup. · View Herald TranscriptJun 6 2017, 11:45 AM
daniel added a comment.EditedJun 6 2017, 3:13 PM

I'm dubious about exposing this column to labs tools before it is fully populated. This at least needs a warning somewhere...

We need to time this with our deployment schedule, running the migration script, and announcing changes.

chasemp changed the task status from Open to Stalled.Jun 8 2017, 2:24 PM
chasemp added a subscriber: chasemp.

Change 357369 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[operations/software/labsdb-auditor@master] Whitelist term_full_entity_id in wb_terms table

https://gerrit.wikimedia.org/r/357369

This file isn't really used atm.

The place to configure the view would be https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/role/templates/labs/db/views/maintain-views.yaml

I would say to expose this we would need:

  1. @daniel to say yeah let's do this
  1. A sign off (just a comment really) from Security saying cool this really doesn't expose anything extra or untoward
  1. A DBA to ensure the data is good to go on the labsdb's
  1. A mod to the above maintain-views.yaml file
  1. An Opsen to run maintain-views update on the labsdb hosts :)

That seems like a lot but is pretty doable I swear! I'm going to stall this atm though until @daniel says it's ready to kick off the process.

chasemp lowered the priority of this task from High to Normal.Jun 8 2017, 2:25 PM
Ladsgroup moved this task from incoming to in progress on the Wikidata board.Jun 9 2017, 9:12 AM
Restricted Application added a subscriber: PokestarFan. · View Herald TranscriptJul 24 2017, 11:15 AM

92% of the populating the table has been done now and it will finish by Thursday, I think we should pick this task up.

Ladsgroup added a comment.EditedSep 29 2017, 1:12 PM

Okay, the population is done and we can pick this up now. One thing to note that, in the file you mentioned above wb_terms is already in full_view mode so I guess there is nothing needs to be done except a round of maintain-view on labsdbs. The data is very public and we don't store any private data in it. I think I can ask Security to take a look though.

Pasleim added a subscriber: Pasleim.Oct 4 2017, 6:53 PM
Ladsgroup changed the task status from Stalled to Open.Oct 6 2017, 8:29 AM

It's blocking our work

daniel added a subscriber: bd808.Oct 10 2017, 9:17 AM

@bd808 can you have a quick look at this, please? We need to have wb_terms.term_full_entity_id added to the public view on labs. According to what @Ladsgroup said above, this should be trivial.

bd808 reassigned this task from Ladsgroup to Andrew.
bd808 added a subscriber: Andrew.

wb_terms is listed as a "full view" table in maintain-views.yaml, so I think we just need to run maintain-views on the lbasdb hosts (labsdb10{01,03,09,10,11}) to make sure that the view is up to date with the underlying table structure. @Andrew can do this during his on-call rotation this week.

bd808 moved this task from Backlog to Wiki replicas on the Data-Services board.Oct 10 2017, 3:15 PM

I've run maintain-views, but the wb_terms table isn't getting replicated at all. I don't see any evidence of filtering in the sanitarium files but I may be looking in the wrong place... @Marostegui, any ideas?

Marostegui added a comment.EditedOct 10 2017, 7:23 PM

@Andrew I do see the table and the view for wikidata on labsdb1001 and labsdb1009 for instance on both, the normal db and the _p db:

mysql:root@localhost [wikidatawiki]> show tables like 'wb_terms';
+-----------------------------------+
| Tables_in_wikidatawiki (wb_terms) |
+-----------------------------------+
| wb_terms                          |
+-----------------------------------+
1 row in set (0.00 sec)

mysql:root@localhost [wikidatawiki]> use wikidatawiki_p
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql:root@localhost [wikidatawiki_p]> show tables like 'wb_terms';
+-------------------------------------+
| Tables_in_wikidatawiki_p (wb_terms) |
+-------------------------------------+
| wb_terms                            |
+-------------------------------------+
1 row in set (0.00 sec)

mysql:root@localhost [wikidatawiki_p]> select * from wb_terms limit 1;
+-------------+----------------+---------------------+------------------+---------------+-----------+------------+-----------------+-------------+
| term_row_id | term_entity_id | term_full_entity_id | term_entity_type | term_language | term_type | term_text  | term_search_key | term_weight |
+-------------+----------------+---------------------+------------------+---------------+-----------+------------+-----------------+-------------+
|      363474 |          50445 | Q50445              | item             | en-gb         | label     | Jungwon-gu | jungwon-gu      |           0 |
+-------------+----------------+---------------------+------------------+---------------+-----------+------------+-----------------+-------------+
1 row in set (0.00 sec)


mysql:root@localhost [wikidatawiki_p]> show create table wb_terms\G
*************************** 1. row ***************************
                View: wb_terms
         Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`viewmaster`@`%` SQL SECURITY DEFINER VIEW `wb_terms` AS select `wikidatawiki`.`wb_terms`.`term_row_id` AS `term_row_id`,`wikidatawiki`.`wb_terms`.`term_entity_id` AS `term_entity_id`,`wikidatawiki`.`wb_terms`.`term_full_entity_id` AS `term_full_entity_id`,`wikidatawiki`.`wb_terms`.`term_entity_type` AS `term_entity_type`,`wikidatawiki`.`wb_terms`.`term_language` AS `term_language`,`wikidatawiki`.`wb_terms`.`term_type` AS `term_type`,`wikidatawiki`.`wb_terms`.`term_text` AS `term_text`,`wikidatawiki`.`wb_terms`.`term_search_key` AS `term_search_key`,`wikidatawiki`.`wb_terms`.`term_weight` AS `term_weight` from `wikidatawiki`.`wb_terms`
character_set_client: utf8
collation_connection: utf8_general_ci
1 row in set (0.00 sec)

Not sure if we are looking at the same thing.

ok -- I was expecting this table to be present in enwiki. If it's wikidata-specific then we're probably done. @Ladsgroup can you confirm?

I just logged in and it works as expected, we should close this now. Thank you :)

Ladsgroup closed this task as Resolved.Oct 10 2017, 8:30 PM

Thank you all for sorting this out!

@Andrew For the record: yes, wb_terms is wikidata only (and wikidata test). Generally, tables with the wb_ prefix are only for wikibase repos, while the wbc_ prefix is used for tables on wikibase clients. Note that with the SDC project, commons will become a wikibase repo, and get the respective tables.

Change 357369 abandoned by Rush:
Whitelist term_full_entity_id in wb_terms table

Reason:
if we ever go back to using this it needs a complete overhaul so let's not give the illusion of up to dateness

https://gerrit.wikimedia.org/r/357369