
Privacy review of x1 tables in preparation of adding them to wikireplicas
Open, In Progress, Low, Public

Description

There has been a long-standing request to replicate x1 to the wikireplicas, which would make its tables publicly visible so that anyone can query them (via tools such as quarry.wmflabs.org; see https://wikitech.wikimedia.org/wiki/Help:Wiki_Replicas). However, many of these tables contain PII or copyrighted material and shouldn't be visible to the world without a view hiding the private data (see https://gerrit.wikimedia.org/g/operations/puppet/+/production/modules/profile/templates/wmcs/db/wikireplicas/maintain-views.yaml for the list of existing views).

So we need the privacy engineering team to:

  • Officially check and sign off that the x1 tables listed as private or public in the table catalog are correctly marked as such.
  • For "partially public" tables, build the proper view hiding private data.
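As a rough illustration of what a view hiding private data could look like, here is a minimal sketch that builds a view definition in which selected columns are replaced with NULL. The function name, the view name, and the table and column names (example_table, ex_id, ex_user, ex_note) are all hypothetical; the real views are defined in maintain-views.yaml, and this sketch does not reproduce that file's format.

```python
def build_redacted_view(table, view_name, columns, hidden):
    """Return a CREATE VIEW statement exposing `columns` of `table`.

    Columns named in `hidden` are replaced with NULL so their values
    never reach the public view. All names passed in are illustrative.
    """
    unknown = set(hidden) - set(columns)
    if unknown:
        # Refuse to silently ignore a hidden column that is not selected.
        raise ValueError(f"hidden columns not in column list: {sorted(unknown)}")
    select_list = ", ".join(
        f"NULL AS {col}" if col in hidden else col for col in columns
    )
    return f"CREATE OR REPLACE VIEW {view_name} AS SELECT {select_list} FROM {table}"


# Hypothetical table with one private column.
sql = build_redacted_view(
    table="example_table",
    view_name="example_table_view",
    columns=["ex_id", "ex_user", "ex_note"],
    hidden={"ex_note"},
)
print(sql)
```

Listing every exposed column explicitly, rather than using SELECT *, means a column added to the table later stays hidden until someone reviews it.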


Event Timeline

sbassett changed the task status from Open to In Progress. Jan 26 2026, 5:31 PM
sbassett assigned this task to Rsilvola.
sbassett triaged this task as Low priority.
sbassett moved this task from Incoming to In Progress on the Privacy Engineering board.
sbassett added a project: SecTeam-Processed.

Right now these tables exist in x1:

campaign_events
ce_address
ce_event_address
ce_event_contributions
ce_event_questions
ce_event_topics
ce_event_wikis
ce_invitation_lists
ce_invitation_list_users
ce_organizers
ce_participants
ce_question_aggregation
ce_question_answers
ce_tracking_tools
ce_worklist_articles
communityrequests_counters
communityrequests_entities
communityrequests_tags
communityrequests_translations
cusi_case
cusi_signal
cusi_user
echo_email_batch
echo_event
echo_notification
echo_push_provider
echo_push_subscription
echo_push_topic
echo_target_page
globaljsonlinks
globaljsonlinks_target
globaljsonlinks_wiki
growthexperiments_link_recommendations
growthexperiments_link_submissions
growthexperiments_mentee_data
growthexperiments_mentor_mentee
growthexperiments_user_impact
mediamoderation_scan
T417172_growthexperiments_link_recommendations
T417172_growthexperiments_link_submissions
T417172_growthexperiments_mentee_data
T417172_growthexperiments_mentor_mentee
T417172_growthexperiments_user_impact
translate_cache
translate_message_group_subscriptions
wikimedia_campaign_events_grant

You can get more information about them by checking the table catalog (https://config-master.wikimedia.org/mediawiki-tables.json). I made a GUI for it a long time ago: https://going-merry.toolforge.org/

The tables above starting with T417172_* look a bit odd. I'm not sure what database those are in. I don't see them in the output of use wikishared; show tables;, use enwiki; show tables;, or use eswiki; show tables;, nor in mediawiki-tables.json or tables-catalog.yaml.

In x1, I think there is a database for every wiki (1000-ish), plus the databases cognate_wiktionary, flowdb, and wikishared. Looks like the tables listed above are tables that can occur in wiki databases? Worth noting that wiki databases often have only a subset of those tables depending on which extensions are installed.

I would recommend any effort to add x1 to the wiki replicas also include cognate_wiktionary, flowdb, and wikishared.

Great catch. I only went through the databases in all.dblist and didn't check the tables in the other databases (wikishared, flowdb, cognate_wiktionary). The T417172_* tables have already been dropped; check the ticket.

These databases were not in all.dblist:

ladsgroup@stat1009:~$ diff x1_dbs all.dblist
1d0
< Database
175d173
< cognate_wiktionary
302d299
< flowdb
434d430
< information_schema
1029d1024
< wikishared
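The same comparison can be sketched as a set difference, which also makes it easy to drop the two entries that are not real candidates: the "Database" header line from the mysql output and the information_schema system schema. The function name and the shortened input lists below are illustrative only; the real inputs would be the full x1_dbs and all.dblist files.

```python
def extra_databases(x1_dbs, dblist, ignore=("Database", "information_schema")):
    """Return databases present on x1 but absent from the dblist.

    `ignore` holds entries that are artifacts of the listing (the column
    header) or system schemas, not databases that need a privacy review.
    """
    known = set(dblist)
    return [db for db in x1_dbs if db not in known and db not in ignore]


# Shortened, illustrative inputs; the real files have roughly a thousand lines.
x1_dbs = [
    "Database",
    "cognate_wiktionary",
    "enwiki",
    "eswiki",
    "flowdb",
    "information_schema",
    "wikishared",
]
all_dblist = ["enwiki", "eswiki"]

extras = extra_databases(x1_dbs, all_dblist)
print(extras)
```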

These tables also need to be checked:

mysql:research@dbstore1009.eqiad.wmnet [enwiki]> use cognate_wiktionary;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql:research@dbstore1009.eqiad.wmnet [cognate_wiktionary]> show tables;
+------------------------------+
| Tables_in_cognate_wiktionary |
+------------------------------+
| cognate_pages                |
| cognate_sites                |
| cognate_titles               |
+------------------------------+
3 rows in set (0.009 sec)



mysql:research@dbstore1009.eqiad.wmnet [cognate_wiktionary]> use flowdb;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql:research@dbstore1009.eqiad.wmnet [flowdb]> show tables;
+------------------------+
| Tables_in_flowdb       |
+------------------------+
| flow_definition        |
| flow_ext_ref           |
| flow_revision          |
| flow_topic_list        |
| flow_tree_node         |
| flow_tree_revision     |
| flow_wiki_ref          |
| flow_wiki_ref_deleteme |
| flow_workflow          |
+------------------------+
9 rows in set (0.003 sec)


Database changed
mysql:research@dbstore1009.eqiad.wmnet [wikishared]> show tables;
+---------------------------------+
| Tables_in_wikishared            |
+---------------------------------+
| bounce_records                  |
| campaign_events                 |
| ce_address                      |
| ce_event_address                |
| ce_event_contributions          |
| ce_event_questions              |
| ce_event_topics                 |
| ce_event_wikis                  |
| ce_invitation_list_users        |
| ce_invitation_lists             |
| ce_organizers                   |
| ce_participants                 |
| ce_question_aggregation         |
| ce_question_answers             |
| ce_tracking_tools               |
| ce_worklist_articles            |
| cuci_temp_edit                  |
| cuci_user                       |
| cuci_wiki_map                   |
| cx_corpora                      |
| cx_lists                        |
| cx_notification_log             |
| cx_section_translations         |
| cx_significant_edits            |
| cx_suggestions                  |
| cx_translations                 |
| cx_translators                  |
| echo_push_provider              |
| echo_push_subscription          |
| echo_push_topic                 |
| echo_unread_wikis               |
| loginnotify_seen_net            |
| reading_list                    |
| reading_list_entry              |
| reading_list_project            |
| urlshortcodes                   |
| wikimedia_campaign_events_grant |
+---------------------------------+
37 rows in set (0.002 sec)

We can probably ignore flowdb, as it's being undeployed, but the rest need checking.
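To scope that remaining check, here is a small sketch that lists the wikishared tables not already covered by the earlier per-wiki list. The table names are copied from the listings in this task (the T417172_* tables are left out since they have been dropped); the helper name is made up for illustration.

```python
# Tables from the first listing in this task (T417172_* omitted, already dropped).
PER_WIKI_LIST = """
campaign_events ce_address ce_event_address ce_event_contributions
ce_event_questions ce_event_topics ce_event_wikis ce_invitation_lists
ce_invitation_list_users ce_organizers ce_participants ce_question_aggregation
ce_question_answers ce_tracking_tools ce_worklist_articles
communityrequests_counters communityrequests_entities communityrequests_tags
communityrequests_translations cusi_case cusi_signal cusi_user
echo_email_batch echo_event echo_notification echo_push_provider
echo_push_subscription echo_push_topic echo_target_page globaljsonlinks
globaljsonlinks_target globaljsonlinks_wiki
growthexperiments_link_recommendations growthexperiments_link_submissions
growthexperiments_mentee_data growthexperiments_mentor_mentee
growthexperiments_user_impact mediamoderation_scan translate_cache
translate_message_group_subscriptions wikimedia_campaign_events_grant
""".split()

# Tables from the `show tables` output for wikishared.
WIKISHARED = """
bounce_records campaign_events ce_address ce_event_address
ce_event_contributions ce_event_questions ce_event_topics ce_event_wikis
ce_invitation_list_users ce_invitation_lists ce_organizers ce_participants
ce_question_aggregation ce_question_answers ce_tracking_tools
ce_worklist_articles cuci_temp_edit cuci_user cuci_wiki_map cx_corpora
cx_lists cx_notification_log cx_section_translations cx_significant_edits
cx_suggestions cx_translations cx_translators echo_push_provider
echo_push_subscription echo_push_topic echo_unread_wikis
loginnotify_seen_net reading_list reading_list_entry reading_list_project
urlshortcodes wikimedia_campaign_events_grant
""".split()


def not_yet_listed(candidates, already_listed):
    """Return candidates missing from already_listed, keeping their order."""
    seen = set(already_listed)
    return [name for name in candidates if name not in seen]


remaining = not_yet_listed(WIKISHARED, PER_WIKI_LIST)
for name in remaining:
    print(name)
```

The three cognate_wiktionary tables (cognate_pages, cognate_sites, cognate_titles) do not appear in the earlier list either, so they would be added to whatever this prints.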

Change #1247549 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Add x1 section to an-redacteddb1001

https://gerrit.wikimedia.org/r/1247549

Change #1247549 merged by Btullis:

[operations/puppet@production] Add x1 section to an-redacteddb1001

https://gerrit.wikimedia.org/r/1247549