Page MenuHomePhabricator

Create wiki replicas views for globaljsonlinks tables
Open, Stalled, MediumPublic

Description

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

@Bugreporter : It's not very clear what you are expecting here. Could you provide more details so we can understand what needs to be done without too much investigation?

Currently there is no view of these three tables in Cloud, so they are not queryable in Cloud.

taavi renamed this task from Create views for globaljsonlinks tables to Create wiki replicas views for globaljsonlinks tables.Apr 4 2025, 9:18 AM
taavi added a project: Data-Services.
taavi moved this task from Backlog to Wiki replicas on the Data-Services board.

@Ahoelzl : could you validate if those tables should or should not be exposed? Is redaction needed?

@Bugreporter thanks for filing. Can you elaborate on the use cases? And also on priority?

I think that this is likely to be uncontentious and relatively easy to achieve.
The three tables mentioned appear to be related in some way to the use of the Charts extension, which has extended the JsonConfig model.
We can see more details in this ticket: T380226: Install globaljsonlinks* tables on X1 for use with commons commons for Charts deployment although I'm not sure whether there is a better source for information.

I had a quick look at one of the databases where this is installed, commonswiki and the tables are installed in the x1 section, since they relate to an extension.

btullis@stat1008:~$ analytics-mysql --use-x1 commonswiki
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 8035121
Server version: 10.6.20-MariaDB-log MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql:research@dbstore1009.eqiad.wmnet [commonswiki]> show tables;
+---------------------------------------+
| Tables_in_commonswiki                 |
+---------------------------------------+
| echo_email_batch                      |
| echo_event                            |
| echo_notification                     |
| echo_target_page                      |
| globaljsonlinks                       |
| globaljsonlinks_target                |
| globaljsonlinks_wiki                  |
| mediamoderation_scan                  |
| translate_cache                       |
| translate_message_group_subscriptions |
+---------------------------------------+
10 rows in set (0.001 sec)

mysql:research@dbstore1009.eqiad.wmnet [commonswiki]> describe globaljsonlinks;
+---------------+---------------------+------+-----+---------+-------+
| Field         | Type                | Null | Key | Default | Extra |
+---------------+---------------------+------+-----+---------+-------+
| gjl_wiki      | int(11)             | NO   | PRI | NULL    |       |
| gjl_namespace | int(11)             | NO   | PRI | NULL    |       |
| gjl_title     | varbinary(255)      | NO   | PRI | NULL    |       |
| gjl_target    | bigint(20) unsigned | NO   | PRI | NULL    |       |
+---------------+---------------------+------+-----+---------+-------+
4 rows in set (0.002 sec)

mysql:research@dbstore1009.eqiad.wmnet [commonswiki]> describe globaljsonlinks_target;
+----------------+---------------------+------+-----+---------+----------------+
| Field          | Type                | Null | Key | Default | Extra          |
+----------------+---------------------+------+-----+---------+----------------+
| gjlt_id        | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
| gjlt_namespace | int(11)             | NO   | MUL | NULL    |                |
| gjlt_title     | varbinary(255)      | NO   |     | NULL    |                |
+----------------+---------------------+------+-----+---------+----------------+
3 rows in set (0.001 sec)

mysql:research@dbstore1009.eqiad.wmnet [commonswiki]> describe globaljsonlinks_wiki;
+---------------------+------------------+------+-----+---------+----------------+
| Field               | Type             | Null | Key | Default | Extra          |
+---------------------+------------------+------+-----+---------+----------------+
| gjlw_id             | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| gjlw_wiki           | varbinary(32)    | NO   | MUL | NULL    |                |
| gjlw_namespace      | int(11)          | YES  |     | NULL    |                |
| gjlw_namespace_text | varbinary(255)   | YES  |     | NULL    |                |
+---------------------+------------------+------+-----+---------+----------------+
4 rows in set (0.001 sec)

mysql:research@dbstore1009.eqiad.wmnet [commonswiki]>

From an eyeball of the schemas, I think that there is unlikely to be any need for redaction of data.

We can also see another table which is similarly named: globalimagelinks

btullis@stat1008:~$ analytics-mysql commonswiki
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 43933951
Server version: 10.6.20-MariaDB-log MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql:research@dbstore1007.eqiad.wmnet [commonswiki]> describe globalimagelinks;
+-----------------------+------------------+------+-----+---------+-------+
| Field                 | Type             | Null | Key | Default | Extra |
+-----------------------+------------------+------+-----+---------+-------+
| gil_wiki              | varbinary(32)    | NO   | PRI |         |       |
| gil_page              | int(10) unsigned | NO   | PRI | 0       |       |
| gil_page_namespace_id | int(11)          | NO   |     | 0       |       |
| gil_page_namespace    | varbinary(255)   | NO   |     |         |       |
| gil_page_title        | varbinary(255)   | NO   |     |         |       |
| gil_to                | varbinary(255)   | NO   | PRI |         |       |
+-----------------------+------------------+------+-----+---------+-------+
6 rows in set (0.002 sec)

mysql:research@dbstore1007.eqiad.wmnet [commonswiki]>

This can be found in the fullviews section of the maintain-views.yaml data file.
https://github.com/wikimedia/operations-puppet/blob/production/modules/profile/templates/wmcs/db/wikireplicas/maintain-views.yaml#L85

So, while I would like a second opinion from someone who understands the JsonConfig model and the Charts extension better, I would say that this request is reasonable, from both a privacy and technical viewpoint.

Gehel triaged this task as Medium priority.May 16 2025, 8:57 AM
Gehel moved this task from Incoming to Scratch on the Data-Platform-SRE board.
Gehel added a subscriber: EBernhardson.

@EBernhardson do you know enough about JsonConfig (and maybe charts) to validate Ben's analysis above?

I don't know a ton about this, but i took a look at it. A few thoughts:

  • x1 doesn't really mean it's about extensions, but typically that means more that these are tables shared between wikis. Normal per-wiki tables are on the s* db clusters, the x* clusters are typically shared between wikis.
  • Plausibly the view that exposes these tables publicly may need to filter for when wiki is a private wiki? I took a look over select distinct gjlw_wiki from globaljsonlinks_wiki and didn't notice any private wikis, but not clear if that's by accident or by design.
  • The gjl_wiki column of globaljsonlinks looks to be a foreign key to globaljsonlinks_wiki. I imagine if we filter the private wiki from globaljsonlinks_wiki then matching rows in globaljsonlinks also have to be elided?

The Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it with a more specific project tag to this task. Thanks!

fnegri subscribed.

@Gehel I don't think there's anything actionable for cloud-services-team yet. I'm looping in Data-Persistence as I think we would need to add x1 to the replicated sections. Currently Wiki Replicas only include s1 to s8, plus x3.

taavi changed the task status from Open to Stalled.Jun 3 2025, 9:09 AM
taavi added a project: Data-Services.
taavi subscribed.

(replicating x3 was already recently asked in T395072, but likely should have its own task?)

Does anybody have a specific use case that needs to access these tables? If yes, that might help in bumping the priority of T395881: Set up x1 replication to Wiki Replicas.

It would be very helpful to have access to these tables on toolforge (and quarry, https://superset.wmcloud.org/ etc) to generate statistics and usage information about charts.