Page MenuHomePhabricator

Denormalize user_groups to contain actor information
Closed, DeclinedPublic

Description

As you recall, T223406 resulted in the need for additional joins, which made many queries slower. I have a proposal: tables that contain user information should be denormalized on the replicated copy on Cloud-Services such that they also contain the relevant actor information. As a start, I am proposing the user_groups table should be denormalized as such.

Current table structure

+-----------+------------------+------+-----+---------+-------+
| Field     | Type             | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+-------+
| ug_user   | int(10) unsigned | NO   | PRI | 0       |       |
| ug_group  | varbinary(255)   | NO   | PRI |         |       |
| ug_expiry | varbinary(14)    | YES  | MUL | NULL    |       |
+-----------+------------------+------+-----+---------+-------+

Proposed structure (on Cloud-Services only)

+-----------+---------------------+------+-----+---------+-------+
| Field     | Type                | Null | Key | Default | Extra |
+-----------+---------------------+------+-----+---------+-------+
| ug_user   | int(10) unsigned    | NO   | PRI | 0       |       |
| ug_group  | varbinary(255)      | NO   | PRI |         |       |
| ug_expiry | varbinary(14)       | YES  | MUL | NULL    |       |
| ug_actor  | bigint(20) unsigned | NO   |     | NULL    |       |
+-----------+---------------------+------+-----+---------+-------+

Justification

The user_groups table does not change very often, therefore, modifying the ETL process such that it would also contain the actor ID would not be too costly.

Adding the actor ID here makes it easier to join this table with other tables that contain actor info (e.g. the revision table) without having to go through a join with the very large actor table).

Event Timeline

fnegri triaged this task as Low priority.Jul 1 2024, 9:57 AM

I've set this to "Low" priority as there was no activity on this task since 2019. Happy to increase the priority if more users would find it useful.

Such change is meaningless as long as copy in cloud replica are just views instead of real copy (or materialized views) - so any queries on such "denormalized" column just query actor tables on-the-fly. If we have materialized views we will first resolve T215445: comment and actor view challenges for Cloud Services.

taavi subscribed.

I'm declining since maintaining additional "real" columns would be very complicated for the relatively little benefit they give.