Page MenuHomePhabricator

Denormalize user_groups to contain actor information
Open, Needs TriagePublic

Description

As you recall, T223406 resulted in the need for additional joins, which made many queries slower. I have a proposal: tables that contain user information should be denormalized on the replicated copy on Cloud-Services such that they also contain the relevant actor information. As a start, I am proposing the user_groups table should be denormalized as such.

Current table structure

+-----------+------------------+------+-----+---------+-------+
| Field     | Type             | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+-------+
| ug_user   | int(10) unsigned | NO   | PRI | 0       |       |
| ug_group  | varbinary(255)   | NO   | PRI |         |       |
| ug_expiry | varbinary(14)    | YES  | MUL | NULL    |       |
+-----------+------------------+------+-----+---------+-------+

Proposed structure (on Cloud-Services only)

+-----------+---------------------+------+-----+---------+-------+
| Field     | Type                | Null | Key | Default | Extra |
+-----------+---------------------+------+-----+---------+-------+
| ug_user   | int(10) unsigned    | NO   | PRI | 0       |       |
| ug_group  | varbinary(255)      | NO   | PRI |         |       |
| ug_expiry | varbinary(14)       | YES  | MUL | NULL    |       |
| ug_actor  | bigint(20) unsigned | NO   |     | NULL    |       |
+-----------+---------------------+------+-----+---------+-------+

Justification

The user_groups table does not change very often, therefore, modifying the ETL process such that it would also contain the actor ID would not be too costly.

Adding the actor ID here makes it easier to join this table with other tables that contain actor info (e.g. the revision table) without having to go through a join with the very large actor table).