Subtask split out from T161671 'Compacting the revision table', to refactor the (id-or-0, name-or-address) tuples in `revision`, `logging`, `image`, and other tables through a common reference to an `actor` table (actor as in one who acts).
This makes those references a fixed size: 64-bit `bigint`s instead of a 32-bit `int` followed by a variable size string. Moving the name-or-address out of the key also means we avoid duplications of the name-or-address string between multiple rows when a single user makes many edits, and make relevant indexes for by-user lookups smaller.
# End state
Database new tables:
-- Revision authorship is represented by an actor row.
-- This avoids duplicating common usernames/IP addresses
-- in the revision table, and makes rename processing quicker.
CREATE TABLE /*_*/actor (
-- Unique ID to identify each entry
actor_id bigint unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,
-- Key to user.user_id of the user who made this edit.
-- Stores NULL for anonymous edits and for some mass imports.
actor_user int unsigned,
-- Text username or IP address of the editor.
actor_text varchar(255) binary NOT NULL default ''
CREATE UNIQUE INDEX /*i*/actor_user ON /*_*/actor (actor_user);
CREATE UNIQUE INDEX /*i*/actor_text ON /*_*/actor (actor_text);
Database changes tables:
* `revision` adds `rev_actor`, and drops `rev_user` & `rev_user_text`
* `archive` adds `ar_actor`, and drops `ar_user` & `ar_user_text`
* `image` adds `img_actor`, and drops `img_user` & `img_user_text`
* `oldimage` adds `oi_actor`, and drops `oi_user` & `oi_user_text`
* `filearchive` adds `fa_actor`, and drops `fa_user` & `fa_user_text`
* `recentchanges` adds `rc_actor`, and drops `rc_user` & `rc_user_text`
* `logging` adds `log_actor`, and drops `log_user` & `log_user_text`
# Migration state
To avoid blocking on changes to the revision table (so we can coordinate a large schema update), `revision` and `image` (FIXME: check if we need to do this on others) will gain temporary sibling tables to hold the reference:
CREATE TABLE /*_*/revision_actor_temp (
revactor_rev bigint unsigned NOT NULL, -- → revision.rev_id
revactor_actor bigint unsigned NOT NULL, -- → actor.actor_id
PRIMARY KEY (revactor_rev, revactor_actor)
-- will this one work sanely? we have no img_id key srsly?
CREATE TABLE /*_*/image_actor_temp (
imgactor_img varchar(255) binary NOT NULL, -- → image.img_name
imgactor_actor bigint unsigned NOT NULL, -- → actor.actor_id
PRIMARY KEY (imgactor_img, imgactor_actor)
# MediaWiki changes and migration
Add feature flag for migration states:
* Read and write old columns only
* Write both old and new columns. Read from new preferentially, falling back to old.
* Write only new columns. Read from new preferentially, falling back to old.
* Read and write the new columns only.
`ActorMigration` (?) class to manage state & offer methods -- coordinate the basic layout of this with our similar changes at T166732 & friends!
* use outer joins on reads to map in the extra temporary tables when used, expose the necessary query options.
* helper function for query setup when searching for a specific user or address's references
* helper function for storing to the right columns and whether two inserts are needed for temp tables.
* find and fix uses of the direct fields for lookups to use the abstraction
* probably want a maintenance script to do forced migration that can be run manually in production scenarios or run automatically from the default updaters for small installations
# Potential pitfalls
* There may be more direct usages than expected and they may be a pain to fix
* There might be "surprise" usages of `user_text` fields or indexes for querying multiple lookups
* apiqueryrevisions ;)
* Are there any problems with still duplicating a username between `actor.actor_name` and `user.user_name`?
* Migration may be harder than expected.
* Still need to plan the 'final megamigration' from revision's friend tables to merged columns