Subtask split out from T161671 'Compacting the revision table', to refactor the (id-or-0, name-or-address) tuples in revision, logging, image, and other tables through a common reference to an actor table (actor as in one who acts).
This makes those references a fixed size: 64-bit bigints instead of a 32-bit int followed by a variable size string. Moving the name-or-address out of the key also means we avoid duplications of the name-or-address string between multiple rows when a single user makes many edits, and make relevant indexes for by-user lookups smaller.
End state
Database new tables:
-- -- Revision authorship is represented by an actor row. -- This avoids duplicating common usernames/IP addresses -- in the revision table, and makes rename processing quicker. -- CREATE TABLE /*_*/actor ( -- Unique ID to identify each entry actor_id bigint unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT, -- Key to user.user_id of the user who made this edit. -- Stores NULL for anonymous edits and for some mass imports. actor_user int unsigned, -- Text username or IP address of the editor. actor_name varchar(255) binary NOT NULL default '' ) /*$wgDBTableOptions*/; CREATE UNIQUE INDEX /*i*/actor_user ON /*_*/actor (actor_user); CREATE UNIQUE INDEX /*i*/actor_name ON /*_*/actor (actor_name);
Database changes tables:
- revision adds rev_actor, and drops rev_user & rev_user_text
- archive adds ar_actor, and drops ar_user & ar_user_text
- image adds img_actor, and drops img_user & img_user_text
- oldimage adds oi_actor, and drops oi_user & oi_user_text
- filearchive adds fa_actor, and drops fa_user & fa_user_text
- recentchanges adds rc_actor, and drops rc_user & rc_user_text
- logging adds log_actor, and drops log_user & log_user_text
Migration state
To avoid blocking on changes to the revision table (so we can coordinate a large schema update), revision and image (FIXME: check if we need to do this on others) will gain temporary sibling tables to hold the reference:
CREATE TABLE /*_*/revision_actor_temp ( revactor_rev bigint unsigned NOT NULL, -- → revision.rev_id revactor_actor bigint unsigned NOT NULL, -- → actor.actor_id PRIMARY KEY (revactor_rev, revactor_actor) ) /*$wgDBTableOptions*/; -- will this one work sanely? we have no img_id key srsly? CREATE TABLE /*_*/image_actor_temp ( imgactor_img varchar(255) binary NOT NULL, -- → image.img_name imgactor_actor bigint unsigned NOT NULL, -- → actor.actor_id PRIMARY KEY (imgactor_img, imgactor_actor) ) /*$wgDBTableOptions*/;
MediaWiki changes and migration
Add feature flag for migration states:
- Read and write old columns only
- Write both old and new columns. Read from new preferentially, falling back to old.
- Write only new columns. Read from new preferentially, falling back to old.
- Read and write the new columns only.
ActorMigration (?) class to manage state & offer methods -- coordinate the basic layout of this with our similar changes at T166732 & friends!
- use outer joins on reads to map in the extra temporary tables when used, expose the necessary query options.
- helper function for query setup when searching for a specific user or address's references
- helper function for storing to the right columns and whether two inserts are needed for temp tables.
Further work:
- find and fix uses of the direct fields for lookups to use the abstraction
- probably want a maintenance script to do forced migration that can be run manually in production scenarios or run automatically from the default updaters for small installations
Potential pitfalls
- There may be more direct usages than expected and they may be a pain to fix
- There might be "surprise" usages of user_text fields or indexes for querying multiple lookups
- apiqueryrevisions ;)
- Are there any problems with still duplicating a username between actor.actor_name and user.user_name?
- Migration may be harder than expected.
- Still need to plan the 'final megamigration' from revision's friend tables to merged columns