Refactor "user" & "user_text" fields into "actor" reference table
Open, Needs TriagePublic

Description

Subtask split out from T161671 'Compacting the revision table', to refactor the (id-or-0, name-or-address) tuples in revision, logging, image, and other tables through a common reference to an actor table (actor as in one who acts).

This makes those references a fixed size: 64-bit bigints instead of a 32-bit int followed by a variable size string. Moving the name-or-address out of the key also means we avoid duplications of the name-or-address string between multiple rows when a single user makes many edits, and make relevant indexes for by-user lookups smaller.

End state

Database new tables:

--
-- Revision authorship is represented by an actor row.
-- This avoids duplicating common usernames/IP addresses
-- in the revision table, and makes rename processing quicker.
--
CREATE TABLE /*_*/actor (
  -- Unique ID to identify each entry
  actor_id bigint unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,

  -- Key to user.user_id of the user who made this edit.
  -- Stores NULL for anonymous edits and for some mass imports.
  actor_user int unsigned,

  -- Text username or IP address of the editor.
  actor_text varchar(255) binary NOT NULL default ''
) /*$wgDBTableOptions*/;

CREATE UNIQUE INDEX /*i*/actor_user ON /*_*/actor (actor_user);
CREATE UNIQUE INDEX /*i*/actor_text ON /*_*/actor (actor_text);

Database changes tables:

  • revision adds rev_actor, and drops rev_user & rev_user_text
  • archive adds ar_actor, and drops ar_user & ar_user_text
  • image adds img_actor, and drops img_user & img_user_text
  • oldimage adds oi_actor, and drops oi_user & oi_user_text
  • filearchive adds fa_actor, and drops fa_user & fa_user_text
  • recentchanges adds rc_actor, and drops rc_user & rc_user_text
  • logging adds log_actor, and drops log_user & log_user_text

Migration state

To avoid blocking on changes to the revision table (so we can coordinate a large schema update), revision and image (FIXME: check if we need to do this on others) will gain temporary sibling tables to hold the reference:

CREATE TABLE /*_*/revision_actor_temp (
  revactor_rev bigint unsigned NOT NULL,     -- → revision.rev_id
  revactor_actor bigint unsigned NOT NULL, -- → actor.actor_id
  PRIMARY KEY (revactor_rev, revactor_actor)
) /*$wgDBTableOptions*/;

-- will this one work sanely? we have no img_id key srsly?
CREATE TABLE /*_*/image_actor_temp (
  imgactor_img varchar(255) binary NOT NULL,     -- → image.img_name
  imgactor_actor bigint unsigned NOT NULL, -- → actor.actor_id
  PRIMARY KEY (imgactor_img, imgactor_actor)
) /*$wgDBTableOptions*/;

MediaWiki changes and migration

Add feature flag for migration states:

  • Read and write old columns only
  • Write both old and new columns. Read from new preferentially, falling back to old.
  • Write only new columns. Read from new preferentially, falling back to old.
  • Read and write the new columns only.

ActorMigration (?) class to manage state & offer methods -- coordinate the basic layout of this with our similar changes at T166732 & friends!

  • use outer joins on reads to map in the extra temporary tables when used, expose the necessary query options.
  • helper function for query setup when searching for a specific user or address's references
  • helper function for storing to the right columns and whether two inserts are needed for temp tables.

Further work:

  • find and fix uses of the direct fields for lookups to use the abstraction
  • probably want a maintenance script to do forced migration that can be run manually in production scenarios or run automatically from the default updaters for small installations

Potential pitfalls

  • There may be more direct usages than expected and they may be a pain to fix
  • There might be "surprise" usages of user_text fields or indexes for querying multiple lookups
  • apiqueryrevisions ;)
  • Are there any problems with still duplicating a username between actor.actor_name and user.user_name?
  • Migration may be harder than expected.
  • Still need to plan the 'final megamigration' from revision's friend tables to merged columns

Related Objects

There are a very large number of changes, so older changes are hidden. Show Older Changes

Change 365894 abandoned by Anomie:
[WIP] Early versions of 'actor' column migration

Reason:
See I8d825eb02c69cc66d90bd41325133fd3f99f0226

https://gerrit.wikimedia.org/r/365894

cscott added a subscriber: cscott.Oct 31 2017, 3:07 PM

Hey, if we're making changes here, is there anything obvious that could be done to enable "multi-actor" revisions, like we might get from real-time collab?

I'm not familiar enough with the database schema to offer a concrete suggestion here, just mentioning that it might be worth thinking ahead on.

Nothing obvious, no. To be properly indexable at the database level, such a scheme would require denormalization along the lines of what's being done in the temporary table at https://gerrit.wikimedia.org/r/#/c/380669/6/maintenance/tables.sql@459. And then pretty much every consumer of the database would need to be updated to handle the possibility of having more than one row per revision, both on the back end and in UI, which would likely be a project as large as the current Multi-Content Revisions project.

Nirmos added a subscriber: Nirmos.Nov 30 2017, 6:07 PM

Is this a good time to look at T146595? Even if it isn't, I thought it would be a good idea to let you know about the bug, so that there are no nasty surprises.

He7d3r added a subscriber: He7d3r.Dec 18 2017, 3:38 PM

With the actor table, T27377 will need to be updated. This means that we'll then have 'af_actor', 'afh_actor', and 'afl_actor' instead of 'af_user', 'af_user_text', 'afh_user', 'afh_user_text', 'afl_user', and 'afl_user_text'.

Change 380669 merged by jenkins-bot:
[mediawiki/core@master] Add actor table and code to start using it

https://gerrit.wikimedia.org/r/380669

Change 418143 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/AbuseFilter@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418143

Change 418146 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/CentralAuth@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418146

Change 418148 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/CentralNotice@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418148

Change 418149 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/CheckUser@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418149

Change 418970 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/CleanChanges@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418970

Change 418972 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/ContentTranslation@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418972

Change 418973 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Echo@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418973

Change 418974 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/FlaggedRevs@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418974

Change 418975 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Flow@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418975

Change 418976 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/LiquidThreads@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418976

Change 418978 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/MobileFrontend@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418978

Change 418979 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Nuke@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418979

Change 418981 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/OAuth@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418981

Change 418982 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/ORES@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418982

Change 418983 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/PageTriage@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418983

Change 418975 merged by jenkins-bot:
[mediawiki/extensions/Flow@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418975

Change 418973 merged by jenkins-bot:
[mediawiki/extensions/Echo@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418973

Change 418972 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418972

Change 419520 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Renameuser@master] Update for the actor table change

https://gerrit.wikimedia.org/r/419520

Change 418982 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418982

Change 418143 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418143

Change 418978 merged by jenkins-bot:
[mediawiki/extensions/MobileFrontend@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418978

Change 420418 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/SecurePoll@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420418

Change 420824 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Thanks@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420824

Change 420840 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Translate@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420840

Change 420842 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/UniversalLanguageSelector@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420842

Change 420843 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/UploadWizard@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420843

Change 420854 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/UserMerge@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420854

Change 421114 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/VisualEditor@master] Update for the actor table change

https://gerrit.wikimedia.org/r/421114

Change 421115 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/WikimediaEvents@master] Update for the actor table change

https://gerrit.wikimedia.org/r/421115

Change 421120 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/WikimediaMaintenance@master] Update for the actor table change

https://gerrit.wikimedia.org/r/421120

Change 421129 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Wikibase@master] Update for the actor table change

https://gerrit.wikimedia.org/r/421129

Change 421114 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Update for the actor table change

https://gerrit.wikimedia.org/r/421114

Change 420824 merged by jenkins-bot:
[mediawiki/extensions/Thanks@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420824

Change 418976 merged by jenkins-bot:
[mediawiki/extensions/LiquidThreads@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418976

Change 418970 merged by jenkins-bot:
[mediawiki/extensions/CleanChanges@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418970

Change 418979 merged by jenkins-bot:
[mediawiki/extensions/Nuke@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418979

Change 418983 merged by jenkins-bot:
[mediawiki/extensions/PageTriage@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418983

Change 419520 merged by jenkins-bot:
[mediawiki/extensions/Renameuser@master] Update for the actor table change

https://gerrit.wikimedia.org/r/419520

Change 421120 merged by jenkins-bot:
[mediawiki/extensions/WikimediaMaintenance@master] Update for the actor table change

https://gerrit.wikimedia.org/r/421120

Change 420843 merged by jenkins-bot:
[mediawiki/extensions/UploadWizard@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420843