Refactor "user" & "user_text" fields into "actor" reference table
Open, Needs TriagePublic

Description

Subtask split out from T161671 'Compacting the revision table', to refactor the (id-or-0, name-or-address) tuples in revision, logging, image, and other tables through a common reference to an actor table (actor as in one who acts).

This makes those references a fixed size: 64-bit bigints instead of a 32-bit int followed by a variable size string. Moving the name-or-address out of the key also means we avoid duplications of the name-or-address string between multiple rows when a single user makes many edits, and make relevant indexes for by-user lookups smaller.

End state

Database new tables:

--
-- Revision authorship is represented by an actor row.
-- This avoids duplicating common usernames/IP addresses
-- in the revision table, and makes rename processing quicker.
--
CREATE TABLE /*_*/actor (
  -- Unique ID to identify each entry
  actor_id bigint unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,

  -- Key to user.user_id of the user who made this edit.
  -- Stores NULL for anonymous edits and for some mass imports.
  actor_user int unsigned,

  -- Text username or IP address of the editor.
  actor_text varchar(255) binary NOT NULL default ''
) /*$wgDBTableOptions*/;

CREATE UNIQUE INDEX /*i*/actor_user ON /*_*/actor (actor_user);
CREATE UNIQUE INDEX /*i*/actor_text ON /*_*/actor (actor_text);

Database changes tables:

  • revision adds rev_actor, and drops rev_user & rev_user_text
  • archive adds ar_actor, and drops ar_user & ar_user_text
  • image adds img_actor, and drops img_user & img_user_text
  • oldimage adds oi_actor, and drops oi_user & oi_user_text
  • filearchive adds fa_actor, and drops fa_user & fa_user_text
  • recentchanges adds rc_actor, and drops rc_user & rc_user_text
  • logging adds log_actor, and drops log_user & log_user_text

Migration state

To avoid blocking on changes to the revision table (so we can coordinate a large schema update), revision and image (FIXME: check if we need to do this on others) will gain temporary sibling tables to hold the reference:

CREATE TABLE /*_*/revision_actor_temp (
  revactor_rev bigint unsigned NOT NULL,     -- → revision.rev_id
  revactor_actor bigint unsigned NOT NULL, -- → actor.actor_id
  PRIMARY KEY (revactor_rev, revactor_actor)
) /*$wgDBTableOptions*/;

-- will this one work sanely? we have no img_id key srsly?
CREATE TABLE /*_*/image_actor_temp (
  imgactor_img varchar(255) binary NOT NULL,     -- → image.img_name
  imgactor_actor bigint unsigned NOT NULL, -- → actor.actor_id
  PRIMARY KEY (imgactor_img, imgactor_actor)
) /*$wgDBTableOptions*/;

MediaWiki changes and migration

Add feature flag for migration states:

  • Read and write old columns only
  • Write both old and new columns. Read from new preferentially, falling back to old.
  • Write only new columns. Read from new preferentially, falling back to old.
  • Read and write the new columns only.

ActorMigration (?) class to manage state & offer methods -- coordinate the basic layout of this with our similar changes at T166732 & friends!

  • use outer joins on reads to map in the extra temporary tables when used, expose the necessary query options.
  • helper function for query setup when searching for a specific user or address's references
  • helper function for storing to the right columns and whether two inserts are needed for temp tables.

Further work:

  • find and fix uses of the direct fields for lookups to use the abstraction
  • probably want a maintenance script to do forced migration that can be run manually in production scenarios or run automatically from the default updaters for small installations

Potential pitfalls

  • There may be more direct usages than expected and they may be a pain to fix
  • There might be "surprise" usages of user_text fields or indexes for querying multiple lookups
  • apiqueryrevisions ;)
  • Are there any problems with still duplicating a username between actor.actor_name and user.user_name?
  • Migration may be harder than expected.
  • Still need to plan the 'final megamigration' from revision's friend tables to merged columns

Related Objects

There are a very large number of changes, so older changes are hidden. Show Older Changes

Change 380669 merged by jenkins-bot:
[mediawiki/core@master] Add actor table and code to start using it

https://gerrit.wikimedia.org/r/380669

Change 418143 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/AbuseFilter@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418143

Change 418146 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/CentralAuth@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418146

Change 418148 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/CentralNotice@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418148

Change 418149 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/CheckUser@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418149

Change 418970 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/CleanChanges@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418970

Change 418972 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/ContentTranslation@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418972

Change 418973 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Echo@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418973

Change 418974 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/FlaggedRevs@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418974

Change 418975 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Flow@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418975

Change 418976 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/LiquidThreads@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418976

Change 418978 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/MobileFrontend@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418978

Change 418979 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Nuke@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418979

Change 418981 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/OAuth@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418981

Change 418982 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/ORES@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418982

Change 418983 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/PageTriage@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418983

Change 418975 merged by jenkins-bot:
[mediawiki/extensions/Flow@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418975

Change 418973 merged by jenkins-bot:
[mediawiki/extensions/Echo@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418973

Change 418972 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418972

Change 419520 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Renameuser@master] Update for the actor table change

https://gerrit.wikimedia.org/r/419520

Change 418982 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418982

Change 418143 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418143

Change 418978 merged by jenkins-bot:
[mediawiki/extensions/MobileFrontend@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418978

Change 420418 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/SecurePoll@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420418

Change 420824 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Thanks@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420824

Change 420840 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Translate@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420840

Change 420842 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/UniversalLanguageSelector@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420842

Change 420843 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/UploadWizard@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420843

Change 420854 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/UserMerge@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420854

Change 421114 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/VisualEditor@master] Update for the actor table change

https://gerrit.wikimedia.org/r/421114

Change 421115 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/WikimediaEvents@master] Update for the actor table change

https://gerrit.wikimedia.org/r/421115

Change 421120 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/WikimediaMaintenance@master] Update for the actor table change

https://gerrit.wikimedia.org/r/421120

Change 421129 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Wikibase@master] Update for the actor table change

https://gerrit.wikimedia.org/r/421129

Change 421114 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Update for the actor table change

https://gerrit.wikimedia.org/r/421114

Change 420824 merged by jenkins-bot:
[mediawiki/extensions/Thanks@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420824

Change 418976 merged by jenkins-bot:
[mediawiki/extensions/LiquidThreads@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418976

Change 418970 merged by jenkins-bot:
[mediawiki/extensions/CleanChanges@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418970

Change 418979 merged by jenkins-bot:
[mediawiki/extensions/Nuke@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418979

Change 418983 merged by jenkins-bot:
[mediawiki/extensions/PageTriage@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418983

Change 419520 merged by jenkins-bot:
[mediawiki/extensions/Renameuser@master] Update for the actor table change

https://gerrit.wikimedia.org/r/419520

Change 421120 merged by jenkins-bot:
[mediawiki/extensions/WikimediaMaintenance@master] Update for the actor table change

https://gerrit.wikimedia.org/r/421120

Change 420843 merged by jenkins-bot:
[mediawiki/extensions/UploadWizard@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420843

Change 421115 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Update for the actor table change

https://gerrit.wikimedia.org/r/421115

Change 420842 merged by jenkins-bot:
[mediawiki/extensions/UniversalLanguageSelector@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420842

Change 418981 merged by jenkins-bot:
[mediawiki/extensions/OAuth@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418981

Change 421129 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Update for the actor table change

https://gerrit.wikimedia.org/r/421129

Change 418149 merged by jenkins-bot:
[mediawiki/extensions/CheckUser@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418149

Any relation as mentioned here T193254#4165780?

Any relation as mentioned here T193254#4165780?

Seems unlikely.

Change 420840 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420840

daniel added a subscriber: daniel.

Removing MCR tag, since this is orthogonal.

Change 418974 merged by jenkins-bot:
[mediawiki/extensions/FlaggedRevs@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418974

Change 418146 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418146

Change 420418 merged by jenkins-bot:
[mediawiki/extensions/SecurePoll@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420418

Change 418148 merged by jenkins-bot:
[mediawiki/extensions/CentralNotice@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418148

Will this task (and/or the parent task T161671) allow us to do cross-searches between page and user? For example: "Give me the edits on Wikipedia:Village_pump_(technical) by Nirmos".

Anomie added a comment.Jun 4 2018, 4:26 PM

Will this task (and/or the parent task T161671) allow us to do cross-searches between page and user? For example: "Give me the edits on Wikipedia:Village_pump_(technical) by Nirmos".

No. The most it might do is make the SQL queries for "Give me the edits on Wikipedia:Village_pump_(technical) by Nirmos" and "Give me the edits on Wikipedia:Village_pump_(technical) by 192.0.2.34" match, instead of one probably using rev_user and the other using rev_user_text.

Also, you can do that already, at least via the API: https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Wikipedia:Village%20pump%20(technical)&rvuser=Nirmos

Change 437372 had a related patch set uploaded (by Paladox; owner: Anomie):
[mediawiki/extensions/FlaggedRevs@REL1_31] Update for the actor table change

https://gerrit.wikimedia.org/r/437372

Change 420854 merged by jenkins-bot:
[mediawiki/extensions/UserMerge@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420854

Change 437372 abandoned by Paladox:
Update for the actor table change

https://gerrit.wikimedia.org/r/437372