Page MenuHomePhabricator

Refactor "user" & "user_text" fields into "actor" reference table
Open, Needs TriagePublic

Description

Subtask split out from T161671 'Compacting the revision table', to refactor the (id-or-0, name-or-address) tuples in revision, logging, image, and other tables through a common reference to an actor table (actor as in one who acts).

This makes those references a fixed size: 64-bit bigints instead of a 32-bit int followed by a variable size string. Moving the name-or-address out of the key also means we avoid duplications of the name-or-address string between multiple rows when a single user makes many edits, and make relevant indexes for by-user lookups smaller.

End state

Database new tables:

--
-- Revision authorship is represented by an actor row.
-- This avoids duplicating common usernames/IP addresses
-- in the revision table, and makes rename processing quicker.
--
CREATE TABLE /*_*/actor (
  -- Unique ID to identify each entry
  actor_id bigint unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,

  -- Key to user.user_id of the user who made this edit.
  -- Stores NULL for anonymous edits and for some mass imports.
  actor_user int unsigned,

  -- Text username or IP address of the editor.
  actor_text varchar(255) binary NOT NULL default ''
) /*$wgDBTableOptions*/;

CREATE UNIQUE INDEX /*i*/actor_user ON /*_*/actor (actor_user);
CREATE UNIQUE INDEX /*i*/actor_text ON /*_*/actor (actor_text);

Database changes tables:

  • revision adds rev_actor, and drops rev_user & rev_user_text
  • archive adds ar_actor, and drops ar_user & ar_user_text
  • image adds img_actor, and drops img_user & img_user_text
  • oldimage adds oi_actor, and drops oi_user & oi_user_text
  • filearchive adds fa_actor, and drops fa_user & fa_user_text
  • recentchanges adds rc_actor, and drops rc_user & rc_user_text
  • logging adds log_actor, and drops log_user & log_user_text

Migration state

To avoid blocking on changes to the revision table (so we can coordinate a large schema update), revision and image (FIXME: check if we need to do this on others) will gain temporary sibling tables to hold the reference:

CREATE TABLE /*_*/revision_actor_temp (
  revactor_rev bigint unsigned NOT NULL,     -- → revision.rev_id
  revactor_actor bigint unsigned NOT NULL, -- → actor.actor_id
  PRIMARY KEY (revactor_rev, revactor_actor)
) /*$wgDBTableOptions*/;

-- will this one work sanely? we have no img_id key srsly?
CREATE TABLE /*_*/image_actor_temp (
  imgactor_img varchar(255) binary NOT NULL,     -- → image.img_name
  imgactor_actor bigint unsigned NOT NULL, -- → actor.actor_id
  PRIMARY KEY (imgactor_img, imgactor_actor)
) /*$wgDBTableOptions*/;

MediaWiki changes and migration

Add feature flag for migration states:

  • Read and write old columns only
  • Write both old and new columns. Read from new preferentially, falling back to old.
  • Write only new columns. Read from new preferentially, falling back to old.
  • Read and write the new columns only.

ActorMigration (?) class to manage state & offer methods -- coordinate the basic layout of this with our similar changes at T166732 & friends!

  • use outer joins on reads to map in the extra temporary tables when used, expose the necessary query options.
  • helper function for query setup when searching for a specific user or address's references
  • helper function for storing to the right columns and whether two inserts are needed for temp tables.

Further work:

  • find and fix uses of the direct fields for lookups to use the abstraction
  • probably want a maintenance script to do forced migration that can be run manually in production scenarios or run automatically from the default updaters for small installations

Potential pitfalls

  • There may be more direct usages than expected and they may be a pain to fix
  • There might be "surprise" usages of user_text fields or indexes for querying multiple lookups
  • apiqueryrevisions ;)
  • Are there any problems with still duplicating a username between actor.actor_name and user.user_name?
  • Migration may be harder than expected.
  • Still need to plan the 'final megamigration' from revision's friend tables to merged columns

Related Objects

View Standalone Graph
This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here. Use View Standalone Graph to show more of the graph.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Change 418972 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418972

Change 419520 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Renameuser@master] Update for the actor table change

https://gerrit.wikimedia.org/r/419520

Change 418982 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418982

Change 418143 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418143

Change 418978 merged by jenkins-bot:
[mediawiki/extensions/MobileFrontend@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418978

Change 420418 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/SecurePoll@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420418

Change 420824 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Thanks@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420824

Change 420840 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Translate@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420840

Change 420842 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/UniversalLanguageSelector@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420842

Change 420843 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/UploadWizard@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420843

Change 420854 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/UserMerge@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420854

Change 421114 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/VisualEditor@master] Update for the actor table change

https://gerrit.wikimedia.org/r/421114

Change 421115 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/WikimediaEvents@master] Update for the actor table change

https://gerrit.wikimedia.org/r/421115

Change 421120 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/WikimediaMaintenance@master] Update for the actor table change

https://gerrit.wikimedia.org/r/421120

Change 421129 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/extensions/Wikibase@master] Update for the actor table change

https://gerrit.wikimedia.org/r/421129

Change 421114 merged by jenkins-bot:
[mediawiki/extensions/VisualEditor@master] Update for the actor table change

https://gerrit.wikimedia.org/r/421114

Change 420824 merged by jenkins-bot:
[mediawiki/extensions/Thanks@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420824

Change 418976 merged by jenkins-bot:
[mediawiki/extensions/LiquidThreads@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418976

Change 418970 merged by jenkins-bot:
[mediawiki/extensions/CleanChanges@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418970

Change 418979 merged by jenkins-bot:
[mediawiki/extensions/Nuke@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418979

Change 418983 merged by jenkins-bot:
[mediawiki/extensions/PageTriage@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418983

Change 419520 merged by jenkins-bot:
[mediawiki/extensions/Renameuser@master] Update for the actor table change

https://gerrit.wikimedia.org/r/419520

Change 421120 merged by jenkins-bot:
[mediawiki/extensions/WikimediaMaintenance@master] Update for the actor table change

https://gerrit.wikimedia.org/r/421120

Change 420843 merged by jenkins-bot:
[mediawiki/extensions/UploadWizard@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420843

Change 421115 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Update for the actor table change

https://gerrit.wikimedia.org/r/421115

Change 420842 merged by jenkins-bot:
[mediawiki/extensions/UniversalLanguageSelector@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420842

Change 418981 merged by jenkins-bot:
[mediawiki/extensions/OAuth@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418981

Change 421129 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Update for the actor table change

https://gerrit.wikimedia.org/r/421129

Change 418149 merged by jenkins-bot:
[mediawiki/extensions/CheckUser@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418149

Any relation as mentioned here T193254#4165780?

Any relation as mentioned here T193254#4165780?

Seems unlikely.

Change 420840 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420840

daniel added a subscriber: daniel.

Removing MCR tag, since this is orthogonal.

Change 418974 merged by jenkins-bot:
[mediawiki/extensions/FlaggedRevs@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418974

Change 418146 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418146

Change 420418 merged by jenkins-bot:
[mediawiki/extensions/SecurePoll@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420418

Change 418148 merged by jenkins-bot:
[mediawiki/extensions/CentralNotice@master] Update for the actor table change

https://gerrit.wikimedia.org/r/418148

Will this task (and/or the parent task T161671) allow us to do cross-searches between page and user? For example: "Give me the edits on Wikipedia:Village_pump_(technical) by Nirmos".

Anomie added a comment.Jun 4 2018, 4:26 PM

Will this task (and/or the parent task T161671) allow us to do cross-searches between page and user? For example: "Give me the edits on Wikipedia:Village_pump_(technical) by Nirmos".

No. The most it might do is make the SQL queries for "Give me the edits on Wikipedia:Village_pump_(technical) by Nirmos" and "Give me the edits on Wikipedia:Village_pump_(technical) by 192.0.2.34" match, instead of one probably using rev_user and the other using rev_user_text.

Also, you can do that already, at least via the API: https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Wikipedia:Village%20pump%20(technical)&rvuser=Nirmos

Change 437372 had a related patch set uploaded (by Paladox; owner: Anomie):
[mediawiki/extensions/FlaggedRevs@REL1_31] Update for the actor table change

https://gerrit.wikimedia.org/r/437372

Change 420854 merged by jenkins-bot:
[mediawiki/extensions/UserMerge@master] Update for the actor table change

https://gerrit.wikimedia.org/r/420854

Change 437372 abandoned by Paladox:
Update for the actor table change

https://gerrit.wikimedia.org/r/437372

Change 472509 had a related patch set uploaded (by Paladox; owner: Anomie):
[mediawiki/extensions/CentralAuth@REL1_31] Update for the actor table change

https://gerrit.wikimedia.org/r/472509

Change 472509 merged by jenkins-bot:
[mediawiki/extensions/CentralAuth@REL1_31] Update for the actor table change

https://gerrit.wikimedia.org/r/472509

Change 481814 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: MR70):
[mediawiki/extensions/WikimediaMaintenance@master] Fix: use selectRow instead of selectField in UnsuppressCrossWiki.

https://gerrit.wikimedia.org/r/481814

Change 482110 had a related patch set uploaded (by Paladox; owner: Anomie):
[mediawiki/extensions/UserMerge@REL1_31] Update for the actor table change

https://gerrit.wikimedia.org/r/482110

Change 481814 merged by jenkins-bot:
[mediawiki/extensions/WikimediaMaintenance@master] Fix: use selectRow instead of selectField in UnsuppressCrossWiki.

https://gerrit.wikimedia.org/r/481814

Change 482110 merged by jenkins-bot:
[mediawiki/extensions/UserMerge@REL1_31] Update for the actor table change

https://gerrit.wikimedia.org/r/482110

@Pchelolo awww maaaan, we called this performer in event schemas. Now we'll have two separate names for the same thing. :(

daniel added a comment.EditedMon, Apr 8, 4:22 PM

@Pchelolo awww maaaan, we called this performer in event schemas. Now we'll have two separate names for the same thing. :(

The name comes from an RFC that was discussed two years ago, see https://www.mediawiki.org/wiki/User:Brion_VIBBER/Compacting_the_revision_table_round_2. The table with that name was merged into core in February 2018, https://gerrit.wikimedia.org/r/c/mediawiki/core/+/380669.

The RFC process is intended to avoid such divergence... I wonder what we could do to make that work better.

Ottomata added a comment.EditedMon, Apr 8, 8:38 PM

Aye, hm not sure. The event schema fields were bikeshed with myself, Marko, Petr, Dan Andreescu, Aaron Halfakar and Joseph Allemandou as part of T134502: Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need and in https://gerrit.wikimedia.org/r/#/c/mediawiki/event-schemas/+/301284/ in July of 2016. IIRC, we decided to use the performer name since that was how Mediawiki code referred to it. How did yall settle on actor?

I don't follow all RFCs, and I'm not sure if Marko or Dan were on the TechComm at the time.

daniel added a comment.Tue, Apr 9, 9:14 AM

Not sure how "actor" happened, to be honest. May have been a brainstorming during an IRC meeting.
I now realize that MW core uses "performer" in LogEntry and RecentChange. Bummer.

Yar yeah really is too bad. Neither are easy to change now. I guess we'll just ride into the sunset with the different names, like we do with article vs page?

daniel added a comment.EditedTue, Apr 9, 2:26 PM

Yar yeah really is too bad. Neither are easy to change now. I guess we'll just ride into the sunset with the different names, like we do with article vs page?

*sigh*

Probably my fault, too. "Actor" sounds like something I'd come up with ;)