Page MenuHomePhabricator

Remove revision_comment_temp and revision_actor_temp
Open, Needs TriagePublic

Description

Due to expected difficulty in altering the revision table (see T161671: Compacting the revision table), the actor and comment migrations created tables named revision_comment_temp and revision_actor_temp to relate the rev_id and the comment_id/actor_id, with the plan being to eventually alter revision to contain rev_comment_id and rev_actor columns and drop those relation tables. This task tracks that "eventually".

  • 0. Wait for the main actor and comment migration to be done (except for DBA work).
    • 0.1. Comment migration, through steps 7.1 and 7.2 (T166733)
    • 0.2. Actor migration, through steps 7.1 and 7.2 (T188327)
  • 1. Alter revision to drop rev_comment, rev_user, and rev_user_text and add rev_comment_id and rev_actor. (T161671)
    • 1.1. Write and merge the MediaWiki patch.
    • 1.2. Have DBAs perform the change.
  • 2. Introduce a migration from using revision_comment_temp and revision_actor_temp (old) to rev_comment_id and rev_actor (new).
    • 2.1. Write and merge the MediaWiki patch, assuming it wasn't included in 1.1.
    • 2.2. Check for extensions assuming the temp tables exist.
  • 3. Set migration stage to write-both/read-old.
  • 4. Run the maintenance script to backfill rev_comment_id and rev_actor.
  • 5. Set migration stage to write-both/read-new.
    • 5.1. Announce changes to wikitech-l and cloud.
    • 5.2. Update WMCS views to read the new schema.
  • 6. Set migration stage to write-new/read-new.
  • 7. Remove old tables and code.

Related Objects

Event Timeline

Status: Blocked on T184615: Once MCR is deployed, drop the rev_text_id, rev_content_model, and rev_content_format fields from the revision table also being ready to go. The intention is to combine all the alters of revision into one task for the DBAs.

Nuria added a subscriber: Nuria.

This is going to affect Analytics mediawiki reconstruction, tagging

@Nuria : I don't think we use the tables mentioned here. Labs might use them in the background, but views are already maintained with 'the new' schema (@Milimetric can you confim?)

I'm sorry I thought I confirmed this - yes, we were blissfully ignorant on top of the cloud db views. Our other private sqoops don't use these tables either.

Change 552339 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] Alter revision for actor, comment, and MCR

https://gerrit.wikimedia.org/r/552339

Change 552339 merged by jenkins-bot:
[mediawiki/core@master] Alter revision for actor, comment, and MCR

https://gerrit.wikimedia.org/r/552339

Next steps:

  1. Write to rev_actor and rev_comment_id
  2. Backfill rev_actor and rev_comment_id
  3. have a read-new mode, at least for testing
  • We won’t get through with the migration before the 1.36 branch (March?)
    • We should avoid supporting multiple modes in a release
  • But we can start writing to the new schema, and we can also already backfill.
  • The Updater to 1.36 should also backfill.

@Marostegui is it correct that step 1.2 of this task is already completed by DBAs? Can we mark step 1 done?

Correct, this is how the table looks like now:

root@cumin1001:/home/marostegui# mysql.py -hdb1169 enwiki -e "show create table revision\G"
*************************** 1. row ***************************
       Table: revision
Create Table: CREATE TABLE `revision` (
  `rev_id` int(8) unsigned NOT NULL AUTO_INCREMENT,
  `rev_page` int(8) unsigned NOT NULL DEFAULT 0,
  `rev_comment_id` bigint(20) unsigned NOT NULL DEFAULT 0,
  `rev_actor` bigint(20) unsigned NOT NULL DEFAULT 0,
  `rev_timestamp` varbinary(14) NOT NULL DEFAULT '',
  `rev_minor_edit` tinyint(1) unsigned NOT NULL DEFAULT 0,
  `rev_deleted` tinyint(1) unsigned NOT NULL DEFAULT 0,
  `rev_len` int(8) unsigned DEFAULT NULL,
  `rev_parent_id` int(8) unsigned DEFAULT NULL,
  `rev_sha1` varbinary(32) NOT NULL DEFAULT '',
  PRIMARY KEY (`rev_id`),
  KEY `rev_timestamp` (`rev_timestamp`),
  KEY `page_timestamp` (`rev_page`,`rev_timestamp`),
  KEY `rev_page_id` (`rev_page`,`rev_id`),
  KEY `rev_actor_timestamp` (`rev_actor`,`rev_timestamp`,`rev_id`),
  KEY `rev_page_actor_timestamp` (`rev_page`,`rev_actor`,`rev_timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=1009766990 DEFAULT CHARSET=binary ROW_FORMAT=COMPRESSED