Page MenuHomePhabricator

Deploy refactored comment storage
Closed, ResolvedPublic

Description

The high-level checklist:

  • 0. Set the configuration flag to "old" (see T166732) (not needed, default is now "old")
  • 1. Merge the first patch for T166732: Refactor comment storage in the database and abstract access in MediaWiki (adding new schemas and code)
  • 2. Perform schema change (T174569)
  • 3. Turn the feature flag to "read+write both". See if stuff breaks.
  • 3.1. Announce the pending change to wikitech-l@ and cloud@, and give time for people to update.
  • 3.2. Make sure all deployed extensions are updated.
  • 4. Turn the feature flag to "write new only, read both". See if stuff breaks.
  • 5. Run the maintenance script(s) to migrate all the old stuff to new stuff, blanking the old stuff in the process.
  • 6. Turn the feature flag to "new only".
    • Default and CI
    • Beta Cluster
    • Group 0
    • Group 1
    • Group 2
  • 7. Remove old schemas and code
    • 7.1. Update WMCS replicas to no longer reference old schemas (T212972).
    • 7.2. Write and merge patches to remove $wgCommentTableSchemaMigrationStage, supporting code, and old schemas.
    • 7.3. Submit Schema-change task for WMF production (T233135).

For the cleaning up of revision_comment_temp, see T215466. This and T188327 both block T161671, which in turn is the first step of that task.

Details

SubjectRepoBranchLines +/-
mediawiki/coremaster+853 -519
operations/mediawiki-configmaster+1 -3
operations/mediawiki-configmaster+1 -0
operations/mediawiki-configmaster+1 -0
mediawiki/coremaster+37 -24
operations/mediawiki-configmaster+1 -1
operations/mediawiki-configmaster+1 -3
operations/mediawiki-configmaster+1 -1
operations/mediawiki-configmaster+1 -3
operations/mediawiki-configmaster+4 -0
operations/mediawiki-configmaster+5 -1
operations/mediawiki-configmaster+1 -3
operations/mediawiki-configmaster+1 -2
operations/mediawiki-configmaster+1 -3
operations/mediawiki-configmaster+2 -0
operations/mediawiki-configmaster+3 -0
operations/mediawiki-configmaster+3 -0
operations/mediawiki-configmaster+8 -0
Show related patches Customize query in gerrit

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

The group0 run went much faster, only 55 minutes, as there were many fewer rows: 624213 revision rows, 471326 archive rows, 1675 ipblocks rows, 5712 image rows, 1144 oldimage rows, 1748 filearchive rows, 504043 logging rows, and 91 protected_titles rows.

Best guess at the moment is that enwiki will take around 3 weeks to complete.

Mentioned in SAL (#wikimedia-operations) [2018-11-01T13:36:48Z] <anomie@mwmaint1002> Running migrateComments.php on section 2 wikis for T166733

Mentioned in SAL (#wikimedia-operations) [2018-11-01T13:36:51Z] <anomie@mwmaint1002> Running migrateComments.php on section 1 wikis for T166733

Mentioned in SAL (#wikimedia-operations) [2018-11-01T13:36:54Z] <anomie@mwmaint1002> Running migrateComments.php on section 5 wikis for T166733

Mentioned in SAL (#wikimedia-operations) [2018-11-01T13:37:27Z] <anomie@mwmaint1002> Running migrateComments.php on section 8 wikis for T166733

Mentioned in SAL (#wikimedia-operations) [2018-11-01T13:37:31Z] <anomie@mwmaint1002> Running migrateComments.php on section 6 wikis for T166733

Mentioned in SAL (#wikimedia-operations) [2018-11-01T13:37:44Z] <anomie@mwmaint1002> Running migrateComments.php on wikitech for T166733

Mentioned in SAL (#wikimedia-operations) [2018-11-01T13:37:57Z] <anomie@mwmaint1002> Running migrateComments.php on section 7 wikis for T166733

Mentioned in SAL (#wikimedia-operations) [2018-11-01T13:46:44Z] <anomie@mwmaint1002> Running migrateComments.php on remaining section 3 wikis for T166733

Mentioned in SAL (#wikimedia-operations) [2018-11-01T16:07:21Z] <anomie@mwmaint1002> Running migrateComments.php on section 4 wikis for T166733

enwiki is done with the revision table already, and is about 90% done with the archive table. It has 103098562 rows to process for the remaining tables. All the s2 wikis combined had 113906981 rows in those remaining tables, and all of s2 (including revision and archive) completed in under 6 days. So hopefully that means enwiki will be done by next week rather than the week after.

All the wikis other than enwiki have already completed.

Change 472715 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] Default $wgCommentTableSchemaMigrationStage to MIGRATION_NEW

https://gerrit.wikimedia.org/r/472715

enwiki finished much faster than I thought it would.

There is now a need to check consistency configuration on all codfw hosts, which were altered to prevent lots of lagging behind, as well as I would recommend some light checking on the consistency of the affected tables.

When you are happy about the whole process please tell us if there is something we should OPTIMIZE to save space.

There is now a need to check consistency configuration on all codfw hosts, which were altered to prevent lots of lagging behind, as well as I would recommend some light checking on the consistency of the affected tables.

At the table level, the following should all return zero rows:

SELECT * FROM comment WHERE comment_id = 0; -- Sanity check
SELECT * FROM revision LEFT JOIN revision_comment_temp ON (rev_id=revcomment_rev) LEFT JOIN comment ON (revcomment_comment_id=comment_id) WHERE comment_id IS NULL;
SELECT * FROM archive          LEFT JOIN comment ON (ar_comment_id=comment_id)        WHERE comment_id IS NULL;
SELECT * FROM ipblocks         LEFT JOIN comment ON (ipb_reason_id=comment_id)        WHERE comment_id IS NULL;
SELECT * FROM image            LEFT JOIN comment ON (img_description_id=comment_id)   WHERE comment_id IS NULL;
SELECT * FROM oldimage         LEFT JOIN comment ON (oi_description_id=comment_id)    WHERE comment_id IS NULL;
SELECT * FROM filearchive      LEFT JOIN comment ON (fa_deleted_reason_id=comment_id) WHERE comment_id IS NULL;
SELECT * FROM filearchive      LEFT JOIN comment ON (fa_description_id=comment_id)    WHERE comment_id IS NULL;
SELECT * FROM recentchanges    LEFT JOIN comment ON (rc_comment_id=comment_id)        WHERE comment_id IS NULL;
SELECT * FROM logging          LEFT JOIN comment ON (log_comment_id=comment_id)       WHERE comment_id IS NULL;
SELECT * FROM protected_titles LEFT JOIN comment ON (pt_reason_id=comment_id)         WHERE comment_id IS NULL;

I'm not sure how to best lightly check those, versus just running those queries somewhere and letting them take as long as they need to to fully scan the relevant tables. I've been meaning to ask you for suggestions on doing that.

I'll leave checking replica consistency entirely in your hands.

When you are happy about the whole process please tell us if there is something we should OPTIMIZE to save space.

I can describe what was done. You probably know much better than I do which of these would make an OPTIMIZE make sense.

  • For several tables, rows created before the MediaWiki config changes in February 2018 were UPDATEd to set a VARBINARY(255) column to the empty string and a BIGINT column to a non-zero value. Any remaining rows (from February–October 2018) with the varbinary column having a non-empty value may be set to the empty string if you want. The updates were done in PK order, although non-maintenance inserts/deletes were being done concurrently.
    • archive: ar_comment and ar_comment_id.
    • ipblocks: ipb_reason and ipb_reason_id.
    • image: img_description and img_decription_id.
    • oldimage: oi_description and oi_decription_id.
    • filearchive: fa_deleted_reason and fa_deleted_reason_id, and fa_description and fa_description_id.
    • logging: log_comment and log_comment_id.
    • protected_titles: pt_reason and pt_reason_id.
  • revision_comment_temp: Many INSERTs. They were done in PK order, but all these rows' PKs were less than the existing PKs in the table and there were non-maintenance inserts/deletes being done concurrently.
  • comment: Many INSERTs, using an autoincrementing PK.
  • Also, from T188132, image_comment_temp is empty now.

Soon I should be able to file a schema change task to drop image_comment_temp entirely. That's waiting on T189158 and gerrit:417041.

Also relatively soon we should be able to alter archive, ipblocks, image, oldimage, filearchive, recentchanges, logging, and protected_titles to drop the VARBINARY(255) columns (and remove the DEFAULT on the BIGINT columns), although I'm not sure whether MediaWiki's deprecation policy would want us to wait 6 months for the 1.34 release before merging that patch.

Also somewhat soon we'll be looking at something similar for the actor migration for archive, ipblocks, image, oldimage, filearchive, recentchanges, and logging. In that case the VARBINARY(255) columns won't have been emptied though.

I will need some more time to digest the rest of your comment, but it seems it is even more I need to answer my question. The summary is that we will probably want to optimize all tables that used to store the comments to reclaim a lot of free space.

I can quickly answer:

I've been meaning to ask you for suggestions on doing that.

Run those on the vslow replicas- they are precisely thought for that- and they have lower weights to avoid affecting other production hosts .

Replica consistency is our problem- I meant actually configuration changes should go back to normal, don't worry, we will take care of that, I mentioned as a reminder to ourselves to check all codfw hosts before closing this.

Run those on the vslow replicas- they are precisely thought for that- and they have lower weights to avoid affecting other production hosts .

That's what I thought, thanks. I started screens on mwmaint1002 to do that.

At the table level, the following should all return zero rows:

SELECT * FROM comment WHERE comment_id = 0; -- Sanity check
SELECT * FROM revision LEFT JOIN revision_comment_temp ON (rev_id=revcomment_rev) LEFT JOIN comment ON (revcomment_comment_id=comment_id) WHERE comment_id IS NULL;
SELECT * FROM archive          LEFT JOIN comment ON (ar_comment_id=comment_id)        WHERE comment_id IS NULL;
SELECT * FROM ipblocks         LEFT JOIN comment ON (ipb_reason_id=comment_id)        WHERE comment_id IS NULL;
SELECT * FROM image            LEFT JOIN comment ON (img_description_id=comment_id)   WHERE comment_id IS NULL;
SELECT * FROM oldimage         LEFT JOIN comment ON (oi_description_id=comment_id)    WHERE comment_id IS NULL;
SELECT * FROM filearchive      LEFT JOIN comment ON (fa_deleted_reason_id=comment_id) WHERE comment_id IS NULL;
SELECT * FROM filearchive      LEFT JOIN comment ON (fa_description_id=comment_id)    WHERE comment_id IS NULL;
SELECT * FROM recentchanges    LEFT JOIN comment ON (rc_comment_id=comment_id)        WHERE comment_id IS NULL;
SELECT * FROM logging          LEFT JOIN comment ON (log_comment_id=comment_id)       WHERE comment_id IS NULL;
SELECT * FROM protected_titles LEFT JOIN comment ON (pt_reason_id=comment_id)         WHERE comment_id IS NULL;

All wikis except commonswiki passed. There were 13 rows in Commons's image table that didn't get updated, apparently due to page moves and the fact that image uses img_name as its PK rather than an immutable integer. They've been fixed, and I'm running the check again just to be sure.

In theory the same could happen for oldimage (no PK, migration uses oi_name,oi_timestamp), but apparently it didn't during this run. See also T28741: Migrate file tables to a modern layout (image/oldimage; file/file_revision; add primary keys).

All wikis except commonswiki passed. There were 13 rows in Commons's image table that didn't get updated, apparently due to page moves and the fact that image uses img_name as its PK rather than an immutable integer. They've been fixed, and I'm running the check again just to be sure.

commonswiki passed the second time.

Change 475761 had a related patch set uploaded (by Anomie; owner: Anomie):
[operations/mediawiki-config@master] Set comment migration stage to new in Beta Cluster

https://gerrit.wikimedia.org/r/475761

Change 475761 merged by jenkins-bot:
[operations/mediawiki-config@master] Set comment migration stage to new in Beta Cluster

https://gerrit.wikimedia.org/r/475761

Change 472715 merged by jenkins-bot:
[mediawiki/core@master] Default $wgCommentTableSchemaMigrationStage to MIGRATION_NEW

https://gerrit.wikimedia.org/r/472715

Change 476591 had a related patch set uploaded (by Anomie; owner: Anomie):
[operations/mediawiki-config@master] Set comment migration stage to new on group 0

https://gerrit.wikimedia.org/r/476591

Change 476591 merged by jenkins-bot:
[operations/mediawiki-config@master] Set comment migration stage to new on group 0

https://gerrit.wikimedia.org/r/476591

Mentioned in SAL (#wikimedia-operations) [2018-11-29T18:28:54Z] <anomie@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Setting comment migration to write-new/read-new on group 0 (T166733) (duration: 00m 52s)

Change 479489 had a related patch set uploaded (by Anomie; owner: Anomie):
[operations/mediawiki-config@master] Set comment migration stage to new on group 1

https://gerrit.wikimedia.org/r/479489

Change 479489 merged by jenkins-bot:
[operations/mediawiki-config@master] Set comment migration stage to new on group 1

https://gerrit.wikimedia.org/r/479489

Mentioned in SAL (#wikimedia-operations) [2018-12-13T17:00:41Z] <anomie> Set comment migration to new on group 1 (T166733)

Change 480788 had a related patch set uploaded (by Anomie; owner: Anomie):
[operations/mediawiki-config@master] Set comment migration stage to new everywhere

https://gerrit.wikimedia.org/r/480788

Change 480788 merged by jenkins-bot:
[operations/mediawiki-config@master] Set comment migration stage to new everywhere

https://gerrit.wikimedia.org/r/480788

Mentioned in SAL (#wikimedia-operations) [2018-12-19T16:21:48Z] <anomie@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Setting comment migration to new on group 2 (T166733) (duration: 00m 52s)

Change 482371 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] Mostly drop old comment schemas

https://gerrit.wikimedia.org/r/482371

BTW, can we also update https://www.mediawiki.org/wiki/Manual:Revision_comment_temp_table ?
(It seems that the revision_comment_temp table has already been dropped on enwiki, i.e. the corresponding statement there would be outdated.)

(It seems that the revision_comment_temp table has already been dropped on enwiki, i.e. the corresponding statement there would be outdated.)

The revision_comment_temp has not been dropped on enwiki,

wikiadmin@10.64.32.115(enwiki)> DESCRIBE revision_comment_temp; 
+-----------------------+---------------------+------+-----+---------+-------+
| Field                 | Type                | Null | Key | Default | Extra |
+-----------------------+---------------------+------+-----+---------+-------+
| revcomment_rev        | int(10) unsigned    | NO   | PRI | NULL    |       |
| revcomment_comment_id | bigint(20) unsigned | NO   | PRI | NULL    |       |
+-----------------------+---------------------+------+-----+---------+-------+

Nor should it be yet. The current plan is to get this, T167246, and T184615 all to the point of being ready to make final changes to revision and then do all of those changes in one big, hopefully-not-extremely-slow update. There happens to be a more detailed description of the plan at T195515#4680722.

It does appear there's no view for it in the WMCS replicas, if that's where you looked. There they're instead showing the eventual rev_comment_id field in the revision view.

Change 482371 merged by jenkins-bot:
[mediawiki/core@master] Mostly drop old comment schemas

https://gerrit.wikimedia.org/r/482371

Anomie updated the task description. (Show Details)

Note the final Wikimedia production schema changes are tracked in T233135: Schema change for refactored actor and comment storage, and cleanup of revision_comment_temp in T215466: Remove revision_comment_temp and revision_actor_temp. Changes for any extensions should be tracked in their own tasks.