Deploy refactored comment storage
Open, LowPublic

Description

The high-level checklist:

  • 0. Set the configuration flag to "old" (see T166732) (not needed, default is now "old")
  • 1. Merge the first patch for T166732: Refactor comment storage in the database and abstract access in MediaWiki (adding new schemas and code)
  • 2. Perform schema change (T174569)
  • 3. Turn the feature flag to "read+write both". See if stuff breaks.
  • 3.1. Announce the pending change to wikitech-l@ and cloud@, and give time for people to update.
  • 3.2. Make sure all deployed extensions are updated.
  • 4. Turn the feature flag to "write new only, read both". See if stuff breaks.
  • 5. Run the maintenance script(s) to migrate all the old stuff to new stuff, blanking the old stuff in the process.
  • 6. Turn the feature flag to "new only".
    • Default and CI
    • Beta Cluster
    • Group 0
    • Group 1
    • Group 2
  • 7. Merge the second patch for T166732 (removing old schemas and code)
    • 7.1. Submit schema change (task TBD)

We can easily do steps 2–6 for individual wikis rather than rolling it out all at once. I'm filing this task now mainly so that possibility can be discussed: do we want to do that, and if so decide which wikis to do preliminary deploys to.

Related Objects

There are a very large number of changes, so older changes are hidden. Show Older Changes
Anomie updated the task description. (Show Details)Sep 17 2018, 3:04 PM

Change 460916 had a related patch set uploaded (by Anomie; owner: Anomie):
[operations/mediawiki-config@master] Set actor and comment schema migrations for Beta Cluster

https://gerrit.wikimedia.org/r/460916

Change 460916 merged by jenkins-bot:
[operations/mediawiki-config@master] Set actor and comment schema migrations for Beta Cluster

https://gerrit.wikimedia.org/r/460916

Change 460924 had a related patch set uploaded (by Anomie; owner: Anomie):
[operations/mediawiki-config@master] Set CommentTableSchemaMigrationStage => WRITE_BOTH on test wikis, mw.org

https://gerrit.wikimedia.org/r/460924

Change 460924 merged by jenkins-bot:
[operations/mediawiki-config@master] Set CommentTableSchemaMigrationStage => WRITE_NEW on test wikis, mw.org

https://gerrit.wikimedia.org/r/460924

Mentioned in SAL (#wikimedia-operations) [2018-09-17T16:36:18Z] <anomie@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Setting wgCommentTableSchemaMigrationStage = WRITE_NEW on test wikis, mw.org (T166733) (duration: 00m 50s)

Change 467683 had a related patch set uploaded (by Anomie; owner: Anomie):
[operations/mediawiki-config@master] Set CommentTableSchemaMigrationStage => WRITE_NEW on group 0

https://gerrit.wikimedia.org/r/467683

Change 467683 merged by jenkins-bot:
[operations/mediawiki-config@master] Set CommentTableSchemaMigrationStage => WRITE_NEW on group 0

https://gerrit.wikimedia.org/r/467683

Mentioned in SAL (#wikimedia-operations) [2018-10-16T13:43:22Z] <anomie@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Setting comment table migration stage to write-new/read-both on group 0 (T166733) (duration: 00m 50s)

D3r1ck01 added a subscriber: D3r1ck01.

Change 469204 had a related patch set uploaded (by Anomie; owner: Anomie):
[operations/mediawiki-config@master] Set CommentTableSchemaMigrationStage => WRITE_NEW on group 1

https://gerrit.wikimedia.org/r/469204

Change 469204 merged by jenkins-bot:
[operations/mediawiki-config@master] Set CommentTableSchemaMigrationStage => WRITE_NEW on group 1

https://gerrit.wikimedia.org/r/469204

Mentioned in SAL (#wikimedia-operations) [2018-10-23T14:34:13Z] <anomie@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Setting comment table migration stage to write-new/read-both on group 1 (T166733) (duration: 00m 46s)

Change 469617 had a related patch set uploaded (by Anomie; owner: Anomie):
[operations/mediawiki-config@master] Set CommentTableSchemaMigrationStage => WRITE_NEW on all wikis

https://gerrit.wikimedia.org/r/469617

Change 469617 merged by jenkins-bot:
[operations/mediawiki-config@master] Set CommentTableSchemaMigrationStage => WRITE_NEW on all wikis

https://gerrit.wikimedia.org/r/469617

Mentioned in SAL (#wikimedia-operations) [2018-10-25T13:48:56Z] <anomie@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Setting comment table migration stage to write-new/read-both on all wikis (T166733) (duration: 00m 55s)

Anomie updated the task description. (Show Details)
Lofhi added a subscriber: Lofhi.Oct 25 2018, 8:22 PM

I understand backfilling comments is still on the to-dos. However, it seems rev_comment is no longer being written. Do we have any wild guesses as to when backfilling will happen? If it's long enough, I'll want to rework my tools to check both tables. If we're only talking about a week or two then people can live without seeing or having stats on newly added comments.

Anomie added a subscriber: Bstorm.Oct 25 2018, 9:58 PM

We could start backfilling as soon as Monday, although I have no estimate as to how long it'll take for the scripts to actually run over all the wikis.

I note that T189158: Change `image` view to properly expose the new `img_description_id` field not having "round 1" done yet would mean that tools would be unable to fetch descriptions for the image table, as the backfilling will blank img_description and write to img_description_id rather than the image_comment_temp table. I don't know if there's a timeframe for that task to be done; @Bstorm might know. OTOH, tools are already being unable to fetch the descriptions for uploads since earlier this week since those too are populating img_description_id rather than the image_comment_temp table (I didn't think of that until I already deployed the config changes here).

Mentioned in SAL (#wikimedia-operations) [2018-10-30T15:17:24Z] <anomie> Running migrateComments.php on test wikis and mediawikiwiki for T166733

That initial run of migrateComments.php (with --batch-size 2000) was recorded as taking about 11 hours and processed 2648580 revision rows, 500791 archive rows, 13591293 logging rows, 22539 ipblocks rows, 8824 image rows, 2226 oldimage rows, 30624 filearchive rows, and 564 protected_titles rows (and 0 recentchanges rows).

I don't have separate timings for mediawikiwiki versus the test wikis, although since I started the run at 15:17 and by 16:14 it was already finished with all the test wikis and was well into processing mediawikiwiki, it's pretty safe to say mediawikiwiki alone took at least 10 of those hours. For mediawikiwiki, it updated 1966143 revision rows, 251658 archive rows, 21962 ipblocks rows, 3777 image rows, 796 oldimage rows, 12334 filearchive rows, 13271041 logging rows, an 471 protected_titles rows.

It turns out mediawikiwiki has a strangely large number of logging rows compared to a wiki like enwiki. Estimating based on EXPLAIN output, enwiki has roughly 273 times more revision table rows but only 6 times as many logging table rows. Considering all the tables involved, enwiki has about 50 times more rows.

On the other hand, anecdotally batches of revision table rows were processed faster than other tables (thanks to revision_comment_temp each batch has one bulk insert and one bulk update, versus 2000 single-row updates for each batch of the other tables), so the actual amount of work needing to be done might be less than the raw row counts indicate.

Mentioned in SAL (#wikimedia-operations) [2018-10-31T14:24:06Z] <anomie@mwmaint1002> Running migrateComments.php on group0 for T166733

The group0 run went much faster, only 55 minutes, as there were many fewer rows: 624213 revision rows, 471326 archive rows, 1675 ipblocks rows, 5712 image rows, 1144 oldimage rows, 1748 filearchive rows, 504043 logging rows, and 91 protected_titles rows.

Best guess at the moment is that enwiki will take around 3 weeks to complete.

Mentioned in SAL (#wikimedia-operations) [2018-11-01T13:36:48Z] <anomie@mwmaint1002> Running migrateComments.php on section 2 wikis for T166733

Mentioned in SAL (#wikimedia-operations) [2018-11-01T13:36:51Z] <anomie@mwmaint1002> Running migrateComments.php on section 1 wikis for T166733

Mentioned in SAL (#wikimedia-operations) [2018-11-01T13:36:54Z] <anomie@mwmaint1002> Running migrateComments.php on section 5 wikis for T166733

Mentioned in SAL (#wikimedia-operations) [2018-11-01T13:37:27Z] <anomie@mwmaint1002> Running migrateComments.php on section 8 wikis for T166733

Mentioned in SAL (#wikimedia-operations) [2018-11-01T13:37:31Z] <anomie@mwmaint1002> Running migrateComments.php on section 6 wikis for T166733

Mentioned in SAL (#wikimedia-operations) [2018-11-01T13:37:44Z] <anomie@mwmaint1002> Running migrateComments.php on wikitech for T166733

Mentioned in SAL (#wikimedia-operations) [2018-11-01T13:37:57Z] <anomie@mwmaint1002> Running migrateComments.php on section 7 wikis for T166733

Mentioned in SAL (#wikimedia-operations) [2018-11-01T13:46:44Z] <anomie@mwmaint1002> Running migrateComments.php on remaining section 3 wikis for T166733

Mentioned in SAL (#wikimedia-operations) [2018-11-01T16:07:21Z] <anomie@mwmaint1002> Running migrateComments.php on section 4 wikis for T166733

Krenair added a subscriber: Krenair.Nov 4 2018, 4:46 PM
Anomie added a comment.EditedNov 7 2018, 8:09 PM

enwiki is done with the revision table already, and is about 90% done with the archive table. It has 103098562 rows to process for the remaining tables. All the s2 wikis combined had 113906981 rows in those remaining tables, and all of s2 (including revision and archive) completed in under 6 days. So hopefully that means enwiki will be done by next week rather than the week after.

All the wikis other than enwiki have already completed.

Change 472715 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] Default $wgCommentTableSchemaMigrationStage to MIGRATION_NEW

https://gerrit.wikimedia.org/r/472715

Anomie updated the task description. (Show Details)Nov 9 2018, 10:44 PM

enwiki finished much faster than I thought it would.

There is now a need to check consistency configuration on all codfw hosts, which were altered to prevent lots of lagging behind, as well as I would recommend some light checking on the consistency of the affected tables.

When you are happy about the whole process please tell us if there is something we should OPTIMIZE to save space.

Anomie added a comment.EditedNov 12 2018, 4:31 PM

There is now a need to check consistency configuration on all codfw hosts, which were altered to prevent lots of lagging behind, as well as I would recommend some light checking on the consistency of the affected tables.

At the table level, the following should all return zero rows:

SELECT * FROM comment WHERE comment_id = 0; -- Sanity check
SELECT * FROM revision LEFT JOIN revision_comment_temp ON (rev_id=revcomment_rev) LEFT JOIN comment ON (revcomment_comment_id=comment_id) WHERE comment_id IS NULL;
SELECT * FROM archive          LEFT JOIN comment ON (ar_comment_id=comment_id)        WHERE comment_id IS NULL;
SELECT * FROM ipblocks         LEFT JOIN comment ON (ipb_reason_id=comment_id)        WHERE comment_id IS NULL;
SELECT * FROM image            LEFT JOIN comment ON (img_description_id=comment_id)   WHERE comment_id IS NULL;
SELECT * FROM oldimage         LEFT JOIN comment ON (oi_description_id=comment_id)    WHERE comment_id IS NULL;
SELECT * FROM filearchive      LEFT JOIN comment ON (fa_deleted_reason_id=comment_id) WHERE comment_id IS NULL;
SELECT * FROM filearchive      LEFT JOIN comment ON (fa_description_id=comment_id)    WHERE comment_id IS NULL;
SELECT * FROM recentchanges    LEFT JOIN comment ON (rc_comment_id=comment_id)        WHERE comment_id IS NULL;
SELECT * FROM logging          LEFT JOIN comment ON (log_comment_id=comment_id)       WHERE comment_id IS NULL;
SELECT * FROM protected_titles LEFT JOIN comment ON (pt_reason_id=comment_id)         WHERE comment_id IS NULL;

I'm not sure how to best lightly check those, versus just running those queries somewhere and letting them take as long as they need to to fully scan the relevant tables. I've been meaning to ask you for suggestions on doing that.

I'll leave checking replica consistency entirely in your hands.

When you are happy about the whole process please tell us if there is something we should OPTIMIZE to save space.

I can describe what was done. You probably know much better than I do which of these would make an OPTIMIZE make sense.

  • For several tables, rows created before the MediaWiki config changes in February 2018 were UPDATEd to set a VARBINARY(255) column to the empty string and a BIGINT column to a non-zero value. Any remaining rows (from February–October 2018) with the varbinary column having a non-empty value may be set to the empty string if you want. The updates were done in PK order, although non-maintenance inserts/deletes were being done concurrently.
    • archive: ar_comment and ar_comment_id.
    • ipblocks: ipb_reason and ipb_reason_id.
    • image: img_description and img_decription_id.
    • oldimage: oi_description and oi_decription_id.
    • filearchive: fa_deleted_reason and fa_deleted_reason_id, and fa_description and fa_description_id.
    • logging: log_comment and log_comment_id.
    • protected_titles: pt_reason and pt_reason_id.
  • revision_comment_temp: Many INSERTs. They were done in PK order, but all these rows' PKs were less than the existing PKs in the table and there were non-maintenance inserts/deletes being done concurrently.
  • comment: Many INSERTs, using an autoincrementing PK.
  • Also, from T188132, image_comment_temp is empty now.

Soon I should be able to file a schema change task to drop image_comment_temp entirely. That's waiting on T189158 and gerrit:417041.

Also relatively soon we should be able to alter archive, ipblocks, image, oldimage, filearchive, recentchanges, logging, and protected_titles to drop the VARBINARY(255) columns (and remove the DEFAULT on the BIGINT columns), although I'm not sure whether MediaWiki's deprecation policy would want us to wait 6 months for the 1.34 release before merging that patch.

Also somewhat soon we'll be looking at something similar for the actor migration for archive, ipblocks, image, oldimage, filearchive, recentchanges, and logging. In that case the VARBINARY(255) columns won't have been emptied though.

jcrespo added a comment.EditedNov 12 2018, 4:38 PM

I will need some more time to digest the rest of your comment, but it seems it is even more I need to answer my question. The summary is that we will probably want to optimize all tables that used to store the comments to reclaim a lot of free space.

I can quickly answer:

I've been meaning to ask you for suggestions on doing that.

Run those on the vslow replicas- they are precisely thought for that- and they have lower weights to avoid affecting other production hosts .

Replica consistency is our problem- I meant actually configuration changes should go back to normal, don't worry, we will take care of that, I mentioned as a reminder to ourselves to check all codfw hosts before closing this.

Anomie added a comment.EditedNov 12 2018, 4:49 PM

Run those on the vslow replicas- they are precisely thought for that- and they have lower weights to avoid affecting other production hosts .

That's what I thought, thanks. I started screens on mwmaint1002 to do that.

At the table level, the following should all return zero rows:

SELECT * FROM comment WHERE comment_id = 0; -- Sanity check
SELECT * FROM revision LEFT JOIN revision_comment_temp ON (rev_id=revcomment_rev) LEFT JOIN comment ON (revcomment_comment_id=comment_id) WHERE comment_id IS NULL;
SELECT * FROM archive          LEFT JOIN comment ON (ar_comment_id=comment_id)        WHERE comment_id IS NULL;
SELECT * FROM ipblocks         LEFT JOIN comment ON (ipb_reason_id=comment_id)        WHERE comment_id IS NULL;
SELECT * FROM image            LEFT JOIN comment ON (img_description_id=comment_id)   WHERE comment_id IS NULL;
SELECT * FROM oldimage         LEFT JOIN comment ON (oi_description_id=comment_id)    WHERE comment_id IS NULL;
SELECT * FROM filearchive      LEFT JOIN comment ON (fa_deleted_reason_id=comment_id) WHERE comment_id IS NULL;
SELECT * FROM filearchive      LEFT JOIN comment ON (fa_description_id=comment_id)    WHERE comment_id IS NULL;
SELECT * FROM recentchanges    LEFT JOIN comment ON (rc_comment_id=comment_id)        WHERE comment_id IS NULL;
SELECT * FROM logging          LEFT JOIN comment ON (log_comment_id=comment_id)       WHERE comment_id IS NULL;
SELECT * FROM protected_titles LEFT JOIN comment ON (pt_reason_id=comment_id)         WHERE comment_id IS NULL;

All wikis except commonswiki passed. There were 13 rows in Commons's image table that didn't get updated, apparently due to page moves and the fact that image uses img_name as its PK rather than an immutable integer. They've been fixed, and I'm running the check again just to be sure.

In theory the same could happen for oldimage (no PK, migration uses oi_name,oi_timestamp), but apparently it didn't during this run. See also T28741: Migrate file tables to a modern layout (image/oldimage; file/file_revision).

All wikis except commonswiki passed. There were 13 rows in Commons's image table that didn't get updated, apparently due to page moves and the fact that image uses img_name as its PK rather than an immutable integer. They've been fixed, and I'm running the check again just to be sure.

commonswiki passed the second time.

Change 475761 had a related patch set uploaded (by Anomie; owner: Anomie):
[operations/mediawiki-config@master] Set comment migration stage to new in Beta Cluster

https://gerrit.wikimedia.org/r/475761

Change 475761 merged by jenkins-bot:
[operations/mediawiki-config@master] Set comment migration stage to new in Beta Cluster

https://gerrit.wikimedia.org/r/475761

Change 472715 merged by jenkins-bot:
[mediawiki/core@master] Default $wgCommentTableSchemaMigrationStage to MIGRATION_NEW

https://gerrit.wikimedia.org/r/472715

Change 476591 had a related patch set uploaded (by Anomie; owner: Anomie):
[operations/mediawiki-config@master] Set comment migration stage to new on group 0

https://gerrit.wikimedia.org/r/476591

Change 476591 merged by jenkins-bot:
[operations/mediawiki-config@master] Set comment migration stage to new on group 0

https://gerrit.wikimedia.org/r/476591

Mentioned in SAL (#wikimedia-operations) [2018-11-29T18:28:54Z] <anomie@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Setting comment migration to write-new/read-new on group 0 (T166733) (duration: 00m 52s)

Change 479489 had a related patch set uploaded (by Anomie; owner: Anomie):
[operations/mediawiki-config@master] Set comment migration stage to new on group 1

https://gerrit.wikimedia.org/r/479489

Change 479489 merged by jenkins-bot:
[operations/mediawiki-config@master] Set comment migration stage to new on group 1

https://gerrit.wikimedia.org/r/479489

Mentioned in SAL (#wikimedia-operations) [2018-12-13T17:00:41Z] <anomie> Set comment migration to new on group 1 (T166733)

Anomie updated the task description. (Show Details)