Page MenuHomePhabricator

Create a maintenance script to drop rev_text_id and ar_text_id from the database.
Closed, InvalidPublic

Description

Once MCR migration has progressed to no longer reading from the old database schema (see T198557), the following fields can be removed:

  • rev_text_id
  • ar_text_id

probably also

  • rev_content_model
  • rev_content_format

and maybe even

  • page_content_model

Having update.php remove them seems risky, so we'd just leave them until someone manually runs a script. The script should be written to avoid accidental data loss. In particular, it MUST not run (and output a meaningful error message) if $wgMultiContentRevisionSchemaMigrationStage is defined and has the SCHEMA_COMPAT_READ_OLD or SCHEMA_COMPAT_WRITE_OLD flags set (even after this config variable officially no longer exists). It must also present the user with a warning that running this script before the schema migration is complete results in irrecoverable data loss, and prompt for confirmation.

Related Objects

StatusSubtypeAssignedTask
Declineddchen
OpenNone
OpenNone
DuplicateNone
OpenFeatureNone
OpenBUG REPORTNone
OpenNone
StalledNone
OpenFeatureNone
DuplicateNone
ResolvedNone
OpenNone
OpenNone
OpenFeatureNone
OpenNone
ResolvedNone
ResolvedNone
OpenFeatureNone
OpenNone
OpenFeatureNone
StalledNone
OpenNone
InvalidNone
Resolveddaniel
Resolveddaniel
ResolvedBPirkle
ResolvedCCicalese_WMF

Event Timeline

That would mean MediaWiki could never not set SCHEMA_COMPAT_WRITE_OLD by default (or risk catastrophic data loss on upgrade), no?

Well, SCHEMA_COMPAT_WRITE_OLD is effectively the default now.

Dropping these fields basically means only SCHEMA_COMPAT_WRITE_NEW | SCHEMA_COMPAT_READ_NEW is supported. So it can only be done after migration is complete.

I was planning to create a few more tickets to track the steps towards that point. Will do that later, running out of battery...

It's just a bit scary when that's triggered by update.php - if something goes wrong with the migration script you could lose all content.
When HitCounter was removed from core the update just deleted all stored data, which understandably made people very unhappy. We should not do something like that again, even by accident.

daniel updated the task description. (Show Details)
daniel removed subscribers: aude, Addshore, gerritbot.

Hey there, should this be moved to 1.35? The cut is a couple of weeks away? If it needs to go out in 1.34, is there anything I can do to help get it out of the door?

Hey there, should this be moved to 1.35? The cut is a couple of weeks away? If it needs to go out in 1.34, is there anything I can do to help get it out of the door?

Yes, we should push this back. Also, I'll change the description based on @Tgr's comment. He's right, this is potentially catastrophic, and it's better to just leave the fields in place doing nothing, until they are removed manually.

daniel renamed this task from Drop rev_text_id and ar_text_id when running update.php to Create a maintenance script to drop rev_text_id and ar_text_id from the database..Sep 6 2019, 8:35 AM
daniel edited projects, added MW-1.35-release; removed MW-1.34-release.
daniel updated the task description. (Show Details)
daniel updated the task description. (Show Details)

@daniel renamed this task from Drop rev_text_id and ar_text_id when running update.php to Create a maintenance script to drop rev_text_id and ar_text_id from the database..

In T198492#5470617, @daniel also wrote:

Having update.php remove them seems risky, so we'd just leave them until someone manually runs a script. The script should be written to avoid accidental data loss. In particular, it MUST not run (and output a meaningful error message) if $wgMultiContentRevisionSchemaMigrationStage is defined and has the SCHEMA_COMPAT_READ_OLD or SCHEMA_COMPAT_WRITE_OLD flags set (even after this config variable officially no longer exists).

update.php is the standard mechanism for schema changes, and I don't think we ever have any other maintenance scripts perform actual schema changes. I don't think setting a precedent for doing schema changes in arbitrary maintenance scripts is at all a good idea.

A solution I've used in the past is a custom method in DatabaseUpdater that checks the relevant global and ensures that the relevant migration maintenance script runs to completion successfully (with the equivalent of --force) before applying the change.

It must also present the user with a warning that running this script before the schema migration is complete results in irrecoverable data loss, and prompt for confirmation.

Our maintenance scripts never require additional input, as this prevents (or at least complicates) automation. Again, this seems like a precedent we shouldn't set.

update.php is the standard mechanism for schema changes, and I don't think we ever have any other maintenance scripts perform actual schema changes. I don't think setting a precedent for doing schema changes in arbitrary maintenance scripts is at all a good idea.

I think what we have done in the past was simply nothing. We would just have left these fields in, forever. Or told people to run an SQL patch by hand.

I don't see the problem with letting other scripts do schema changes if they assert the preconditions.

A solution I've used in the past is a custom method in DatabaseUpdater that checks the relevant global and ensures that the relevant migration maintenance script runs to completion successfully (with the equivalent of --force) before applying the change.

The question is how risky a bug in that code would be. Automatically converting and then immediately burning all bridges seems rather risky.

Our maintenance scripts never require additional input, as this prevents (or at least complicates) automation. Again, this seems like a precedent we shouldn't set.

That's exactly the idea here: this change should not be automated. It should be deliberate and manual. We could have --force skip the confirmation, sure.

I think what we have done in the past was simply nothing. We would just have left these fields in, forever. Or told people to run an SQL patch by hand.

{{citation needed}}

Maybe that was true 15 years ago, but update.php seems well-established now.

I don't see the problem with letting other scripts do schema changes if they assert the preconditions.

Schema changes on Wikimedia sites are done by the DBAs, for one. Having maintenance scripts other than update.php making such changes means more cognitive load.

Also update.php is currently our one entry point for making schema changes. Having them spread across multiple scripts means that someone wanting to keep their wiki's schemas up to date has a harder time of it.

If we ever get around to the plans I have for Abstract Schemas, that includes having tests for "start with MW 1.X's schema, run updates, now it needs to match the current schema". Which would be broken by some schema changes not being part of the set of updates.

The question is how risky a bug in that code would be. Automatically converting and then immediately burning all bridges seems rather risky.

It's basically the same thing you proposed, except relying on humans to do the checking of things.

Maybe that was true 15 years ago, but update.php seems well-established now.

No, it was never true. update.php has been used to drop fields since MW 1.5, 14 years ago, but there was no backlog of undropped fields at that time. update.php has always been the correct script in which to drop fields and tables.

Obviously we should take care to avoid losing data. But that's the case whether the update code is in update.php or another script.

No, it was never true. update.php has been used to drop fields since MW 1.5, 14 years ago, but there was no backlog of undropped fields at that time. update.php has always been the correct script in which to drop fields and tables.

Have we ever used update.php to drop fields that would cause content to be lost if the migration script had gone wrong? The problem I see is verifying that the migration has actually been successful. The idea was to simply let people use the wiki for a while, and if everything works, they can run the script. But maybe switching to reading only the new schema in 1.34 (T231673), while only dropping the fields when updating to 1.35, is sufficient for that.

Note that when I wrote this ticket, it asked for update.php to drop the fields. I let myself be convinced by @Tgr's comment T198492#4326338 that this is unnecessarily risky, so I changed the ticket.

update.php should run the following patches:

  • patch-revision-actor-comment-MCR.sql - includes dropping rev_text_id, dropping rev_content_model, and dropping rev_content_format
  • patch-archive-MCR.sql - includes dropping ar_text_id

Per T251343: Drop unused columns from revision table in DatabaseUpdater (CommentStore, Actor, MCR)

page_content_model is still in use, and shouldn't be dropped yet.

Not sure a separate script is needed - T198492#4326338 suggested we shouldn't use update.php, but T251343: Drop unused columns from revision table in DatabaseUpdater (CommentStore, Actor, MCR) went ahead and added the updates to update.php already. Suggest declining or merging

Hey there, should this be moved to 1.37? The cut for 1.36 has happened, and 1.36.0-rc.0 will be cut in a fortnight or so, after which feature changes shouldn't be landed and back-ported.

The update.php code already drops these columns.