Page MenuHomePhabricator

Create a maintenance script to drop rev_text_id and ar_text_id from the database.
Open, NormalPublic

Description

Once MCR migration has progressed to no longer reading from the old database schema (see T198557), the following fields can be remobved:

  • rev_text_id
  • ar_text_id

probably also

  • rev_content_model
  • rev_content_format

and maybe even

  • page_content_model

Having update.php remove them seems risky, so we'd just leave them until someone manually runs a script. The script should be written to avoid accidental data loss. In particular, it MUST not run (and output a meaningful error message) if $wgMultiContentRevisionSchemaMigrationStage is defined and has the SCHEMA_COMPAT_READ_OLD or SCHEMA_COMPAT_WRITE_OLD flags set (even after this config variable officially no longer exists). It must also present the user with a warning that running this script before the schema migration is complete results in irrecoverable data loss, and prompt for confirmation.

Related Objects

StatusAssignedTask
Declineddchen
OpenNone
OpenNone
DuplicateNone
OpenNone
ResolvedAbit
OpenNone
OpenNone
OpenNone
DuplicateNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
Resolvedppelberg
ResolvedKrinkle
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
Resolveddaniel
ResolvedBPirkle

Event Timeline

daniel triaged this task as Normal priority.Jun 29 2018, 3:34 PM
daniel created this task.
Tgr added a comment.Jun 29 2018, 6:06 PM

That would mean MediaWiki could never not set SCHEMA_COMPAT_WRITE_OLD by default (or risk catastrophic data loss on upgrade), no?

Well, SCHEMA_COMPAT_WRITE_OLD is effectively the default now.

Dropping these fields basically means only SCHEMA_COMPAT_WRITE_NEW | SCHEMA_COMPAT_READ_NEW is supported. So it can only be done after migration is complete.

I was planning to create a few more tickets to track the steps towards that point. Will do that later, running out of battery...

Tgr added a comment.Jun 29 2018, 8:20 PM

It's just a bit scary when that's triggered by update.php - if something goes wrong with the migration script you could lose all content.
When HitCounter was removed from core the update just deleted all stored data, which understandably made people very unhappy. We should not do something like that again, even by accident.

daniel updated the task description. (Show Details)
daniel removed subscribers: aude, Addshore, gerritbot.
WDoranWMF moved this task from MCR to mop on the Core Platform Team board.Jul 26 2019, 6:39 PM
daniel updated the task description. (Show Details)Aug 27 2019, 12:48 PM

Hey there, should this be moved to 1.35? The cut is a couple of weeks away? If it needs to go out in 1.34, is there anything I can do to help get it out of the door?

daniel added a comment.Sep 6 2019, 8:30 AM

Hey there, should this be moved to 1.35? The cut is a couple of weeks away? If it needs to go out in 1.34, is there anything I can do to help get it out of the door?

Yes, we should push this back. Also, I'll change the description based on @Tgr's comment. He's right, this is potentially catastrophic, and it's better to just leave the fields in place doing nothing, until they are removed manually.

daniel renamed this task from Drop rev_text_id and ar_text_id when running update.php to Create a maintenance script to drop rev_text_id and ar_text_id from the database..Sep 6 2019, 8:35 AM
daniel edited projects, added MW-1.35-release; removed MW-1.34-release.
daniel updated the task description. (Show Details)
daniel updated the task description. (Show Details)
Anomie added a comment.Sep 6 2019, 2:08 PM

@daniel renamed this task from Drop rev_text_id and ar_text_id when running update.php to Create a maintenance script to drop rev_text_id and ar_text_id from the database..

In T198492#5470617, @daniel also wrote:

Having update.php remove them seems risky, so we'd just leave them until someone manually runs a script. The script should be written to avoid accidental data loss. In particular, it MUST not run (and output a meaningful error message) if $wgMultiContentRevisionSchemaMigrationStage is defined and has the SCHEMA_COMPAT_READ_OLD or SCHEMA_COMPAT_WRITE_OLD flags set (even after this config variable officially no longer exists).

update.php is the standard mechanism for schema changes, and I don't think we ever have any other maintenance scripts perform actual schema changes. I don't think setting a precedent for doing schema changes in arbitrary maintenance scripts is at all a good idea.

A solution I've used in the past is a custom method in DatabaseUpdater that checks the relevant global and ensures that the relevant migration maintenance script runs to completion successfully (with the equivalent of --force) before applying the change.

It must also present the user with a warning that running this script before the schema migration is complete results in irrecoverable data loss, and prompt for confirmation.

Our maintenance scripts never require additional input, as this prevents (or at least complicates) automation. Again, this seems like a precedent we shouldn't set.

daniel added a comment.Sep 6 2019, 2:13 PM

update.php is the standard mechanism for schema changes, and I don't think we ever have any other maintenance scripts perform actual schema changes. I don't think setting a precedent for doing schema changes in arbitrary maintenance scripts is at all a good idea.

I think what we have done in the past was simply nothing. We would just have left these fields in, forever. Or told people to run an SQL patch by hand.

I don't see the problem with letting other scripts do schema changes if they assert the preconditions.

A solution I've used in the past is a custom method in DatabaseUpdater that checks the relevant global and ensures that the relevant migration maintenance script runs to completion successfully (with the equivalent of --force) before applying the change.

The question is how risky a bug in that code would be. Automatically converting and then immediately burning all bridges seems rather risky.

Our maintenance scripts never require additional input, as this prevents (or at least complicates) automation. Again, this seems like a precedent we shouldn't set.

That's exactly the idea here: this change should not be automated. It should be deliberate and manual. We could have --force skip the confirmation, sure.

Anomie added a comment.Sep 6 2019, 4:06 PM

I think what we have done in the past was simply nothing. We would just have left these fields in, forever. Or told people to run an SQL patch by hand.

{{citation needed}}

Maybe that was true 15 years ago, but update.php seems well-established now.

I don't see the problem with letting other scripts do schema changes if they assert the preconditions.

Schema changes on Wikimedia sites are done by the DBAs, for one. Having maintenance scripts other than update.php making such changes means more cognitive load.

Also update.php is currently our one entry point for making schema changes. Having them spread across multiple scripts means that someone wanting to keep their wiki's schemas up to date has a harder time of it.

If we ever get around to the plans I have for Abstract Schemas, that includes having tests for "start with MW 1.X's schema, run updates, now it needs to match the current schema". Which would be broken by some schema changes not being part of the set of updates.

The question is how risky a bug in that code would be. Automatically converting and then immediately burning all bridges seems rather risky.

It's basically the same thing you proposed, except relying on humans to do the checking of things.

Maybe that was true 15 years ago, but update.php seems well-established now.

No, it was never true. update.php has been used to drop fields since MW 1.5, 14 years ago, but there was no backlog of undropped fields at that time. update.php has always been the correct script in which to drop fields and tables.

Obviously we should take care to avoid losing data. But that's the case whether the update code is in update.php or another script.

No, it was never true. update.php has been used to drop fields since MW 1.5, 14 years ago, but there was no backlog of undropped fields at that time. update.php has always been the correct script in which to drop fields and tables.

Have we ever used update.php to drop fields that would cause content to be lost if the migration script had gone wrong? The problem I see is verifying that the migration has actually been successful. The idea was to simply let people use the wiki for a while, and if everything works, they can run the script. But maybe switching to reading only the new schema in 1.34 (T231673), while only dropping the fields when updating to 1.35, is sufficient for that.

Note that when I wrote this ticket, it asked for update.php to drop the fields. I let myself be convinced by @Tgr's comment T198492#4326338 that this is unnecessarily risky, so I changed the ticket.