Page MenuHomePhabricator

Introduce SCHEMA_COMPAT_XXX constants to allow more fine grained control over the migration process
Closed, ResolvedPublic

Description

The MIGRATION_XXX constants do not allow for a migration process that has stages that write to the old and the new schema, but only reads from one of the schemas. For the MCR migration, it's more useful to first have a stage when both schemas are written but the old schema is still read, and then switching to a stage where both schemas are still written, but only the new schema is read: this allows us to exerciser the new code fore reading, while still avoiding extra fallback code (which is especially complex and error prone when we have to do it in a single SQL query), and still allowing us to easily back out of the migration and go back to using the old schema.

To achieve this, the migration stage should be defined as a bit field, with individual bits that control reading and writing of the old and the new schema:

define( 'SCHEMA_COMPAT_WRITE_OLD', 0x01 );
define( 'SCHEMA_COMPAT_READ_OLD', 0x02 );
define( 'SCHEMA_COMPAT_WRITE_NEW', 0x10 );
define( 'SCHEMA_COMPAT_READ_NEW', 0x20 );
define( 'SCHEMA_COMPAT_WRITE_BOTH', SCHEMA_COMPAT_WRITE_OLD | SCHEMA_COMPAT_WRITE_NEW );
define( 'SCHEMA_COMPAT_READ_BOTH', SCHEMA_COMPAT_READ_OLD | SCHEMA_COMPAT_READ_NEW );
define( 'SCHEMA_COMPAT_OLD', SCHEMA_COMPAT_WRITE_OLD | SCHEMA_COMPAT_READ_OLD );
define( 'SCHEMA_COMPAT_NEW', SCHEMA_COMPAT_WRITE_NEW | SCHEMA_COMPAT_READ_NEW );

To keep the MIGRATION_XXX constants compatible with the new bit field logic, their numeric value is changed to contain the bit field in the lower bits, and use higher bits to preserve their original ordering, like so:

define( 'MIGRATION_OLD', 0x00000000 | SCHEMA_COMPAT_OLD );
define( 'MIGRATION_WRITE_BOTH', 0x10000000 | SCHEMA_COMPAT_READ_BOTH | SCHEMA_COMPAT_WRITE_BOTH );
define( 'MIGRATION_WRITE_NEW', 0x20000000 | SCHEMA_COMPAT_READ_BOTH | SCHEMA_COMPAT_WRITE_NEW );
define( 'MIGRATION_NEW', 0x30000000 | SCHEMA_COMPAT_NEW );

Related Objects

View Standalone Graph
This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here. Use View Standalone Graph to show more of the graph.

Event Timeline

daniel created this task.Jun 18 2018, 4:47 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJun 18 2018, 4:47 PM
Anomie added a comment.EditedJun 20 2018, 4:10 PM

This is simple enough in many cases, but for storage layer code that exposes a getQueryInfo mothod, like RevisionStore::getQueryInfo, this is quite difficult, since it requires conditional joins.

I note that the constants were created in the context of the comment and actor table migrations, which have "getQueryInfo"-like methods. In fact, some of the "getQueryInfo" methods replacing old "selectFields" methods were introduced specifically because of the comment and actor table migrations.

In such cases, the code path with fallback executed MIGRATION_WRITE_BOTH and MIGRATION_WRITE_NEW is quite different from the post-migration code path later used by MIGRATION_NEW, resulting in the post-migration code not being tested in MIGRATION_WRITE_BOTH and MIGRATION_WRITE_NEW modes.

In many cases, I think the code would look like

if ( new fields are present ) {
    use new fields
} else {
    use old fields
}

if ( new fields should be populated ) {
    populate new fields
}
if ( old fields should be populated ) {
    populate old fields
}

In these cases, the transition from using both to using only the new fields is just a matter of removing the old-field code paths.

True, the joins themselves will be different, which seems to be the main concern here.

MIGRATION_WRITE_BOTH: Write both the old and new schema. May try to read the new schema, but must fall back to the old. This is used while the change is being tested, allowing easy roll-back to the old schema. Maintenance script to migrate existing entries in the old schema to the new schema may already be run.

It depends on the maintenance script. If the script tracks work by looking for the absence of new-schema data then it's safe to run at this stage. If it tracks progress by looking for the presence of old-schema data (and removes that data as part of the migration), then it's not safe to run at this stage. Both methods have advantages and disadvantages.

I also note that your redefining of the semantics doesn't really fix the ordering problem, it's still present in there. What you probably want to do is define a parallel set of constants:

  • MIGRATION_OLD: Same name, same semantics.
  • MIGRATION_READ_OLD: Write both schemas. Read only the old schema.
  • MIGRATION_READ_NEW: Write both schemas. Read only the new schema.
  • MIGRATION_NEW: Same name, same semantics.

Variables using MIGRATION_* constants would need to define whether they're used with the existing write-oriented constants or the new read-oriented constants. When using these constants, migration scripts would be required to use the "detect absence of new data (and don't modify the old)" method.

NOTE: all code that uses greater/smaller comparison on these constants should be changed to check for equality. Greater/smaller comparison can no longer be used safely.

That would make them much less convenient to use.

daniel added a comment.EditedJun 20 2018, 4:50 PM

True, the joins themselves will be different, which seems to be the main concern here.

Indeed.

NOTE: all code that uses greater/smaller comparison on these constants should be changed to check for equality. Greater/smaller comparison can no longer be used safely.

That would make them much less convenient to use.

How about changing the numeric values and turn this into a bit field? (the values of these constants are not stored or transmitted anywhere, are they?)

  • 0x01: SCHEMA_COMPAT_WRITE_OLD
  • 0x02: SCHEMA_COMPAT_READ_OLD
  • 0x04: SCHEMA_COMPAT_WRITE_NEW
  • 0x08: SCHEMA_COMPAT_READ_NEW

This would mean the current constants would be re-defined to have the following values:

  • MIGRATION_OLD: 0x03 = SCHEMA_COMPAT_WRITE_OLD | SCHEMA_COMPAT_READ_OLD
  • MIGRATION_WRITE_BOTH: 0x0F = SCHEMA_COMPAT_WRITE_OLD | SCHEMA_COMPAT_READ_OLD | SCHEMA_COMPAT_WRITE_NEW | SCHEMA_COMPAT_READ_NEW
  • MIGRATION_WRITE_NEW: 0x0E = SCHEMA_COMPAT_READ_OLD | SCHEMA_COMPAT_WRITE_NEW | SCHEMA_COMPAT_READ_NEW
  • MIGRATION_NEW: 0x0C = SCHEMA_COMPAT_WRITE_NEW | SCHEMA_COMPAT_READ_NEW

And I would now add:

  • MIGRATION_READ_NEW: 0x0D = SCHEMA_COMPAT_WRITE_OLD | SCHEMA_COMPAT_READ_OLD | SCHEMA_COMPAT_WRITE_NEW

The naming wouldn't be perfect, and checks that are currently using smaller/greater would need to be changed to bit field logic. But I think this should work nicely...

A bitfield could work.

I'd like to preserve the numerical ordering of the stages too, though, which we can do easily enough by reserving some of the high-order bits for that: instead of 0x03, 0x0F, 0x0E, and 0x0C use something like 0x00000003, 0x1000000F, 0x2000000E, and 0x3000000C.

And I would now add:

  • MIGRATION_READ_NEW: 0x0D = SCHEMA_COMPAT_WRITE_OLD | SCHEMA_COMPAT_READ_OLD | SCHEMA_COMPAT_WRITE_NEW

You have a typo in there.

  • MIGRATION_READ_OLD: 0x07 = SCHEMA_COMPAT_WRITE_OLD | SCHEMA_COMPAT_READ_OLD | SCHEMA_COMPAT_WRITE_NEW
  • MIGRATION_READ_NEW: 0x0D = SCHEMA_COMPAT_WRITE_OLD | SCHEMA_COMPAT_READ_NEW | SCHEMA_COMPAT_WRITE_NEW

Change 442153 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Introduce new schema flags and use them in RevisionStore.

https://gerrit.wikimedia.org/r/442153

daniel renamed this task from Introduce MIGRATION_READ_NEW to Introduce SCHEMA_COMPAT_XXX constants to allow more fine grained control over the migration process.Jun 27 2018, 11:16 AM
daniel updated the task description. (Show Details)
daniel updated the task description. (Show Details)

Updated the task description following @Anomie's suggestions.

daniel claimed this task.Jun 28 2018, 8:01 PM
Vvjjkkii renamed this task from Introduce SCHEMA_COMPAT_XXX constants to allow more fine grained control over the migration process to wpaaaaaaaa.Jul 1 2018, 1:03 AM
Vvjjkkii removed daniel as the assignee of this task.
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed subscribers: gerritbot, Aklapper.
CommunityTechBot renamed this task from wpaaaaaaaa to Introduce SCHEMA_COMPAT_XXX constants to allow more fine grained control over the migration process.Jul 2 2018, 4:30 AM
CommunityTechBot assigned this task to daniel.
CommunityTechBot raised the priority of this task from High to Needs Triage.
CommunityTechBot updated the task description. (Show Details)
CommunityTechBot added subscribers: gerritbot, Aklapper.

Change 442153 merged by jenkins-bot:
[mediawiki/core@master] Introduce new schema flags and use them in RevisionStore.

https://gerrit.wikimedia.org/r/442153

daniel closed this task as Resolved.Jul 8 2018, 4:31 PM