
afl_filter should be split in afl_filter_id and afl_global
Closed, Resolved · Public

Description

afl_filter is currently a varchar(64) binary NOT NULL field. This is because, since global filters were introduced, the field can hold either a plain numeric ID or the string "global-ID", where ID is again a numeric identifier. The change was made in patch-global_filters.sql, introduced in rEABF14b850f891de27ea09a1439e3835f66c49ad773f, and it was a poor choice.
It further denormalized the structure of the DB, and the code was never properly updated to handle afl_filter as a string. In fact, until yesterday there were four JOINs on af_id=afl_filter, i.e. joins between an integer and a string. That syntax isn't even supported in Postgres, and at any rate it slows down the query, because every row has to be implicitly cast before the comparison.
So we've basically lived with a stopgap for 10 years, and it's time to get rid of it. The proposal (already implemented) is to split afl_filter into

afl_filter_id BIGINT unsigned NOT NULL,
afl_global tinyint(1) NOT NULL

and clean up the code accordingly. This also has the benefit of encouraging the integer type for filter IDs throughout the code, since we won't need string IDs anymore.
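
To make the cleanup concrete, here is a minimal sketch (a hypothetical helper, not the extension's actual code) of the parsing that the old combined value forces on every reader, and of what the split columns store directly:

<?php
// Hypothetical helper, for illustration only: the legacy combined
// afl_filter value has to be parsed on every read, whereas the new
// columns store the two pieces of information directly.

/**
 * Parse a legacy afl_filter value such as "42" or "global-42" into
 * the [ id, isGlobal ] pair that the new columns hold natively.
 */
function parseLegacyFilterValue( string $aflFilter ): array {
	if ( strpos( $aflFilter, 'global-' ) === 0 ) {
		return [ (int)substr( $aflFilter, strlen( 'global-' ) ), true ];
	}
	return [ (int)$aflFilter, false ];
}

// With the split schema there is nothing to parse: afl_filter_id is an
// integer (so JOINs on af_id need no implicit cast) and afl_global is a
// boolean flag.
[ $filterID, $global ] = parseLegacyFilterValue( 'global-42' );
assert( $filterID === 42 && $global === true );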

Plan

  • Create the new schema
  • Merge r459818
  • Migrate WMF wikis (T269712)
  • Set the default to SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_OLD in 1.36 (no need for COMPAT_OLD, because the schema was introduced in 1.34)
  • Set the default to SCHEMA_COMPAT_NEW in 1.37 (so people can run the script in the meanwhile); both defaults are sketched after this list
  • Remove the old schema and migration code in 1.38: r488482
  • Delete the old column on WMF wikis (T291719)
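
For reference, the staged defaults above would look roughly like this in a wiki's LocalSettings.php. The variable name $wgAbuseFilterAflFilterMigrationStage is my assumption for the extension's flag; the SCHEMA_COMPAT_* constants are the ones MediaWiki core defines:

<?php
// Sketch of LocalSettings.php overrides (assumed variable name; the
// SCHEMA_COMPAT_* constants come from MediaWiki core's Defines.php).

// 1.36 default per the plan: write both schemas, read the old one.
$wgAbuseFilterAflFilterMigrationStage =
	SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_OLD;

// 1.37 default per the plan: new schema only (run the migration
// script before switching to this).
// $wgAbuseFilterAflFilterMigrationStage = SCHEMA_COMPAT_NEW;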


Event Timeline


Change 459818 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Split afl_filter in afl_filter_id and afl_global

https://gerrit.wikimedia.org/r/459818

Change 488482 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Remove afl_filter entirely

https://gerrit.wikimedia.org/r/488482

Marostegui subscribed.

I am fine with this choice. From what I can see, the new schemas (and patches) are not yet merged.
Once they are both merged, please create a schema change ticket using this template: https://wikitech.wikimedia.org/wiki/Schema_changes#Workflow_of_a_schema_change and we'll take care of it.
For now, I will remove the DBA tag, but I'll remain subscribed here in case I'm needed.

Thanks!

So, @matej_suchanek noted that there's a big problem with the patch. We have a variable with the migration stage for the current DB, but sometimes we'll also have to insert data into a foreign DB (e.g. for edits on some wikis, we may have to update meta). The foreign DB may have a different migration stage. How could this be handled in a non-WMF-specific way?

@Anomie Have you faced something similar for the actor migration?

Yes, I did. One of these days I even intend to write up all the details as a blog post or something, but for now I'll give the quick overview.

The trick was to make sure that at every step things were compatible between S-1 and S, both for S-1 reading data written by S and S reading data written by S-1. And then make sure to roll out the schema changes so that every wiki was moved from S-1 to S before any wiki was moved to S+1. This also has the property that it's always safe to roll any wiki back from S to S-1 in case of critical bugs, as long as no wiki was ever at S+1 yet.

There are a few more stages in the whole migration than are reflected by the feature flag, as we also have to take into account the states before the new columns are added and after the old ones are dropped, plus the change in state caused by running the maintenance script to back-populate the new fields in old rows.

Also there are two ways the migration could progress, representing a tradeoff between space and read-complexity.

Of note is that this assumes that it's basically transparent to clients whether the DB is using the old or new fields. In other words, any features possible with only the old fields need to be removed early on, and any features depending on the new fields need to wait until the new fields are fully populated.

Method 1

This method uses more space for longer during the migration, but read queries are always straightforward.

Stage | Old fields          | New fields          | Migration flag               | Notes
------|---------------------|---------------------|------------------------------|------
0     | Fully populated     | Missing             | Assumed SCHEMA_MIGRATION_OLD | State of things before we begin the migration.
1     | Fully populated     | Empty               | SCHEMA_MIGRATION_OLD         |
2     | Fully populated     | Partially populated | READ_OLD, WRITE_BOTH         |
3     | Fully populated     | Fully populated     | READ_OLD, WRITE_BOTH         | Migration script was run.
4     | Fully populated     | Fully populated     | READ_NEW, WRITE_BOTH         |
5     | Partially populated | Fully populated     | SCHEMA_MIGRATION_NEW         |
6     | Missing             | Fully populated     | Assumed SCHEMA_MIGRATION_NEW | Final state.

The analysis (see the write-gating sketch after this list):

  • 0 and 1: Nothing will try to access the new fields, so it doesn't matter whether they exist yet.
  • 1 and 2: In both stages the old fields get written and only the old fields are read, so everything works.
  • 2 and 3: In both stages both the old and new fields get written, so no new rows are being created with missing 'new' data. This lets the maintenance script that back-populates the new fields work sanely, with no races.
  • 3 and 4: The new and old fields are always populated, so it doesn't matter which is read.
  • 4 and 5: Nothing is reading the old fields, so it doesn't matter if they're no longer populated.
  • 5 and 6: Nothing will try to access the old fields at all, so they can be dropped without issue.
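
As a rough illustration of how such a flag typically gates writes and reads under Method 1 (my own sketch, not the actual AbuseFilter code; the constant values mirror MediaWiki core's Defines.php):

<?php
// Flag bits mirroring MediaWiki core's Defines.php; the values are
// copied here only to keep the sketch self-contained.
const SCHEMA_COMPAT_WRITE_OLD = 0x01;
const SCHEMA_COMPAT_READ_OLD  = 0x02;
const SCHEMA_COMPAT_WRITE_NEW = 0x10;
const SCHEMA_COMPAT_READ_NEW  = 0x20;

/**
 * Build the abuse_filter_log row fragment for the filter reference,
 * writing whichever representation(s) the current stage asks for.
 */
function buildFilterRow( int $stage, int $filterID, bool $global ): array {
	$row = [];
	if ( $stage & SCHEMA_COMPAT_WRITE_OLD ) {
		$row['afl_filter'] = $global ? "global-$filterID" : (string)$filterID;
	}
	if ( $stage & SCHEMA_COMPAT_WRITE_NEW ) {
		$row['afl_filter_id'] = $filterID;
		$row['afl_global'] = (int)$global;
	}
	return $row;
}

// Under Method 1 the read side stays a simple either/or choice:
function filterFieldsToSelect( int $stage ): array {
	return ( $stage & SCHEMA_COMPAT_READ_NEW )
		? [ 'afl_filter_id', 'afl_global' ]
		: [ 'afl_filter' ];
}

Because both fields are fully populated before READ_NEW is enabled (stage 4), the read side never needs a fallback path.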

Method 2

This method avoids duplicating all the data, but it makes reads complicated. This works well enough for things like comment that are always just selected, but does not work well for things like actor where you want indexes to be used for WHERE or ORDER BY. If unsure, use Method 1.

Stage | Old fields          | New fields          | Migration flag               | Notes
------|---------------------|---------------------|------------------------------|------
0     | Fully populated     | Missing             | Assumed SCHEMA_MIGRATION_OLD | State of things before we begin the migration.
1     | Fully populated     | Empty               | SCHEMA_MIGRATION_OLD         |
2     | Fully populated     | Partially populated | READ_BOTH, WRITE_BOTH        |
3     | Partially populated | Partially populated | READ_BOTH, WRITE_NEW         |
4     | Partially populated | Fully populated     | READ_BOTH, WRITE_NEW         | Migration script was run.
5     | Partially populated | Fully populated     | SCHEMA_MIGRATION_NEW         |
6     | Missing             | Fully populated     | Assumed SCHEMA_MIGRATION_NEW | Final state.

Note that stage 2 could have WRITE_OLD instead of WRITE_BOTH, but WRITE_BOTH allows you to verify that data is being written correctly to the new fields before stopping writes to the old ones. If you find a bug, all you have to do is blank the bad 'new' fields and it'll fall back to the 'old' data without loss.

The analysis (see the read-fallback sketch after this list):

  • 0 and 1: Nothing will try to access the new fields, so it doesn't matter whether they exist yet.
  • 1 and 2: The old fields are always populated, so it doesn't matter if both are read or only the old.
  • 2 and 3: In both stages both the old and new fields are read, so we can stop populating the old ones without breaking anything.
  • 3 and 4: Nothing is creating rows with unpopulated new fields, so the maintenance script can run sanely, no races.
    • Also nothing is creating new rows with 'old' data present, so the script can blank the old fields as it moves the data to the new without races.
  • 4 and 5: Now that the new fields are always fully populated, we can safely stop reading the old ones.
  • 5 and 6: Nothing will try to access the old fields at all, so they can be dropped without issue.
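
Under Method 2's READ_BOTH stages, every reader instead has to fall back between the two representations per row; a hypothetical sketch:

<?php
/**
 * Hypothetical reader for Method 2's READ_BOTH stages: prefer the new
 * fields when populated, otherwise parse the legacy combined value.
 * Assumes an unpopulated afl_filter_id holds 0, since the column is
 * NOT NULL and real filter IDs start at 1.
 */
function resolveFilter( array $row ): array {
	if ( (int)$row['afl_filter_id'] !== 0 ) {
		return [ (int)$row['afl_filter_id'], (bool)$row['afl_global'] ];
	}
	$legacy = (string)$row['afl_filter'];
	if ( strpos( $legacy, 'global-' ) === 0 ) {
		return [ (int)substr( $legacy, strlen( 'global-' ) ), true ];
	}
	return [ (int)$legacy, false ];
}

This per-row fallback is exactly why indexes can't serve WHERE or ORDER BY on the filter during the migration: no single column holds the value for every row.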

Yes, I did. One of these days I even intend to write up all the details as a blog post or something, but for now I'll give the quick overview.

That would be interesting, as this quick overview is :-)

The trick was to make sure that at every step things were compatible between S-1 and S, both for S-1 reading data written by S and S reading data written by S-1. And then make sure to roll out the schema changes so that every wiki was moved from S-1 to S before any wiki was moved to S+1. This also has the property that it's always safe to roll any wiki back from S to S-1 in case of critical bugs, as long as no wiki was ever at S+1 yet.

Got it, that seems like a very good idea.

There are a few more stages in the whole migration than are reflected by the feature flag [...]

Yes, I accounted for some of them in the patch above; hopefully all of them.

Method 1
[...]

  • 0 and 1: Nothing will try to access the new fields, so it doesn't matter whether they exist yet.

OK, so basically the idea is to create the schema *before* even adding the rest of the migration code, right? IIUC, we could for instance add the schema in 1.34, then the flag and the rest of the migration stuff in 1.35. This way, ideally all wikis should have the new columns by 1.35, unconditionally. Is that right?

Method 2

This method avoids duplicating all the data, but it makes reads complicated. This works well enough for things like comment that are always just selected, but does not work well for things like actor where you want indexes to be used for WHERE or ORDER BY. If unsure, use Method 1.

Yeah, I think we can go with method 1. I believe that my approach can handle all phases from 1-2 to 5-6, it just needs some more attention to 0-1. I'll upload a separate patch with the new schemas.

Change 537361 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Add new schemas for splitting afl_filter

https://gerrit.wikimedia.org/r/537361

  • 0 and 1: Nothing will try to access the new fields, so it doesn't matter whether they exist yet.

OK, so basically the idea is to create the schema *before* even adding the rest of the migration code, right? IIUC, we could for instance add the schema in 1.34, then the flag and the rest of the migration stuff in 1.35. This way, ideally all wikis should have the new columns by 1.35, unconditionally. Is that right?

No, you can add them all at the same time. The idea for "0 and 1" is that you need to make sure the new schema is deployed on the DBs of every wiki in your wikifarm before you move on to stage 2.

This has very little if anything to do with 1.34 and 1.35. We could in theory go straight from stage 0 in 1.34 to stage 6 in 1.35, assuming we get through all the stages on Wikimedia wikis in that time. Although that might not be so friendly to other large farms that want to do a similar live-staged rollout instead of taking their wikis down for however long it takes to run update.php.

No, you can add them all at the same time. The idea for "0 and 1" is that you need to make sure the new schema is deployed on the DBs of every wiki in your wikifarm before you move on to stage 2.

That's why I chose to split it. Note that my concerns and questions were all from the POV of non-WMF wikis, where people may get the migration wrong.

This has very little if anything to do with 1.34 and 1.35.

I hope that putting the schema and the code in different releases will help ensure that, by the time we add the flag, both the local and the foreign DB will have the new schema. I see it's not necessary, but I think it could help?

Also, the problem with the foreign DB is only for writes, as we never read it in any part of the code. What I ended up doing (aside from putting the new schema in its own patch) is assume WRITE_BOTH for the foreign DB (here). That should work regardless of the foreign READ stage, as long as the schema has the new columns.

I hope that putting the schema and the code in different releases will help ensure that, by the time we add the flag, both the local and the foreign DB will have the new schema. I see it's not necessary, but I think it could help?

Either a wiki farm has all its wikis on the same version and uses update.php to update, or it has to manage updates manually (like WMF does). For the former case, update.php needs to handle things correctly, but we don't have to worry about a case where someone didn't run update.php.

Also, the problem with the foreign DB is only for writes, as we never read it in any part of the code.

Even if you never read from a foreign DB connection, you still have to consider reads because requests to that "foreign" wiki will still read what was written into its database. Same would apply if you were only reading from the foreign DB, you would still have to consider writes.

If it's a case where the foreign DB is only ever metawiki, there are situations where you might be able to consider one-way compatibility by requiring that metawiki be updated first/last. But personally I wouldn't bother without a compelling reason, since the process I described works bidirectionally.

What I ended up doing (aside from putting the new schema in its own patch) is assume WRITE_BOTH for the foreign DB (here). That should work regardless of the foreign READ stage, as long as the schema has the new columns.

That defeats the point of having migration stages controlled by a feature flag. It's reminiscent of how we used to do such migrations, by merging successive code changes instead of changing the configuration.

If you were to go ahead with that, you'd need to do one patch for moving 0→1, then another for 1→2, then another for 4→5, then another for 5→6. Using the migration stage configuration as designed skips the need for code changes at 1→2 and 4→5.

but we don't have to worry about a case where someone didn't run update.php.

Yeah, well, I sort of tried to. Of course we cannot guarantee that everything will work, but we can try - as long as it doesn't cost too much.

Also, the problem with the foreign DB is only for writes, as we never read it in any part of the code.

Even if you never read from a foreign DB connection, you still have to consider reads because requests to that "foreign" wiki will still read what was written into its database. Same would apply if you were only reading from the foreign DB, you would still have to consider writes.

Yes, of course it matters for that wiki. But if we write both fields, shouldn't it be always fine for that wiki? The other value would just be ignored.

If it's a case where the foreign DB is only ever metawiki, there are situations where you might be able to consider one-way compatibility by requiring that metawiki be updated first/last. But personally I wouldn't bother without a compelling reason, since the process I described works bidirectionally.

For WMF, it's meta-wiki. Other wiki farms may have their own foreign DB. It wasn't my intention to require updating the central (foreign) wiki before or after the local one.

What I ended up doing (aside from putting the new schema in its own patch) is assume WRITE_BOTH for the foreign DB (here). That should work regardless of the foreign READ stage, as long as the schema has the new columns.

That defeats the point of having migration stages controlled by a feature flag. It's reminiscent of how we used to do such migrations, by merging successive code changes instead of changing the configuration.

Heh, in part, probably. But then, is it fine to assume that the local and the foreign DBs will be at compatible stages? Or, in other words, can we safely use the local stage for foreign writes? IIUC your last comment, yes. And it'd be up to the wikifarm maintainers to ensure that the stages are compatible. In which case, I should restore the local flag check as in the previous version, heh?

If you were to go ahead with that, you'd need to do one patch for moving 0→1, then another for 1→2, then another for 4→5, then another for 5→6. Using the migration stage configuration as designed skips the need for code changes at 1→2 and 4→5.

AFAICS, it's one patch for introducing the flag (0→1), then another patch for when the old schema isn't used anymore, where we would remove the line writing to it (5→6). But I do realize that it's still subpar to skip checking the stage.

Yes, of course it matters for that wiki. But if we write both fields, shouldn't it be always fine for that wiki? The other value would just be ignored.

Writing both fields breaks if the "foreign" wiki is in stage 0 or 6.

Heh, in part, probably. But then, is it fine to assume that the local and the foreign DBs will be at compatible stages? Or, in other words, can we safely use the local stage for foreign writes? IIUC your last comment, yes.

Yes. That's the whole point of the stages being defined as they are.

If you were to go ahead with that, you'd need to do one patch for moving 0→1, then another for 1→2, then another for 4→5, then another for 5→6. Using the migration stage configuration as designed skips the need for code changes at 1→2 and 4→5.

AFAICS, it's one patch for introducing the flag (0→1), then another patch for when the old schema isn't used anymore, where we would remove the line writing to it (5→6). But I do realize that it's still subpar to skip checking the stage.

0→1 introduces the new schema, but can't introduce the WRITE_BOTH assumption because it'll break trying to write a wiki that's still on stage 0. So that has to wait for 1→2.

Similarly, you can't remove the WRITE_BOTH assumption in 5→6 because a stage-5 wiki writing a stage-6 wiki would be trying to write fields that no longer exist. So the assumption has to be removed earlier, at 4→5.

Yes, of course it matters for that wiki. But if we write both fields, shouldn't it be always fine for that wiki? The other value would just be ignored.

Writing both fields breaks if the "foreign" wiki is in stage 0 or 6.

Ah, OK, right. That was actually taken into account, although my "solution" was to make it impossible via splitting the migration across multiple releases.

Heh, in part, probably. But then, is it fine to assume that the local and the foreign DBs will be at compatible stages? Or, in other words, can we safely use the local stage for foreign writes? IIUC your last comment, yes.

Yes. That's the whole point of the stages being defined as they are.

Nice, thanks.

0→1 introduces the new schema, but can't introduce the WRITE_BOTH assumption because it'll break trying to write a wiki that's still on stage 0. So that has to wait for 1→2.

Similarly, you can't remove the WRITE_BOTH assumption in 5→6 because a stage-5 wiki writing a stage-6 wiki would be trying to write fields that no longer exist. So the assumption has to be removed earlier, at 4→5.

Got it, thanks. For the explanation, and for your patience!

Change 537361 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Add new schemas for splitting afl_filter

https://gerrit.wikimedia.org/r/537361

Change 459818 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Split afl_filter in afl_filter_id and afl_global

https://gerrit.wikimedia.org/r/459818

Calling this done now, opening another task for the migration on WMF wikis.

Duh, actually, let's keep this open for reference.

Change 647086 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Write afl_filter_id and afl_global by default

https://gerrit.wikimedia.org/r/647086

Change 647086 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Write afl_filter_id and afl_global by default

https://gerrit.wikimedia.org/r/647086

Daimona changed the task status from Open to Stalled. (Apr 13 2021, 12:27 PM)

We cannot proceed until 1.38, when we'll remove the old schema.

Daimona changed the task status from Stalled to Open. (Sep 14 2021, 11:06 AM)

Change 488482 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Remove afl_filter entirely

https://gerrit.wikimedia.org/r/488482

The production removal task is blocked by this work, rather than the other way around; this can be declared Resolved.

The index removal patch causes Beta-Cluster-Infrastructure to fail when running update.php: T291725