
afl_filter should be split in afl_filter_id and afl_global
Closed, Resolved · Public

Description

afl_filter is currently a varchar(64) binary NOT NULL field. This is because, since global filters were introduced, the field can hold either a plain numeric ID or the string "global-ID", where ID is again a numeric identifier. The change was made in patch-global_filters.sql, introduced in rEABF14b850f891de27ea09a1439e3835f66c49ad773f, and it was a poor choice.
It further denormalized the structure of the DB, and the code was never properly updated to handle afl_filter as a string. In fact, until yesterday there were four JOINs on af_id=afl_filter, i.e. joins between an integer and a string. That syntax isn't even supported in Postgres, and at any rate it slows down the query, because every row has to be implicitly cast before the comparison.
So we've basically lived with a stopgap for 10 years, and it's time to get rid of it. The proposal (already implemented) is to split afl_filter into

afl_filter_id BIGINT unsigned NOT NULL,
afl_global tinyint(1) NOT NULL

and clean up the code accordingly. This also has the benefit of encouraging the integer type for filter IDs throughout the code, since we won't need string IDs anymore.
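
To make the cleanup concrete, here is a minimal sketch (a hypothetical helper, not the extension's actual code) of the parsing that the old combined value forces on every reader, and of what the split columns store directly:

<?php
// Hypothetical helper, for illustration only: the legacy combined
// afl_filter value has to be parsed on every read, whereas the new
// columns store the two pieces of information directly.

/**
 * Parse a legacy afl_filter value such as "42" or "global-42" into
 * the [ id, isGlobal ] pair that the new columns hold natively.
 */
function parseLegacyFilterValue( string $aflFilter ): array {
	if ( strpos( $aflFilter, 'global-' ) === 0 ) {
		return [ (int)substr( $aflFilter, strlen( 'global-' ) ), true ];
	}
	return [ (int)$aflFilter, false ];
}

// With the split schema there is nothing to parse: afl_filter_id is an
// integer (so JOINs on af_id need no implicit cast) and afl_global is a
// boolean flag.
[ $filterID, $global ] = parseLegacyFilterValue( 'global-42' );
assert( $filterID === 42 && $global === true );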

Plan

  • Create the new schema
  • Merge r459818
  • Migrate WMF wikis (T269712)
  • Set the default to SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_OLD in 1.36 (no need for COMPAT_OLD, because the schema was introduced in 1.34)
  • Set the default to SCHEMA_COMPAT_NEW in 1.37 (so people can run the script in the meanwhile); both defaults are sketched after this list
  • Remove the old schema and migration code in 1.38: r488482
  • Delete the old column on WMF wikis (T291719)
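
For reference, the staged defaults above would look roughly like this in a wiki's LocalSettings.php. The variable name $wgAbuseFilterAflFilterMigrationStage is my assumption for the extension's flag; the SCHEMA_COMPAT_* constants are the ones MediaWiki core defines:

<?php
// Sketch of LocalSettings.php overrides (assumed variable name; the
// SCHEMA_COMPAT_* constants come from MediaWiki core's Defines.php).

// 1.36 default per the plan: write both schemas, read the old one.
$wgAbuseFilterAflFilterMigrationStage =
	SCHEMA_COMPAT_WRITE_BOTH | SCHEMA_COMPAT_READ_OLD;

// 1.37 default per the plan: new schema only (run the migration
// script before switching to this).
// $wgAbuseFilterAflFilterMigrationStage = SCHEMA_COMPAT_NEW;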


Event Timeline


Change 459818 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Split afl_filter in afl_filter_id and afl_global

https://gerrit.wikimedia.org/r/459818

Change 488482 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Remove afl_filter entirely

https://gerrit.wikimedia.org/r/488482

Marostegui subscribed.

I am fine with this choice. From what I can see, the new schemas (and patches) are not yet merged.
Once they are both merged, please create a schema change ticket using this template: https://wikitech.wikimedia.org/wiki/Schema_changes#Workflow_of_a_schema_change and we'll take care of it.
For now, I will remove the DBA tag, but I'll remain subscribed here in case I'm needed.

Thanks!

So, @matej_suchanek noted that there's a big problem with the patch. We have a variable with the migration stage for the current DB, but sometimes we'll also have to insert data into a foreign DB (e.g. for edits on some wikis, we may have to update meta). The foreign DB may have a different migration stage. How could this be handled in a non-WMF-specific way?

@Anomie Have you faced something similar for the actor migration?

Yes, I did. One of these days I even intend to write up all the details as a blog post or something, but for now I'll give the quick overview.

The trick was to make sure that at every step things were compatible between S-1 and S, both for S-1 reading data written by S and S reading data written by S-1. And then make sure to roll out the schema changes so that every wiki was moved from S-1 to S before any wiki was moved to S+1. This also has the property that it's always safe to roll any wiki back from S to S-1 in case of critical bugs, as long as no wiki was ever at S+1 yet.

There are a few more stages in the whole migration than are reflected by the feature flag, as we also have to take into account the states before the new columns are added and after the old ones are dropped, plus the change in state caused by running the maintenance script to back-populate the new fields in old rows.

Also there are two ways the migration could progress, representing a tradeoff between space and read-complexity.

Of note is that this assumes that it's basically transparent to clients whether the DB is using the old or new fields. In other words, any features possible with only the old fields need to be removed early on, and any features depending on the new fields need to wait until the new fields are fully populated.

Method 1

This method uses more space for longer during the migration, but read queries are always straightforward.

Stage | Old fields          | New fields          | Migration flag               | Notes
------|---------------------|---------------------|------------------------------|------
0     | Fully populated     | Missing             | Assumed SCHEMA_MIGRATION_OLD | State of things before we begin the migration.
1     | Fully populated     | Empty               | SCHEMA_MIGRATION_OLD         |
2     | Fully populated     | Partially populated | READ_OLD, WRITE_BOTH         |
3     | Fully populated     | Fully populated     | READ_OLD, WRITE_BOTH         | Migration script was run.
4     | Fully populated     | Fully populated     | READ_NEW, WRITE_BOTH         |
5     | Partially populated | Fully populated     | SCHEMA_MIGRATION_NEW         |
6     | Missing             | Fully populated     | Assumed SCHEMA_MIGRATION_NEW | Final state.

The analysis (see the write-gating sketch after this list):

  • 0 and 1: Nothing will try to access the new fields, so it doesn't matter whether they exist yet.
  • 1 and 2: In both stages the old fields get written and only the old fields are read, so everything works.
  • 2 and 3: In both stages both the old and new fields get written, so no new rows are being created with missing 'new' data. This lets the maintenance script that back-populates the new fields work sanely, with no races.
  • 3 and 4: The new and old fields are always populated, so it doesn't matter which is read.
  • 4 and 5: Nothing is reading the old fields, so it doesn't matter if they're no longer populated.
  • 5 and 6: Nothing will try to access the old fields at all, so they can be dropped without issue.
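
As a rough illustration of how such a flag typically gates writes and reads under Method 1 (my own sketch, not the actual AbuseFilter code; the constant values mirror MediaWiki core's Defines.php):

<?php
// Flag bits mirroring MediaWiki core's Defines.php; the values are
// copied here only to keep the sketch self-contained.
const SCHEMA_COMPAT_WRITE_OLD = 0x01;
const SCHEMA_COMPAT_READ_OLD  = 0x02;
const SCHEMA_COMPAT_WRITE_NEW = 0x10;
const SCHEMA_COMPAT_READ_NEW  = 0x20;

/**
 * Build the abuse_filter_log row fragment for the filter reference,
 * writing whichever representation(s) the current stage asks for.
 */
function buildFilterRow( int $stage, int $filterID, bool $global ): array {
	$row = [];
	if ( $stage & SCHEMA_COMPAT_WRITE_OLD ) {
		$row['afl_filter'] = $global ? "global-$filterID" : (string)$filterID;
	}
	if ( $stage & SCHEMA_COMPAT_WRITE_NEW ) {
		$row['afl_filter_id'] = $filterID;
		$row['afl_global'] = (int)$global;
	}
	return $row;
}

// Under Method 1 the read side stays a simple either/or choice:
function filterFieldsToSelect( int $stage ): array {
	return ( $stage & SCHEMA_COMPAT_READ_NEW )
		? [ 'afl_filter_id', 'afl_global' ]
		: [ 'afl_filter' ];
}

Because both fields are fully populated before READ_NEW is enabled (stage 4), the read side never needs a fallback path.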

Method 2

This method avoids duplicating all the data, but it makes reads complicated. This works well enough for things like comment that are always just selected, but does not work well for things like actor where you want indexes to be used for WHERE or ORDER BY. If unsure, use Method 1.

Stage | Old fields          | New fields          | Migration flag               | Notes
------|---------------------|---------------------|------------------------------|------
0     | Fully populated     | Missing             | Assumed SCHEMA_MIGRATION_OLD | State of things before we begin the migration.
1     | Fully populated     | Empty               | SCHEMA_MIGRATION_OLD         |
2     | Fully populated     | Partially populated | READ_BOTH, WRITE_BOTH        |
3     | Partially populated | Partially populated | READ_BOTH, WRITE_NEW         |
4     | Partially populated | Fully populated     | READ_BOTH, WRITE_NEW         | Migration script was run.
5     | Partially populated | Fully populated     | SCHEMA_MIGRATION_NEW         |
6     | Missing             | Fully populated     | Assumed SCHEMA_MIGRATION_NEW | Final state.

Note that stage 2 could have WRITE_OLD instead of WRITE_BOTH, but WRITE_BOTH allows you to verify that data is being written correctly to the new fields before stopping writes to the old ones. If you find a bug, all you have to do is blank the bad 'new' fields and it'll fall back to the 'old' data without loss.

The analysis (see the read-fallback sketch after this list):

  • 0 and 1: Nothing will try to access the new fields, so it doesn't matter whether they exist yet.
  • 1 and 2: The old fields are always populated, so it doesn't matter if both are read or only the old.
  • 2 and 3: In both stages both the old and new fields are read, so we can stop populating the old ones without breaking anything.
  • 3 and 4: Nothing is creating rows with unpopulated new fields, so the maintenance script can run sanely, no races.
    • Also nothing is creating new rows with 'old' data present, so the script can blank the old fields as it moves the data to the new without races.
  • 4 and 5: Now that the new fields are always fully populated, we can safely stop reading the old ones.
  • 5 and 6: Nothing will try to access the old fields at all, so they can be dropped without issue.
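
Under Method 2's READ_BOTH stages, every reader instead has to fall back between the two representations per row; a hypothetical sketch:

<?php
/**
 * Hypothetical reader for Method 2's READ_BOTH stages: prefer the new
 * fields when populated, otherwise parse the legacy combined value.
 * Assumes an unpopulated afl_filter_id holds 0, since the column is
 * NOT NULL and real filter IDs start at 1.
 */
function resolveFilter( array $row ): array {
	if ( (int)$row['afl_filter_id'] !== 0 ) {
		return [ (int)$row['afl_filter_id'], (bool)$row['afl_global'] ];
	}
	$legacy = (string)$row['afl_filter'];
	if ( strpos( $legacy, 'global-' ) === 0 ) {
		return [ (int)substr( $legacy, strlen( 'global-' ) ), true ];
	}
	return [ (int)$legacy, false ];
}

This per-row fallback is exactly why indexes can't serve WHERE or ORDER BY on the filter during the migration: no single column holds the value for every row.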

Yes, I did. One of these days I even intend to write up all the details as a blog post or something, but for now I'll give the quick overview.

That would be interesting, as this quick overview is :-)

The trick was to make sure that at every step things were compatible between S-1 and S, both for S-1 reading data written by S and S reading data written by S-1. And then make sure to roll out the schema changes so that every wiki was moved from S-1 to S before any wiki was moved to S+1. This also has the property that it's always safe to roll any wiki back from S to S-1 in case of critical bugs, as long as no wiki was ever at S+1 yet.

Got it, that seems like a very good idea.

There are a few more stages in the whole migration than are reflected by the feature flag [...]

Yes, I accounted for some of them in the patch above; hopefully all of them.

Method 1
[...]

  • 0 and 1: Nothing will try to access the new fields, so it doesn't matter whether they exist yet.

OK, so basically the idea is to create the schema *before* even adding the rest of the migration code, right? IIUC, we could for instance add the schema in 1.34, then the flag and the rest of the migration stuff in 1.35. This way, ideally all wikis should have the new columns by 1.35, unconditionally. Is that right?

Method 2

This method avoids duplicating all the data, but it makes reads complicated. This works well enough for things like comment that are always just selected, but does not work well for things like actor where you want indexes to be used for WHERE or ORDER BY. If unsure, use Method 1.

Yeah, I think we can go with method 1. I believe that my approach can handle all phases from 1-2 to 5-6, it just needs some more attention to 0-1. I'll upload a separate patch with the new schemas.

Change 537361 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Add new schemas for splitting afl_filter

https://gerrit.wikimedia.org/r/537361

  • 0 and 1: Nothing will try to access the new fields, so it doesn't matter whether they exist yet.

OK, so basically the idea is to create the schema *before* even adding the rest of the migration code, right? IIUC, we could for instance add the schema in 1.34, then the flag and the rest of the migration stuff in 1.35. This way, ideally all wikis should have the new columns by 1.35, unconditionally. Is that right?

No, you can add them all at the same time. The idea for "0 and 1" is that you need to make sure the new schema is deployed on the DBs of every wiki in your wikifarm before you move on to stage 2.

This has very little if anything to do with 1.34 and 1.35. We could in theory go straight from stage 0 in 1.34 to stage 6 in 1.35, assuming we get through all the stages on Wikimedia wikis in that time. Although that might not be so friendly to other large farms that want to do a similar live-staged rollout instead of taking their wikis down for however long it takes to run update.php.

No, you can add them all at the same time. The idea for "0 and 1" is that you need to make sure the new schema is deployed on the DBs of every wiki in your wikifarm before you move on to stage 2.

That's why I chose to split it. Note that my concerns and questions were all from the POV of non-WMF wikis, where people may get the migration wrong.

This has very little if anything to do with 1.34 and 1.35.

I hope that putting the schema and the code in different releases will help ensure that, by the time we add the flag, both the local and the foreign DB will have the new schema. I see it's not necessary, but I think it could help?

Also, the problem with the foreign DB is only for writes, as we never read it in any part of the code. What I ended up doing (aside from putting the new schema in its own patch) is assume WRITE_BOTH for the foreign DB (here). That should work regardless of the foreign READ stage, as long as the schema has the new columns.

I hope that putting the schema and the code in different releases will help ensure that, by the time we add the flag, both the local and the foreign DB will have the new schema. I see it's not necessary, but I think it could help?

Either a wiki farm has all its wikis on the same version and uses update.php to update, or it has to manage updates manually (like WMF does). For the former case, update.php needs to handle things correctly, but we don't have to worry about a case where someone didn't run update.php.

Also, the problem with the foreign DB is only for writes, as we never read it in any part of the code.

Even if you never read from a foreign DB connection, you still have to consider reads because requests to that "foreign" wiki will still read what was written into its database. Same would apply if you were only reading from the foreign DB, you would still have to consider writes.

If it's a case where the foreign DB is only ever metawiki, there are situations where you might be able to consider one-way compatibility by requiring that metawiki be updated first/last. But personally I wouldn't bother without a compelling reason, since the process I described works bidirectionally.

What I ended up doing (aside from putting the new schema in its own patch) is assume WRITE_BOTH for the foreign DB (here). That should work regardless of the foreign READ stage, as long as the schema has the new columns.

That defeats the point of having migration stages controlled by a feature flag. It's reminiscent of how we used to do such migrations, by merging successive code changes instead of changing the configuration.

If you were to go ahead with that, you'd need to do one patch for moving 0→1, then another for 1→2, then another for 4→5, then another for 5→6. Using the migration stage configuration as designed skips the need for code changes at 1→2 and 4→5.

but we don't have to worry about a case where someone didn't run update.php.

Yeah, well, I sort of tried to. Of course we cannot guarantee that everything will work, but we can try - as long as it doesn't cost too much.

Also, the problem with the foreign DB is only for writes, as we never read it in any part of the code.

Even if you never read from a foreign DB connection, you still have to consider reads because requests to that "foreign" wiki will still read what was written into its database. Same would apply if you were only reading from the foreign DB, you would still have to consider writes.

Yes, of course it matters for that wiki. But if we write both fields, shouldn't it be always fine for that wiki? The other value would just be ignored.

If it's a case where the foreign DB is only ever metawiki, there are situations where you might be able to consider one-way compatibility by requiring that metawiki be updated first/last. But personally I wouldn't bother without a compelling reason, since the process I described works bidirectionally.

For WMF, it's meta-wiki. Other wiki farms may have their own foreign DB. It wasn't my intention to require updating the central (foreign) wiki before or after the local one.

What I ended up doing (aside from putting the new schema in its own patch) is assume WRITE_BOTH for the foreign DB (here). That should work regardless of the foreign READ stage, as long as the schema has the new columns.

That defeats the point of having migration stages controlled by a feature flag. It's reminiscent of how we used to do such migrations, by merging successive code changes instead of changing the configuration.

Heh, in part, probably. But then, is it fine to assume that the local and the foreign DBs will be at compatible stages? Or, in other words, can we safely use the local stage for foreign writes? IIUC your last comment, yes. And it'd be up to the wikifarm maintainers to ensure that the stages are compatible. In which case, I should restore the local flag check as in the previous version, heh?

If you were to go ahead with that, you'd need to do one patch for moving 0→1, then another for 1→2, then another for 4→5, then another for 5→6. Using the migration stage configuration as designed skips the need for code changes at 1→2 and 4→5.

AFAICS, it's one patch for introducing the flag (0→1), then another patch for when the old schema isn't used anymore, where we would remove the line writing to it (5→6). But I do realize that it's still subpar to skip checking the stage.

Yes, of course it matters for that wiki. But if we write both fields, shouldn't it be always fine for that wiki? The other value would just be ignored.

Writing both fields breaks if the "foreign" wiki is in stage 0 or 6.

Heh, in part, probably. But then, is it fine to assume that the local and the foreign DBs will be at compatible stages? Or, in other words, can we safely use the local stage for foreign writes? IIUC your last comment, yes.

Yes. That's the whole point of the stages being defined as they are.

If you were to go ahead with that, you'd need to do one patch for moving 0→1, then another for 1→2, then another for 4→5, then another for 5→6. Using the migration stage configuration as designed skips the need for code changes at 1→2 and 4→5.

AFAICS, it's one patch for introducing the flag (0→1), then another patch for when the old schema isn't used anymore, where we would remove the line writing to it (5→6). But I do realize that it's still subpar to skip checking the stage.

0→1 introduces the new schema, but can't introduce the WRITE_BOTH assumption because it'll break trying to write a wiki that's still on stage 0. So that has to wait for 1→2.

Similarly, you can't remove the WRITE_BOTH assumption in 5→6 because a stage-5 wiki writing a stage-6 wiki would be trying to write fields that no longer exist. So the assumption has to be removed earlier, at 4→5.

Yes, of course it matters for that wiki. But if we write both fields, shouldn't it be always fine for that wiki? The other value would just be ignored.

Writing both fields breaks if the "foreign" wiki is in stage 0 or 6.

Ah, OK, right. That was actually taken into account, although my "solution" was to make it impossible via splitting the migration across multiple releases.

Heh, in part, probably. But then, is it fine to assume that the local and the foreign DBs will be at compatible stages? Or, in other words, can we safely use the local stage for foreign writes? IIUC your last comment, yes.

Yes. That's the whole point of the stages being defined as they are.

Nice, thanks.

0→1 introduces the new schema, but can't introduce the WRITE_BOTH assumption because it'll break trying to write a wiki that's still on stage 0. So that has to wait for 1→2.

Similarly, you can't remove the WRITE_BOTH assumption in 5→6 because a stage-5 wiki writing a stage-6 wiki would be trying to write fields that no longer exist. So the assumption has to be removed earlier, at 4→5.

Got it, thanks. For the explanation, and for your patience!

Change 537361 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Add new schemas for splitting afl_filter

https://gerrit.wikimedia.org/r/537361

Change 459818 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Split afl_filter in afl_filter_id and afl_global

https://gerrit.wikimedia.org/r/459818

Calling this done now, opening another task for the migration on WMF wikis.

Duh, actually, let's keep this open for reference.

Change 647086 had a related patch set uploaded (by Daimona Eaytoy; owner: Daimona Eaytoy):
[mediawiki/extensions/AbuseFilter@master] Write afl_filter_id and afl_global by default

https://gerrit.wikimedia.org/r/647086

Change 647086 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Write afl_filter_id and afl_global by default

https://gerrit.wikimedia.org/r/647086

Daimona changed the task status from Open to Stalled. (Apr 13 2021, 12:27 PM)

We cannot proceed until 1.38, when we'll remove the old schema.

Daimona changed the task status from Stalled to Open. (Sep 14 2021, 11:06 AM)

Change 488482 merged by jenkins-bot:
[mediawiki/extensions/AbuseFilter@master] Remove afl_filter entirely

https://gerrit.wikimedia.org/r/488482

The production removal task is blocked by this work, rather than the other way around; this can be declared Resolved.

The index removal patch causes Beta-Cluster-Infrastructure to fail when running update.php: T291725