Page MenuHomePhabricator

ipblocks schema redesign for multiblocks
Closed, ResolvedPublic

Description

Schema

  • Move the target-related fields in ipblocks to a separate table.
  • Rename ipblocks to block.
  • bt_address will be null for user blocks.
  • bt_user_text will be null for IP blocks.
CREATE TABLE block_target (
  bt_id INT UNSIGNED AUTO_INCREMENT NOT NULL,
  bt_address TINYBLOB,
  bt_user INT UNSIGNED,
  bt_user_text VARBINARY(255),
  bt_auto TINYINT(1) DEFAULT 0 NOT NULL,
  bt_range_start TINYBLOB,
  bt_range_end TINYBLOB,
  bt_ip_hex TINYBLOB,
  bt_count int NOT NULL default 0,
  INDEX bt_address (
    bt_address(42)
  ),
  INDEX bt_user (bt_user),
  INDEX bt_range (
    bt_range_start(35),
    bt_range_end(35)
  ),
  INDEX bt_ip_user_text (
    bt_ip_hex(35),
    bt_user_text(255)
  ),
  PRIMARY KEY(bt_id)
);

CREATE TABLE block (
  bl_id INT UNSIGNED AUTO_INCREMENT NOT NULL,

  -- New foreign key
  bl_target INT UNSIGNED NOT NULL,

  bl_by_actor BIGINT UNSIGNED NOT NULL,
  bl_reason_id BIGINT UNSIGNED NOT NULL,
  bl_timestamp BINARY(14) NOT NULL,
  bl_anon_only TINYINT(1) DEFAULT 0 NOT NULL,
  bl_create_account TINYINT(1) DEFAULT 1 NOT NULL,
  bl_enable_autoblock TINYINT(1) DEFAULT 1 NOT NULL,
  bl_expiry VARBINARY(14) NOT NULL,
  bl_deleted TINYINT(1) DEFAULT 0 NOT NULL,
  bl_block_email TINYINT(1) DEFAULT 0 NOT NULL,
  bl_allow_usertalk TINYINT(1) DEFAULT 0 NOT NULL,
  bl_parent_block_id INT UNSIGNED DEFAULT NULL,
  bl_sitewide TINYINT(1) DEFAULT 1 NOT NULL,

  -- Index on new field
  INDEX bl_target (bl_target),

  INDEX bl_timestamp (bl_timestamp),
  INDEX bl_expiry (bl_expiry),
  INDEX bl_parent_block_id (bl_parent_block_id),
  PRIMARY KEY(bl_id)
);

Query review

Searched for /['"]ipblocks['"]/, DatabaseBlock::getQueryInfo

  • Writers
    • DatabaseBlockStore
    • CentralAuthUser::doLocalSuppression (cross-wiki)
    • MergeUser::mergeBlocks
    • RenameuserSQL
    • CleanupBlocks
    • CleanupUsersWithNoId (probably broken already)
  • Queries only for ipb_deleted
    • ApiQueryBlockInfoTrait branch 2
    • ActiveUsersPager
    • UsersPager
    • UserNamePrefixSearch
    • UserSelectQueryBuilder
    • LocalIdLookup
    • Flow: OneStepUserNameQuery, TwoStepUserNameQuery
    • WMCS maintain_views.yaml user, user_old, actor
  • Readers
    • ApiQueryBlocks
    • ApiQueryBlockInfoTrait branch 1
    • DatabaseBlockStore
    • BlockListPager
    • CentralAuthUser::localUserData (cross-wiki)
    • \MediaWiki\CheckUser\Investigate\Services\PreliminaryCheckService::isUserBlocked
    • UncachedMenteeOverviewDataProvider::getFilteredMenteesForMentor
    • \MediaWiki\Extension\PageTriage\ArticleCompile\ArticleCompileUserData::compile
    • PageTriage ArticleCompileUserData
    • WMCS maintain_views.yaml
  • Referring to the table but probably not affected
    • BlockRestrictionStore::updateByParentBlockId
    • findMissingActors
    • removeUnusedAccounts

Migration

CentralAuth does cross-wiki read and write queries on ipblocks. This would complicate fast single-flag migration.

Multi-stage migration would have the following configuration in MW core: T346671

  • $wgBlockTargetMigrationStage: a combination of SCHEMA_COMPAT_xxx flags.
  • $wgEnableMultiBlocks: when this is false, the number of blocks of a given target is limited to 1.

The old Special:Block UI is shown. When this is true, multiple blocks of a target are permitted.

The deployment procedure would be as follows. Each step is global, affecting all wikis.

  • Deploy the new code in read-old/write-old mode.
  • Create block and block_target tables.
  • Switch to write-both mode. Assume the migration script will copy each ipblocks row atomically, with ipb_id=bl_id. When updating a block, update it in ipblocks, and if there is a corresponding row in the block table, update it there too. When inserting a block, insert it into both tables. The number of blocks of a given target is artificially limited to 1. When deleting a block, delete the block_target row if the number of blocks on the target goes to zero.
  • Run a script to fully populate block_target/block.
  • Migrate WMCS and other miscellaneous readers.
  • Switch to read-new/write-new mode.
  • Drop the ipblocks tables.

Related Objects

StatusSubtypeAssignedTask
OpenNone
OpenBUG REPORTNone
OpenNone
OpenNone
OpenNone
ResolvedAug 21 2018dbarratt
DeclinedNone
OpenNone
Resolvedtstarling
Resolvedtstarling
ResolvedHMonroy
Declineddmaza
StalledNone
Resolvedtstarling
ResolvedMusikAnimal
OpenNone
ResolvedLadsgroup
ResolvedPRODUCTION ERRORtstarling
Resolvedtstarling
Resolvedtstarling
Resolvedtstarling
ResolvedPRODUCTION ERRORtstarling
OpenMarostegui
Opentstarling
ResolvedNone
Resolvedtstarling
OpenNone
OpenNone

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

$wgEnableMultiBlocks

Note: The UserMerge extension would require different handling of blocks depending on whether we have multiblock. To reduce the long-term maintanence burden we may consider always allowing multiblock once the feature is stable.

$wgEnableMultiBlocks

Note: The UserMerge extension would require different handling of blocks depending on whether we have multiblock. To reduce the long-term maintanence burden we may consider always allowing multiblock once the feature is stable.

There will be an announcement before we flip the switch, but we'll be as thorough as possible with wmf deployed extensions and make sure everything works as expected

@tstarling Following up on our previous conversation, do we want to take this opportunity and "rename" ipblocks by rolling out a clone table (block_entry) and deprecate ipblocks once we are done?

  • MediaWiki currently allow user to redact only the performer of an edit (including log entries, etc.); a common use case is if a user accidently logged out, the user name need to be hidden. To preserve such feature we could introduce copies of existing rows on actor table, with a different deleted flag.

Thanks. I updated the proposal. Now I am proposing to keep Special:RevisionDelete as it is, but have all changes lists join on a new table indexed by actor, so that Special:Block doesn't have to update *_deleted. So revdel user hiding and block user hiding will be independent.

everything wanting a user list must join on ipblocks and filter by ipb_deleted.

In the new schema it is still the case: If we hide the actor of a revision a actor_deleted row is needed, though the user itself is not hidden, so we can not know the account hidden status without ipblocks. A new column in actor_deleted like ad_user_is_hidden may help. Note the ipblocks table in some wiki may be very large due to admin bot blocking proxies.


If you also want to migrate the DELETED_USER bit of the *_deleted bitfield, we need an ad_level column to differ sysop level deletion (revdel) and oversight-level deletion (suppression); see also T253462.


Also, after this redesign it does not mean we can safely replicate actor rows. This is (in my opinion) what the workflow will look like in new schema:

Revdel an actor:

  • Find an corresponding actor row for the user with actor_deleted record (and with proper ad_level and without ad_user_is_hidden). If none found, create a new duplicated actor row, and add a row to actor_deleted.
  • Switch the revision entry to the new actor row.
  • (REQUIRED) now the old actor row may be orphaned. If it is it should be removed, to remove it from replica.

Un-revdel actor:

  • Find or create an actor row without actor_deleted.
  • Switch the revision entry to new actor row.
  • Cleaning up actor and actor_deleted is nice, though the actor row is already not replicated.

Suppress block user:

  • (if we do not allow multiple suppress blocks to co-exist) check if the user is already suppressed and fail if it is.
  • If there are not already an actor row for this user, create one.
  • Change ad_level for all actor_deleted rows to oversight level deletion and create actor_deleted rows for actor rows of this user without it. After this there may be multiple actor rows with same settings.
  • Find an actor_deleted row (if we allow multiple suppress blocks, also without ad_user_is_hidden, set its ad_user_is_hidden to true, and refer it in ipb_deleted_id.

Unsuppress user:

  • Find the actor_deleted row via ipb_deleted_id and set its ad_user_is_hidden to false.
  • Remove the ipblocks row or clear its ipb_deleted_id value.
  • The edits and logs of users continue to be OS-level suppressed, which matchs the current behavior.

UserMerge:
If only one suppress block is allowed, other suppress blocks should have ipb_deleted_id and ad_user_is_hidden cleared.


Note (not directly related to this task):

Even if we do a similar migration for comment table, the comment table is still not safe to be replicated. revdeled edit summaries and log summaries will bot be replicated, but restricted logs (e.g. CU and OS logs) still exist. In comparison, it may be a low risk to show an actor which only appears in CU or OS log.

In the new schema it is still the case: If we hide the actor of a revision a actor_deleted row is needed, though the user itself is not hidden, so we can not know the account hidden status without ipblocks. A new column in actor_deleted like ad_user_is_hidden may help. Note the ipblocks table in some wiki may be very large due to admin bot blocking proxies.

No, that's not what I'm proposing. If you hide the actor of a single revision only, that is rev_deleted=4 even in the new schema. There is no actor_deleted row. There's no need for ad_user_is_hidden because a row is only inserted when the actor is hidden.

If you also want to migrate the DELETED_USER bit of the *_deleted bitfield, we need an ad_level column to differ sysop level deletion (revdel) and oversight-level deletion (suppression); see also T253462.

I'm not doing that.

Also, after this redesign it does not mean we can safely replicate actor rows.

Yes, I'm aware. As you pointed out, the problems were not as closely linked as they first appeared. So I'm not fixing the problem of replicating actor rows right now. I'll remove the reference to it in the task description.

I'm not totally sure if I want to touch ipb_deleted at all.

There will be a separate task, in any case.

@tstarling Following up on our previous conversation, do we want to take this opportunity and "rename" ipblocks by rolling out a clone table (block_entry) and deprecate ipblocks once we are done?

I am still considering this. I'm migrating a few things to see how hard it is and what the code will look like.

T51504 is pretty straightforward and can probably be included. There's a patch that was almost merged, it just ran out of steam during code review.

Making bt_address nullable would allow us to use COALESCE(bt_address, user_name) as a b/c hack.

I updated the proposed schema, including a rename of the ipblocks table to block.

Change 971343 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Support new block schema

https://gerrit.wikimedia.org/r/971343

Change 973442 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] New block/block_target schema

https://gerrit.wikimedia.org/r/973442

ApiQueryBlocks can search for all single-IP blocks in a given range, using e.g. bkip=1.2.3.4/22. It uses ipb_range_start to do that, which contradicts T51504. To resolve T51504 while retaining this feature as an indexed query, it is necessary to add a field to block_target which contains the hex form of single-IP blocks.

Special:BlockList apparently lacks such a feature. Searching for a range will only match blocks that block that exact range.

ApiQueryBlocks can search for all single-IP blocks in a given range, using e.g. bkip=1.2.3.4/22. It uses ipb_range_start to do that, which contradicts T51504. To resolve T51504 while retaining this feature as an indexed query, it is necessary to add a field to block_target which contains the hex form of single-IP blocks.

Never mind, there actually is no such feature. ApiQueryBlocks and Special:BlockList both work the same way, which is to show blocks that fully encompass a given range. So if you search for a range, they only show blocks that block the whole specified range.

But maybe there should be such a feature. I'll add the field anyway. It will also be useful if we want to provide a backwards compatible view for WMCS.

But maybe there should be such a feature. I'll add the field anyway. It will also be useful if we want to provide a backwards compatible view for WMCS.

I think that's T183300: Use list=blocks to return all single IP blocks in a CIDR?

Change 974718 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/CentralAuth@master] Support new block schema

https://gerrit.wikimedia.org/r/974718

Change 975104 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/Flow@master] In OneStepUserNameQuery support the new block_target schema

https://gerrit.wikimedia.org/r/975104

Change 975109 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/CheckUser@master] Get blocks from DatabaseBlockStore instead of doing our own query

https://gerrit.wikimedia.org/r/975109

Change 975111 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/GrowthExperiments@master] Support new block_target schema

https://gerrit.wikimedia.org/r/975111

Change 975115 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/extensions/PageTriage@master] Support new block_target schema

https://gerrit.wikimedia.org/r/975115

Change 971343 merged by jenkins-bot:

[mediawiki/core@master] Support new block schema

https://gerrit.wikimedia.org/r/971343

@tstarling Drawing attention to T352310. Is there a migration stage we have to do on beta?

Change 975115 merged by jenkins-bot:

[mediawiki/extensions/PageTriage@master] Support new block_target schema

https://gerrit.wikimedia.org/r/975115

Change 975109 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Get blocks from DatabaseBlockStore instead of doing our own query

https://gerrit.wikimedia.org/r/975109

Change 975104 merged by jenkins-bot:

[mediawiki/extensions/Flow@master] In OneStepUserNameQuery support the new block_target schema

https://gerrit.wikimedia.org/r/975104

Change 974718 merged by jenkins-bot:

[mediawiki/extensions/CentralAuth@master] Support new block schema

https://gerrit.wikimedia.org/r/974718

Change 973442 merged by jenkins-bot:

[mediawiki/core@master] Add the new block and block_target tables

https://gerrit.wikimedia.org/r/973442

Change 975111 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Support new block_target schema

https://gerrit.wikimedia.org/r/975111

Reopening for 3rd party migration and MW core cleanup.

Change #1017959 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Migrate to the new block schema on non-WMF wikis

https://gerrit.wikimedia.org/r/1017959

I know there is a backward compatibility layer added to wikireplicas but I highly recommend announcing the change and removing that layer a couple months later. The main use of views is to hide private information and combining b/c with that adds a lot of complexity and can and has caused private data to be leaked before (e.g. an extra schema change has been but didn't take the b/c view into account). Specially given that we hide private information via views in block data already.

Change #1017959 merged by jenkins-bot:

[mediawiki/core@master] block: Migrate to the new block schema on non-WMF wikis

https://gerrit.wikimedia.org/r/1017959

@tstarling MergeUserTest::testMergeEditcount is now failing in a bunch of repos and blocking merges since it is loaded as a dependency. See, e.g., https://gerrit.wikimedia.org/r/c/mediawiki/extensions/DiscussionTools/+/1029837

23:56:02 1) MergeUserTest::testBasicMerge
23:56:02 Wikimedia\Rdbms\DBQueryError: Error 1054: Unknown column 'bt_address' in 'field list'
23:56:02 Function: MergeUser::mergeBlocks
23:56:02 Query: SELECT  bl_id AS `ipb_id`,COALESCE(bt_address, bt_user_text) AS `ipb_address`,bl_timestamp AS `ipb_timestamp`,bt_auto AS `ipb_auto`,bl_anon_only AS `ipb_anon_only`,bl_create_account AS `ipb_create_account`,bl_enable_autoblock AS `ipb_enable_autoblock`,bl_expiry AS `ipb_expiry`,bl_deleted AS `ipb_deleted`,bl_block_email AS `ipb_block_email`,bl_allow_usertalk AS `ipb_allow_usertalk`,bl_parent_block_id AS `ipb_parent_block_id`,bl_sitewide AS `ipb_sitewide`,bl_by_actor AS `ipb_by_actor`,block_by_actor.actor_user AS `ipb_by`,block_by_actor.actor_name AS `ipb_by_text`,comment_bl_reason.comment_text AS `ipb_reason_text`,comment_bl_reason.comment_data AS `ipb_reason_data`,comment_bl_reason.comment_id AS `ipb_reason_cid`,ipb_user  FROM `unittest_block` JOIN `unittest_actor` `block_by_actor` ON ((actor_id=bl_by_actor)) JOIN `unittest_comment` `comment_bl_reason` ON ((comment_bl_reason.comment_id = bl_reason_id))   WHERE ipb_user IN (1,3)

I believe this was caused by the change above that updated the default
See https://gerrit.wikimedia.org/r/q/hashtag:%22failure-usermerge-t346293%22+(status:open%20OR%20status:merged) for tracking those changes (so that we know to resubmit once MergeUserTest is fixed)

Looking here: https://gerrit.wikimedia.org/g/mediawiki/core/+/bc11d8072a47fdc2f9d3a60f78c09716948b626e/includes/block/DatabaseBlockStore.php#336 tables does not list block_target, but it refers to bt_* fields from that table.

The query definition a few lines below: https://gerrit.wikimedia.org/g/mediawiki/core/+/bc11d8072a47fdc2f9d3a60f78c09716948b626e/includes/block/DatabaseBlockStore.php#367 does list block_target when using fields from that table.

I'm not sure if that's a mistake, but it looks suspicious.

Looking here: https://gerrit.wikimedia.org/g/mediawiki/core/+/bc11d8072a47fdc2f9d3a60f78c09716948b626e/includes/block/DatabaseBlockStore.php#336 tables does not list block_target, but it refers to bt_* fields from that table.

The query definition a few lines below: https://gerrit.wikimedia.org/g/mediawiki/core/+/bc11d8072a47fdc2f9d3a60f78c09716948b626e/includes/block/DatabaseBlockStore.php#367 does list block_target when using fields from that table.

I'm not sure if that's a mistake, but it looks suspicious.

Yes, that's a bug, but fixing it doesn't help. The idea was to try to implement a backwards-compatible DatabaseBlock::getQueryInfo() by providing field aliases, but because getQueryInfo() has no way to normalize the conditions, it turns out that no caller in codesearch was able to use DatabaseBlock::getQueryInfo() on the new schema without modifications, so there was no point. The only thing you could use it for would be dumping the whole table. I hard deprecated it in 1023166 but maybe I should have just removed it without deprecation.

The fix for UserMerge is at 1030077.