Multiblocks — Allow for multiple, simultaneous blocks with different expiration dates.
Open, Medium, Public

Description

RfC: T202673: RFC: Multiblocks - let admins create multiple, overlapping blocks on a single user
After T2674 (Allow users to be blocked from editing a specific article or all articles inside a namespace), there may be reason for a user to have several blocks with different expiration dates. For example:

User:Apples has been indefinitely blocked from editing Neptune. They then receive a 24 hour full-site block. When the full site block expires, they should continue to be blocked from Neptune.

or

User:Bananas is indefinitely blocked from editing Mars and from editing Venus until 2025. An admin wants to block them from Saturn for one month.

This is easy to manually set for simple granular blocks but becomes more troublesome if the user is blocked from several pages, categories, namespaces, and/or actions.
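
The User:Apples scenario can be illustrated with a small model. This is only a sketch of the intended behaviour, not MediaWiki code: the `Block` class, its fields, and `is_blocked` are hypothetical names, and a page of `None` is assumed to mean a sitewide block.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Block:
    # Hypothetical model: page=None means sitewide, expiry=None means indefinite.
    page: Optional[str]
    expiry: Optional[datetime]

    def active_at(self, when: datetime) -> bool:
        return self.expiry is None or when < self.expiry


def is_blocked(blocks: List[Block], page: str, when: datetime) -> bool:
    """True if any block that is still active covers the given page."""
    return any(
        b.active_at(when) and (b.page is None or b.page == page)
        for b in blocks
    )


now = datetime(2018, 8, 16)
apples = [
    Block(page="Neptune", expiry=None),                    # indefinite partial block
    Block(page=None, expiry=now + timedelta(hours=24)),    # 24 hour sitewide block
]
```

Because each block keeps its own expiry, the Neptune block survives the sitewide block with no manual reinstatement: an hour in, `is_blocked(apples, "Mars", ...)` is true; after 25 hours it is false for Mars but still true for Neptune.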


Requirements

  • Adding a new block should not affect any existing block.
  • Each block can contain different parameters & independent expiration dates.
  • Admins will need to see all active blocks set against a user/IP on Special:Block.
    • Admins will be able to select an active block and modify its parameters.
  • Admins will be able to remove one, several, or all blocks from Special:Unblock.
  • Users will be able to see all their active blocks on Special:Contributions.
  • Users will be able to see all blocks on Special:BlockList (searching/filtering will be handled on another ticket, to be filed.)

Designs

Version here: https://meta.wikimedia.org/wiki/Community_health_initiative/Partial_blocks/Multi-blocks


Event Timeline

  1. Is there a reason why bt_auto exists in block_target instead of block_entry? It feels to me that autoblocks can be just another entry.

Because, for privacy, you can't search for an autoblock entry by IP address or otherwise see the IP address the autoblock is associated with. If an admin blocks an IP address that already has an autoblock associated with it, it's a privacy violation to show the autoblock on the same list with the manual block. Currently, unblocking or modifying an autoblock can only be done by ID using a "#" prefix. So it is more like a kind of target than a block option.

But for efficiency at save time, it's necessary to be able to search the database for autoblocks by IP address. So you can't just use the autoblocked user as the target.

When a user is blocked, an autoblock is immediately inserted for their last IP address. So without privacy protections, the autoblock feature could be used to provide the IP address of any editor, by temporarily blocking them. This could have a significant impact especially for small wikis where there is no effective oversight of admins.

TBolliger renamed this task from Allow a way for a partial block to be reinstated after a sitewide block expires (multi-blocks or other system) to Multi-blocks — Allow for multiple, simultaneously blocks with different expiration dates..Aug 16 2018, 9:27 PM
TBolliger updated the task description.
dbarratt renamed this task from Multi-blocks — Allow for multiple, simultaneously blocks with different expiration dates. to Multiblocks — Allow for multiple, simultaneously blocks with different expiration dates..Aug 22 2018, 7:50 PM

This project has been deprioritized by the WMF's Anti-Harassment Tools team until further notice. We want to finish Partial Blocks and re-evaluate if this functionality (which is expensive to build) is actually necessary.

I would rather tackle T208175 first anyways and see if that resolves most of the original use case(s).

In fact, with T208175, it might be somewhat trivial for someone to make a bot that would roll back a block to a previous state after it has expired (or something like that).

The development of this project would be a massive time and resource investment for minimal impact. Boldly closing.

tstarling removed a project: Anti-Harassment.

Reopening and untagging Anti-Harassment per T202673#4933221: a task should be left open if it is a good idea but there is no resourcing.

Is there any progress on this?
I have just had an issue with a user indefinitely blocked from the Wikipedia namespace: I blocked him sitewide for an hour, and then had to restore his Wikipedia namespace block right after.
This is not expected behaviour at all. The sitewide block was due to an additional rule violation in a different namespace, and this is definitely not a good reason to remove the indefinite Wikipedia namespace block.
As is, an AbuseFilter is a better solution than a partial block: I would also need two actions (full block, edit AbuseFilter VS full block, restore namespace block) but the solution would be permanent.

Hi, is there any progress on this? This task has now been open for over two years, but nothing has happened so far. It can't take years to program that several blocking parameters are possible at the same time.

There still hasn't been anyone willing to take this on.

Work on this probably shouldn't start until the TechCom-RFC at T202673 is complete

Community Tech are looking at this, following its high ranking in the 2023 Community Wishlist Survey.

Three people have said to me that we should consider just dropping the unique index constraint. But I'm still pretty keen on keeping the block conflict feature I introduced in 2006 (1d9922db6442dec82e4be07c45c2e8a3a2035602). In the multiblocks context, the feature means that if two admins try to block a user or IP address simultaneously, and the user has no prior blocks, one admin should succeed and the other should get a confirmation page with details of the first block. If we just remove the unique index, both admins would succeed in creating a block.
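
The existing conflict behaviour can be sketched as follows. This is an illustration only: SQLite stands in for the production database, the table is reduced to a simplified subset of columns, and the index name `ipb_address_unique` and the helper `try_block` are hypothetical.

```python
import sqlite3


def try_block(conn, address, by):
    """Insert a block. Returns (True, None) on success, or
    (False, existing_row) if the unique index rejected the insert."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO ipblocks (ipb_address, ipb_by) VALUES (?, ?)",
                (address, by),
            )
        return True, None
    except sqlite3.IntegrityError:
        # The losing admin is shown the block that won the race.
        existing = conn.execute(
            "SELECT ipb_address, ipb_by FROM ipblocks WHERE ipb_address = ?",
            (address,),
        ).fetchone()
        return False, existing


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ipblocks ("
    " ipb_id INTEGER PRIMARY KEY,"
    " ipb_address TEXT NOT NULL,"
    " ipb_by TEXT NOT NULL)"
)
conn.execute("CREATE UNIQUE INDEX ipb_address_unique ON ipblocks (ipb_address)")

first = try_block(conn, "192.0.2.1", "AdminA")
second = try_block(conn, "192.0.2.1", "AdminB")
```

Here `first` succeeds and `second` comes back with AdminA's row, which is what the confirmation page would display. Dropping the index would make both calls succeed.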

I can think of three approaches to supporting multiple blocks with block conflict prevention:

1. Gap locking with FOR UPDATE

The unique index is dropped, and we use gap locking to prevent insertion of rows with the same (or adjacent) addresses.

BEGIN;
SELECT COUNT(*) FROM ipblocks WHERE ipb_address='...' FOR UPDATE;
INSERT INTO ipblocks ...;
COMMIT;

The problem is that it will generate deadlocks, both in the conflict case and when adjacent IP addresses or users are blocked.

According to the MySQL manual, deadlocks are not meant to be a big deal. You can just reissue the transaction. But this conflicts with MediaWiki's current transaction system, in which there is a single transaction for the whole web request, with no control over it except via DeferredUpdates. This makes it difficult to write deadlock-tolerant code.
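
For reference, "reissue the transaction" usually means a loop like the one below. It is a generic sketch, not MediaWiki code: `DeadlockError` is a hypothetical stand-in for whatever the database driver raises on MySQL error 1213, and `run_with_retry` is an invented helper.

```python
import random
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for a driver's deadlock error (MySQL error 1213)."""


def run_with_retry(transaction, attempts=3, base_delay=0.01):
    """Run transaction(); on DeadlockError, back off briefly and reissue it.

    transaction must contain the whole BEGIN ... COMMIT unit, because the
    server has already rolled everything back when a deadlock is reported.
    """
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except DeadlockError:
            if attempt == attempts:
                raise
            # Randomised backoff so the two competing writers do not collide again.
            time.sleep(base_delay * attempt * random.random())


calls = []


def flaky_insert():
    # Simulates a block insertion that deadlocks twice before succeeding.
    calls.append(1)
    if len(calls) < 3:
        raise DeadlockError("deadlock found when trying to get lock")
    return "inserted"


result = run_with_retry(flaky_insert)
```

The loop only works if the callable owns the entire transaction, which is exactly what a single request-wide transaction makes hard to arrange.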

2. Denormalized summary table

The target information is duplicated into a summary table. The unique index is dropped from ipblocks, but the summary table has a unique index.

BEGIN;
INSERT INTO ipblocks_summary (ips_address, ips_count) VALUES (...);
INSERT INTO ipblocks (ipb_address, ...) VALUES (...);
COMMIT;

Secondary block insertion:

BEGIN;
UPDATE ipblocks_summary SET ips_count=2 WHERE ips_address='...' AND ips_count=1;
INSERT INTO ipblocks ...;
COMMIT;
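
A runnable sketch of option 2 follows, under several assumptions: SQLite stands in for the production database, the tables are reduced to the columns named above, and `insert_block` with its `expected_count` parameter (the number of blocks the admin saw when loading the form) is hypothetical.

```python
import sqlite3


def insert_block(conn, address, expected_count):
    """Add a block only if the caller's view of the block count is current.

    Returns True on success, False if another block was added in the meantime.
    """
    try:
        with conn:  # one transaction: commits on success, rolls back on exception
            if expected_count == 0:
                # First block: the unique key on ips_address rejects a
                # concurrent duplicate with an IntegrityError.
                conn.execute(
                    "INSERT INTO ipblocks_summary (ips_address, ips_count) "
                    "VALUES (?, 1)",
                    (address,),
                )
            else:
                # Secondary block: conditional update, matching zero rows
                # if the count changed since the admin loaded the form.
                cur = conn.execute(
                    "UPDATE ipblocks_summary SET ips_count = ips_count + 1 "
                    "WHERE ips_address = ? AND ips_count = ?",
                    (address, expected_count),
                )
                if cur.rowcount != 1:
                    return False
            conn.execute("INSERT INTO ipblocks (ipb_address) VALUES (?)", (address,))
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ipblocks_summary ("
    " ips_address TEXT PRIMARY KEY, ips_count INTEGER NOT NULL)"
)
conn.execute(
    "CREATE TABLE ipblocks (ipb_id INTEGER PRIMARY KEY, ipb_address TEXT NOT NULL)"
)

results = [
    insert_block(conn, "192.0.2.1", 0),  # first admin wins
    insert_block(conn, "192.0.2.1", 0),  # second admin also saw zero blocks
    insert_block(conn, "192.0.2.1", 1),  # deliberate second block
    insert_block(conn, "192.0.2.1", 1),  # stale view of the count
]
```

The conflict check moves from the ipblocks table to the summary row, so ipblocks itself can hold any number of rows per target.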

3. Normalized schema

This is my idea from 2018 (T194697#4490345), although we could keep the ipblocks table and field names the same for easier migration. The plan would be to add the summary table as in option 2 (now called block_target), then add ipb_target, then drop the duplicated fields from ipblocks such as ipb_address.

BEGIN;
INSERT INTO block_target ...;
INSERT INTO ipblocks (ipb_target, ...) VALUES ($bt_id, ...);
COMMIT;

Secondary block insertion:

BEGIN;
UPDATE block_target SET bt_count=2 WHERE bt_id=$id... AND bt_count=1;
INSERT INTO ipblocks (ipb_target, ...) VALUES ($id, ...);
COMMIT;

Migration options:

  1. Use SCHEMA_COMPAT_xxx modes. Complex code, slow migration, zero impact on users.
  2. Use a single feature flag. For each section in turn, switch the wikis in the section to read-only mode, update the schema, then switch the feature flag.
  3. Cowboy olden days option. Just update the code and do the schema update simultaneously with a train deployment.

Considerations:

  • Code review for migration complexity.
  • Schema change time. The size of ipblocks varies, see T267818.

Bonus level: fixing polymorphic ipblocks

@daniel noted that user blocks and IP blocks are not really the same. In terms of schema design, ipblocks is a polymorphic table, since it is trying to represent at least two types of target (IP range and user). For an IP block, ipb_user is zero. For a user block, ipb_range_start is empty and ipb_address contains the denormalized username, requiring a special case in RenameuserSQL. Arguably, single IP addresses and IP ranges are distinct kinds of blocks. There is a fixme in tables.json saying that, for efficiency, ipb_range_start/ipb_range_end should be empty for single-IP blocks. In the PHP code, they have distinct type constants Block::TYPE_IP and Block::TYPE_RANGE.

The conventional representation for polymorphic data (see, e.g., "Polymorphic Association – bad SQL Smell!" by Tom Gillies) is to have separate tables for separate types, and to reverse the direction of the foreign key association. In such a design, the block_target table would just have a primary key, a count, and maybe a type enum, and we would have tables such as blocked_user with bu_target referencing block_target.bt_id.

It would be a far-reaching change to the conceptual model, ideally including a split of the DatabaseBlock class.
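
A rough sketch of what the reversed foreign key could look like, using SQLite for illustration. Only block_target, bt_id, bt_count, blocked_user and bu_target come from the discussion above; bt_type, bu_user, the blocked_range table and its br_ columns are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE block_target (
    bt_id    INTEGER PRIMARY KEY,
    bt_count INTEGER NOT NULL DEFAULT 0,
    bt_type  TEXT NOT NULL CHECK (bt_type IN ('user', 'ip', 'range'))
);
-- One table per target type, each pointing back at block_target.
CREATE TABLE blocked_user (
    bu_target INTEGER PRIMARY KEY REFERENCES block_target (bt_id),
    bu_user   INTEGER NOT NULL UNIQUE
);
CREATE TABLE blocked_range (
    br_target      INTEGER PRIMARY KEY REFERENCES block_target (bt_id),
    br_range_start TEXT NOT NULL,
    br_range_end   TEXT NOT NULL
);
""")

conn.execute("INSERT INTO block_target (bt_id, bt_type) VALUES (1, 'user')")
conn.execute("INSERT INTO blocked_user (bu_target, bu_user) VALUES (1, 42)")

# A type-specific row cannot exist without its block_target row.
try:
    conn.execute("INSERT INTO blocked_user (bu_target, bu_user) VALUES (99, 43)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

row = conn.execute(
    "SELECT bt_type, bu_user FROM block_target "
    "JOIN blocked_user ON bu_target = bt_id"
).fetchone()
```

Each type-specific table carries only the columns that make sense for that type, so no field needs a "zero" or "empty" placeholder to signal which kind of target a row describes.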

In this context, please note that we are relying on the assumption that there can only be a single active block in some places. In particular, when we want to provide information about the block of a given user, we just look at the most recent block log entry for that user. There is no good way right now to list all active blocks for a user and provide all relevant details about each block, because some of the info is in the block itself, and some is in the respective log entry. If we can have multiple active blocks for a given user, we should have a better way to link block log entries to blocks.

Re polymorphic modeling: I note that we have the same problem with the ipblocks_restrictions table, where the ir_value field may represent a page ID or a namespace ID.

In this case, we may want to replace the int field ir_value with a blob field ir_data that contains JSON. That is still polymorphic (or even amorphous), but more explicit: you don't need the ir_value field to determine the interpretation of the value, you just look at the keys.
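
As a sketch of what "just look at the keys" could mean in practice: the key names "page" and "namespace" and the helper `describe_restriction` below are invented for illustration and are not part of any proposal in this task.

```python
import json


def describe_restriction(ir_data: str) -> str:
    """Interpret a restriction from the keys present in its JSON payload."""
    data = json.loads(ir_data)
    if "page" in data:
        return f"page ID {data['page']}"
    if "namespace" in data:
        return f"namespace ID {data['namespace']}"
    raise ValueError(f"unrecognised restriction keys: {sorted(data)}")


page_restriction = describe_restriction('{"page": 123}')
namespace_restriction = describe_restriction('{"namespace": 4}')
```

The payload is self-describing, at the cost of the database no longer being able to index or constrain the value directly.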

when we want to provide information about the block of a given user, we just look at the most recent block log entry for that user.

I assume you mean IntroMessageBuilder::addUserWarnings() and a few similar calls to LogEventsList::showLogExtract(). I'll look into that.

Originally there was no log extract. I'm not sure what user needs are being met by the log extract that aren't achievable with ipblocks. Apparently I signed off on it at r62241 but I don't remember what it was for.

Re polymorphic modeling: I note that we have the same problem with the ipblocks_restrictions table, where the ir_value field may represent a page ID or a namespace ID.

In this case, we may want to replace the int field ir_value with a blob field ir_data that contains JSON. That is still polymorphic (or even amorphous), but more explicit: you don't need the ir_value field to determine the interpretation of the value, you just look at the keys.

That's out of scope. There's going to be a schema change to the ipblocks table, so it's fair enough to bundle in a few other updates to that table, but there's no need to alter ipblocks_restrictions to get multiblocks done. It's not even old code. It was discussed in 2018 at T193449, with lots of people weighing in with passionate arguments.

From a DBA point of view, given that this table is generally small and doesn't get a lot of writes, I don't have any concerns about the proposal. I personally like the third option the most but I'm happy with whatever Tim is happy with.

In this context, please note that we are relying on the assumption that there can only be a single active block in some places. In particular, when we want to provide information about the block of a given user, we just look at the most recent block log entry for that user. There is no good way right now to list all active blocks for a user and provide all relevant details about each block, because some of the info is in the block itself, and some is in the respective log entry. If we can have multiple active blocks for a given user, we should have a better way to link block log entries to blocks.

Also note that the block message displayed when users try to edit a page and the message shown when viewing a blocked user's contributions come from different tables: the former from ipblocks and the latter from the logging table. For the latter, see also T277466: Deleted block log entries causes non-relevant block entries to be shown on Special:Contributions.

Jdforrester-WMF renamed this task from Multiblocks — Allow for multiple, simultaneously blocks with different expiration dates. to Multiblocks — Allow for multiple, simultaneous blocks with different expiration dates..Oct 12 2023, 7:59 PM