Page MenuHomePhabricator

Create technical plan for merging overlapping blocks
Closed, ResolvedPublic3 Story Points

Description

  • Host meetings/discussions as needed
  • Do technical investigations as needed

Acceptance criteria

  • Produce a detailed, estimate-able implementation plan for how to solve overlapping blocks.

Event Timeline

TBolliger created this task.Dec 5 2018, 8:17 PM
TBolliger triaged this task as Normal priority.

We'll hold a full-AHT-team meeting to come up with a plan

dbarratt closed this task as Resolved.Dec 18 2018, 9:00 PM
dbarratt claimed this task.

I've updated the description in T206163 let me know if it needs anything more.

Thank you David! ๐Ÿš€

TBolliger reopened this task as Open.Jan 14 2019, 7:31 PM
TBolliger updated the task description. (Show Details)
TBolliger set the point value for this task to 3.
TBolliger removed dbarratt as the assignee of this task.

Here's a summary of what we've discussed so far.

Overview

By "overlapping blocks", we mean when multiple, independent blocks apply to a user. For example, a user's account and IP address may be independently blocked, or multiple IP range blocks may apply to their IP address, etc.

Blocks don't always have the same effect: they can vary in what they prohibit. Before we introduced partial blocks, all blocks would restrict editing sitewide, but some might (or might not) also restrict the target from creating new accounts, editing their own talk page, etc. Now we have partial blocks, they can vary even more: some blocks may only prohibit editing certain pages or namespaces, or may not even prohibit editing.

This raises the question: what should we do when multiple blocks apply? Should we choose one of these blocks and enforce it - and if so, which one? Or should we somehow combine the features of all the blocks, for example by creating a new block that enforces the union of the restrictions, or (at the other extreme) enforces the intersect of the restrictions?

Whatever the answer to this should be, we currently look for blocks in a particular order and enforce the first block we find (with a couple of exceptions below), without necessarily knowing whether other blocks apply, or what actions any other blocks restrict. For example, if a partial block and a sitewide block apply to a user, if the partial block is found first then it will be enforced, and the sitewide block ignored. Whether or not this is desirable behaviour, it's not intentional behaviour, and only arises due to a quirk of a system that was designed before partial blocks existed. We could improve on this by looking at all the blocks that apply to the user's account and any associated IP addresses etc first, and then create a new block object combining their features.

What currently happens

In short, we pick one block and enforce that.

In long, we choose a block using a complicated flowchart that sometimes prioritises blocks by the block target, and sometimes prioritises blocks by how restrictive they are. It starts with User::getBlockedStatus, and generally goes as follows (with some differences depending on global flags, whitelists, whether the user is logged in, IP block exemptions, etc):

  1. Is there a block to the user account? If so, use this.
  2. Is there a block to the user's exact IP address? If so use this.
  3. Is there at least one IP range block covering the user's IP address? If so, use the one with the narrowest range (see Block::newLoad).
  4. Is there a block ID specified in the user's cookie? If so, use this.
  5. Is the user's IP address in a list of locally blocked proxies? If so, create a sitewide block object with default restriction flags and use this.
  6. Is the user's IP address in a DNS blacklist? If so, create a sitewide block object with default restriction flags and use this.
  7. Are there any blocks to any IPs in the XFF header? If so, compare the blocks and use the most restrictive (while prioritising exact IP blocks over IP range blocks - see Block::chooseBlock).
  8. Is the user's IP address within any IP ranges specified in a list of soft blocks? If so, create a sitewide block object with default restriction flags and use this.
  9. Run hooks.

Note that, once the answer to any of these questions is yes, we don't look for further blocks. It's also worth noting that we potentially query the database multiple times, in steps 1-3, 4 and 7.

Solution

We could instead do something along the lines of:

  1. Make one query to the blocks table for any blocks that apply to the user's account, IP addresses, cookie block if present - i.e. all the blocks that could apply to this user
  2. Check if the soft block list or any of the blacklists cover any of the IPs
  3. If any blocks are found, create a new block object combining the features of these
  4. Run hooks

We should bear in mind that the ability to have multiple blocks makes it difficult for an admin creating a block to predict whether their block will be applied in different circumstances - this is an existing problem, but note that it's not solved here.

Thank you for the detailed research and comments, @Tchanders.

@dbarratt, @dmaza @aezell @Mooeypoo โ€” We have an estimation meeting scheduled for this Thursday, February 7. Will we be ready to break this into estimateable implementation tasks?

aezell added a comment.Feb 4 2019, 6:54 PM

If we all agree with @Tchanders solution, we can create a few tasks to cover that work. I like the simplicity, at least in reasoning, of this approach. I imagine the code might get a bit squirrelly. However, being able to easily describe how this works means we have a clearer path to build it which is valuable.

If it helps, we also discussed some implementation details...

The existing workflow is complex, and carefully controlled by various flags, whitelists, etc, so we need to be careful not to disrupt that. For example, we don't want to suddenly start checking for IP blocks when the user is exempt from them, or proxy blocks if they are using a whitelisted proxy.

We can gather any user account, block ID, or IP addresses of interest in User::getBlockedStatus, respecting the existing rules. We could then query for all relevant blocks at once - we could rename and modify Block::getBlocksForIPList for this purpose, since it's not used elsewhere (https://codesearch.wmflabs.org/search/?q=getBlocksForIPList&i=nope&files=&repos=).

Rather than passing these blocks to Block::chooseBlock, we could implement something like Block::mergeBlocks, to create a new block object to enforce the union of the restrictions of the blocks:

  • For editing:
    • If any block is sitewide, the new block applies a sitewide editing block
    • If all blocks are partial, the new block applies the restrictions (if any) to all the namespaces and pages specified in these blocks
  • For account creation, emailing, editing one's own user talk page:
    • If any block prevents the user from performing the action, the new block restricts the user from performing the action

For the most part, we should be able to preserve the reasons/messages, but there may be multiple to display.

If we do it this way, then every action that the admins who made these blocks intended to block will be blocked. On the other hand, some of the actions that the admins intended to allow may be blocked, due to a different block by another admin.

Yep. This all sounds good to me.

@TBolliger We talked about some different possibilities for the merged block reason. For instance we could list all of the block reasons (which could be confusing, especially on mobile), or we could generate a completely new reason that lists out the blocks... something like "Multiple Blocks. #23, #4534, and #2345" and the numbers could link to the block list or the block log where they can get more details about the block.

For block reason, initially we should show something then address the proper block reasons in T212326

TBolliger closed this task as Resolved.Feb 5 2019, 7:13 PM
TBolliger moved this task from Review to Done on the Anti-Harassment (Bet โ€” ื‘) board.

We can gather any user account, block ID, or IP addresses of interest in User::getBlockedStatus, respecting the existing rules. We could then query for all relevant blocks at once - we could rename and modify Block::getBlocksForIPList for this purpose, since it's not used elsewhere (https://codesearch.wmflabs.org/search/?q=getBlocksForIPList&i=nope&files=&repos=).

As discussed today, we should make a new method for getting all the blocks, and begin a deprecation process for Block::getBlocksForIPList.