Collect information about users affected by blocks
Open, Low, Public

Description

For some time now, I've been thinking about having an "Impact of your block" kind of dashboard. This dashboard would let its users (Wikimedia stewards and administrators; we can restrict the latter to "checkusers" if admins is too broad) see how many users are impacted by a (range) block.

I'm not sure about admins, but the email queue of the Wikimedia Stewards is often terribly backlogged (ATM, we have about 1k mails to go through). It would be useful to know how many users each block (in the stewards' case, each global block) affects. A dashboard would enable us to factor that information into our decision-making processes.

Ideally, the dashboard will be usable in two ways:

  1. before a block is placed, estimate how many users would be stopped from editing
  2. after a block is placed, see how many users pressed "Edit", but were met with an error message.

The first use-case should be doable with the currently available data, I think (the wmf_raw.mediawiki_private_cu_changes table should give the information needed for that).
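
A minimal sketch of the first use-case, assuming the CheckUser data has already been extracted as (user, IP) pairs. The function name and the sample records are hypothetical and do not reflect the actual schema of wmf_raw.mediawiki_private_cu_changes:

```python
import ipaddress


def count_affected_users(records, cidr):
    """Count distinct users with at least one recorded action from inside `cidr`.

    `records` is an iterable of (user, ip) pairs; this shape is an assumption,
    not the real layout of the CheckUser table.
    """
    network = ipaddress.ip_network(cidr, strict=False)
    affected = set()
    for user, ip in records:
        addr = ipaddress.ip_address(ip)
        # Only compare addresses of the same IP family as the range.
        if addr.version == network.version and addr in network:
            affected.add(user)
    return len(affected)


# Hypothetical sample data using documentation-only address ranges.
sample = [
    ("Alice", "192.0.2.10"),
    ("Alice", "192.0.2.11"),
    ("Bob", "192.0.2.200"),
    ("Carol", "198.51.100.7"),
    ("Dave", "2001:db8::1"),
]

print(count_affected_users(sample, "192.0.2.0/24"))
```

Counting distinct users rather than rows matters here, since one user typically has many recorded actions from the same range.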

For the second use-case, we'd need to collect new data. Based on a short discussion in #wikimedia-analytics on IRC, that should be done by creating a new stream. I'm not sure where to emit the data on the MediaWiki side of things, though. Maybe in PermissionManager::getPermissionErrors, when the action is edit? Or should we do it in the editor itself, to exclude edits made via the API (in that case, we'd need to account for tools that use the API at the direction of the user, like DiscussionTools or Twinkle)?
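
To make the API question concrete, here is a rough sketch of how events from such a stream might be aggregated. No such stream or schema exists yet; the field names (`block_id`, `user`, `via_api`, `tool`) and the allowlist are assumptions made purely for illustration:

```python
from collections import defaultdict

# Hypothetical allowlist of API clients that act at the direction of a user.
USER_DIRECTED_TOOLS = {"DiscussionTools", "Twinkle"}


def blocked_users_per_block(events):
    """Count distinct users who hit each block while trying to edit.

    API-originated attempts are skipped unless they come from a tool on the
    allowlist, so bots and scripts do not inflate the numbers.
    """
    users = defaultdict(set)
    for event in events:
        if event["via_api"] and event.get("tool") not in USER_DIRECTED_TOOLS:
            continue
        users[event["block_id"]].add(event["user"])
    return {block_id: len(names) for block_id, names in users.items()}


# Hypothetical events, one per blocked edit attempt.
events = [
    {"block_id": 1, "user": "Alice", "via_api": False},
    {"block_id": 1, "user": "Alice", "via_api": False},
    {"block_id": 1, "user": "Bob", "via_api": True, "tool": "Twinkle"},
    {"block_id": 1, "user": "SomeBot", "via_api": True, "tool": "pywikibot"},
    {"block_id": 2, "user": "Carol", "via_api": True, "tool": "DiscussionTools"},
]

print(blocked_users_per_block(events))
```

Recording whether an attempt came via the API (and from which tool) at collection time would keep both options open, since the filtering could then be done when the dashboard queries the data.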

Event Timeline

Urbanecm renamed this task from Collect information about users affected by global blocks to Collect information about users affected by blocks. Dec 23 2021, 3:22 PM
Urbanecm updated the task description.

Differential privacy seems like a tricky issue here, unless queries are limited to large ranges.

Not necessarily. We can restrict the tool to be viewable only by stewards (global information) and checkusers (wiki-specific information). Those two groups are technically able to view the IP addresses of all editors anyway, so differential privacy doesn't sound like a huge deal there. If we want to be extra careful, we can log access to the information.

Not what you need, but just to be sure you are aware: there is an event.mediawiki_user_blocks_change Hive table that comes from the mediawiki.user-blocks-change stream with this schema. You can at least use this to know when a user (range?) was blocked.

:)