Explore data questions around IP blocks
Open, Medium, Public

Description

Motivation

We want to collect data to understand the pervasiveness and impact of blocks on our platform.

Some data questions we have in mind are:

  • How many IPs are blocked on our wikis? How many users are blocked?
  • How has this changed over time? Hard blocks vs soft blocks?
  • How many get unblocked, and how long are the blocks? (Does the length of the block match how long the proxies are being abused?)
  • Of all IPs in the world, what % are blocked? Which countries are blocked more than others? Which communities/wikis are blocked more than others?
  • How many people report being blocked & ask for exemptions? How many organizers? How many exemptions get granted?
  • How many LTAs are we blocking?
    • How many IPs are getting blocked because of how many LTAs?
  • How many VPNs / private networks are blocked?
  • How many bots have been spun up to do the blocks? On which wikis?
  • How many IP edits are published? How many reverted?
    • How does this change per project? Over time?
  • How many people are we losing because of VPN/proxy blocks? Logged-in people?
  • How many people who can't edit as an IP make an account?
  • Impact on stewards, UTRS, other queues?
  • How many people start as IP editors?
  • What is the current user flow like for IP edits? What might the user flow look like if IP edits are eliminated?
    • Is it different for different wikis? (For some projects, the main source of logged-in editors is people who started editing at one of the big Wikipedias. For others, the main source of editors might be IPs who decided to create an account.)
  • Do we know enough about the difference between editors who started by creating an account vs editors whose first edit was logged out?
  • How many admins and moderators are available? What is their bandwidth like? What kind of qualitative information can we get about their experiences?
  • What kind of bot support is available to flag or revert potentially malicious edits?
  • In which languages does it work? Are local/fluent speakers able to adjust it?
  • At wikis that require registration, what will happen to ORES, which relies heavily (40%?) on the fact that someone is logged out to determine whether an edit is vandalism?

This is not an exhaustive list. We will add more questions to this list as they come up.

The goal for this ticket is to find which data questions we can answer, which ones we cannot and which ones need additional instrumentation to answer.

Event Timeline

Niharika triaged this task as Medium priority.

After reviewing the question list, we identified the questions that we can already answer with data, or could answer with some further work. The other questions are great questions, but we don't have quantitative data to answer them yet.

What we already have

The following questions already have data available on existing dashboards or in analysis reports.

  • How many IPs are blocked on our wikis? How many users are blocked? How has this changed over time?

    Available on the IP masking dashboard.

What we could have

One question we could answer quantitatively from an available schema is:

  • How many hard blocks vs soft blocks are there?

The schema wmf_raw.mediawiki_ipblocks might record the relevant information.
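
As a rough illustration of what that measurement could look like, here is a minimal Python sketch. It assumes that wmf_raw.mediawiki_ipblocks mirrors the columns of the MediaWiki ipblocks table, in particular ipb_user (0 when the block targets an IP or range) and ipb_anon_only (1 when the block applies only to logged-out users, i.e. a soft block). That assumption has not been verified against the schema, and the sample rows are made up for illustration.

```python
from collections import Counter

def classify_block(row):
    """Classify one ipblocks-style row as 'account', 'soft', or 'hard'.

    Assumes MediaWiki ipblocks column semantics:
      ipb_user == 0      -> the block targets an IP address or range
      ipb_anon_only == 1 -> only logged-out users are affected (soft block)
    """
    if row["ipb_user"] != 0:
        return "account"  # block on a registered account, not an IP
    if row["ipb_anon_only"] == 1:
        return "soft"
    return "hard"

def count_block_types(rows):
    """Tally the rows by block type."""
    return Counter(classify_block(row) for row in rows)

# Made-up sample rows (documentation IP ranges), not real block data.
sample_rows = [
    {"ipb_address": "192.0.2.1", "ipb_user": 0, "ipb_anon_only": 1},
    {"ipb_address": "198.51.100.0/24", "ipb_user": 0, "ipb_anon_only": 0},
    {"ipb_address": "ExampleUser", "ipb_user": 42, "ipb_anon_only": 0},
    {"ipb_address": "203.0.113.7", "ipb_user": 0, "ipb_anon_only": 0},
]

counts = count_block_types(sample_rows)
print(counts)
```

In practice the same classification would more likely be written as a query grouped by wiki and snapshot, but the logic would be the same if the column assumption holds.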

Other questions we could answer

Recently, the Growth and Editing teams deployed instrumentation to track blocked account creations and blocked edit attempts (T306018, T310390). With this instrumentation, we will be able to explore questions like:

  • Account creation related:
    • What is the geographic distribution of blocked account attempts?
    • How does the geographic distribution of blocked account attempts compare to successful account registrations?
    • Are we able to distinguish between “human” and “automated” account creation requests when it comes to blocked attempts?
  • Edit attempt related:
    • To what extent do blocks play a role in preventing potentially productive editors from making an edit?
    • In which countries do blocked edit attempts most frequently occur?
    • What are the differences in frequency of blocked edit attempts by wiki, editing interface, and platform?
    • What types of IP blocks (local/global, short-/long-term) are more frequently encountered?
    • How many distinct users are stopped from editing by a block?
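
To make the edit attempt questions a bit more concrete, here is a small Python sketch of the kind of breakdown we could compute once the events are available. The field names (wiki, interface, platform, user_key) are hypothetical placeholders, not the actual fields of the deployed instrumentation, and the sample events are made up.

```python
from collections import Counter

def summarize_blocked_attempts(events):
    """Break down blocked edit attempts by wiki, interface, and platform,
    and count the distinct users affected.

    Each event is a dict with hypothetical keys:
    'wiki', 'interface', 'platform', 'user_key'.
    """
    return {
        "by_wiki": Counter(e["wiki"] for e in events),
        "by_interface": Counter(e["interface"] for e in events),
        "by_platform": Counter(e["platform"] for e in events),
        "distinct_users": len({e["user_key"] for e in events}),
    }

# Made-up sample events for illustration only.
sample_events = [
    {"wiki": "enwiki", "interface": "visualeditor", "platform": "desktop", "user_key": "a"},
    {"wiki": "enwiki", "interface": "wikitext", "platform": "phone", "user_key": "a"},
    {"wiki": "eswiki", "interface": "wikitext", "platform": "desktop", "user_key": "b"},
]

summary = summarize_blocked_attempts(sample_events)
print(summary["by_wiki"])
print(summary["distinct_users"])
```

How distinct users are identified (the user_key above) is an open question for the analysis itself, since logged-out attempts do not have a stable account identifier.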

I have created the tickets below to track the follow-up analyses.

  1. Explore measurement of hard blocks and soft blocks. T322679
  2. Analyze blocked account creations. T322680
  3. Analyze blocked edit attempts. T322682