Motivation
We want to collect data to understand the pervasiveness and impact of blocks on our platform.
Some data questions we have in mind are:
- How many IPs are blocked on our wikis? How many users are blocked?
- How have these numbers changed over time? How do hard blocks compare with soft blocks?
- How many blocks get lifted, and how long do blocks last? (Does the length of a block match how long the proxies are being abused?)
- Of all IPs in the world, what % are blocked? Which countries are blocked more than others? Which communities/wikis are blocked more than others?
- How many people report being blocked & ask for exemptions? How many organizers? How many exemptions get granted?
- How many LTAs (long-term abusers) are we blocking?
- How many IPs are getting blocked, and how many LTAs account for those blocks?
- How many VPNs / private networks are blocked?
- How many bots have been spun up to do the blocks? On which wikis?
- How many IP edits are published? How many reverted?
- How does this change per project? Over time?
- How many people are we losing because of VPN/proxy blocks? Logged-in people?
- How many people who can't edit as an IP make an account?
- What is the impact on stewards, UTRS, and other queues?
- How many people start as IP editors?
- What is the current user flow like for IP edits? What might the user flow look like if IP edits are eliminated?
- Is it different for different wikis? (For some projects, the main source of logged-in editors is people who started editing at one of the big Wikipedias. For others, the main source of editors might be IPs who decided to create an account.)
- Do we know enough about the difference between editors who started by creating an account vs editors whose first edit was logged out?
- How many admins and moderators are available? What is their bandwidth like? What kind of qualitative information can we get about their experiences?
- What kind of bot support is available to flag or revert potentially malicious edits?
- In which languages does this bot support work? Are local/fluent speakers able to adjust it?
- At wikis that require registration, what will happen to ORES, which relies heavily (40%?) on the fact that someone is logged out to determine whether an edit is vandalism?
This is not an exhaustive list. We will add more questions to this list as they come up.
The goal of this ticket is to determine which data questions we can answer, which ones we cannot, and which ones need additional instrumentation to answer.