Generate statistics on blocking usage
Closed, Resolved · Public · 8 Story Points

Description

About

We know some things about blocking on Wikimedia sites, but we don't have firm data to ground our decision-making or understanding. None of the data listed below will lead directly to a decision; rather, it will inform our decision-making as we continue to change how blocks work.


Data to generate


For all wikis right now...

  • How many blocks are active right now? (IP + range + user, all wikis, regardless of expiration, regardless of duplication)

Per wiki, how is MediaWiki:Ipboptions used...

  • How many wikis have customized MediaWiki:Ipboptions?
  • On the wikis that have customized MediaWiki:Ipboptions, what is the distribution of the most common block length options? (e.g. 500 wikis have 'indefinite', 500 have '1 day', 300 have '1 week', etc.)
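The Ipboptions distribution could be tallied roughly as follows. This is a minimal sketch, assuming each wiki's MediaWiki:Ipboptions text has already been fetched and is a comma-separated list of label:value pairs. The wiki names and option strings are made-up sample data, and parse_ipboptions and option_distribution are hypothetical helper names.

```python
from collections import Counter


def parse_ipboptions(text):
    """Return the expiry values from a 'label:value,label:value' string."""
    values = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        # Assumed format: 'label:value'. An entry without ':' is used as its own value.
        label, sep, value = item.partition(":")
        values.append(value.strip() if sep else label.strip())
    return values


def option_distribution(ipboptions_by_wiki):
    """Count, for each expiry value, how many wikis offer it."""
    counts = Counter()
    for text in ipboptions_by_wiki.values():
        # set() so that a wiki listing the same value twice is counted once.
        counts.update(set(parse_ipboptions(text)))
    return counts


# Hypothetical sample input: wiki name -> raw message text.
sample = {
    "examplewiki": "2 hours:2 hours,1 day:1 day,indefinite:infinite",
    "otherwiki": "1 day:1 day,1 week:1 week,indefinite:infinite",
}
dist = option_distribution(sample)
for value, wikis in dist.most_common():
    print(f"{wikis} wikis have '{value}'")
```

Counting each value once per wiki matches the "500 wikis have 'indefinite'" phrasing in the question above.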

Blocking. Per wiki, over the past 7 days...

  • How many blocks were set per hour, on average?
  • How many autoblocks are currently active?
  • How many blocks are currently active?
  • What is the distribution of block lengths?
  • What percentage of blocks are for IP address vs. IP range vs. usernames?
  • What are the most common block reasons?
  • How many blocks have 'Prevent this user from editing his own talk page while blocked' checked? (The default is unchecked.)
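As a rough illustration of two of the questions above (blocks per hour and the IP / range / username split), here is a minimal sketch. It assumes the block targets for the 7-day window have already been extracted as strings; the sample targets are invented, and classify_target and summarize are hypothetical helper names.

```python
import ipaddress
from collections import Counter


def classify_target(target):
    """Classify a block target as 'ip', 'range', or 'user'."""
    try:
        ipaddress.ip_address(target)
        return "ip"
    except ValueError:
        pass
    if "/" in target:
        try:
            # strict=False accepts ranges whose host bits are set.
            ipaddress.ip_network(target, strict=False)
            return "range"
        except ValueError:
            pass
    return "user"


def summarize(targets, hours=7 * 24):
    """Return (average blocks per hour, percentage share per target type)."""
    counts = Counter(classify_target(t) for t in targets)
    total = sum(counts.values())
    shares = {kind: 100.0 * n / total for kind, n in counts.items()} if total else {}
    return total / hours, shares


# Hypothetical sample targets (documentation address ranges).
targets = ["192.0.2.7", "198.51.100.0/24", "ExampleUser", "2001:db8::1"]
per_hour, shares = summarize(targets)
print(per_hour, shares)
```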

Manual unblocking. Per wiki, over the past 7 days...

  • Total manual unblocks?
  • What are the most common unblock reasons?
  • What percentage of user unblocks were made by the original blocking admin vs. by a different admin?

Block modification. Per wiki, over the past 7 days...

  • How many blocks were modified per hour, on average? (Determined by seeing that one block is made for a user and another is set before the other block expires.)
  • What percentage of block modifications were made by the original blocking admin vs. by a different admin?
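The modification rule described above (a second block set for the same target before the earlier one expires) could be sketched like this. It is a simplification under stated assumptions: events are (target, set_at, expires_at) tuples with None meaning an indefinite block, manual unblocks between the two blocks are ignored, and the sample events are invented.

```python
from datetime import datetime


def count_modifications(events):
    """Count blocks set while an earlier block on the same target was still running."""
    last_expiry = {}  # target -> expiry of the most recent block (None = indefinite)
    modifications = 0
    for target, set_at, expires_at in sorted(events, key=lambda e: e[1]):
        if target in last_expiry:
            previous = last_expiry[target]
            # An indefinite block never expires, so any later block modifies it.
            if previous is None or set_at < previous:
                modifications += 1
        last_expiry[target] = expires_at
    return modifications


# Hypothetical sample events.
events = [
    ("UserA", datetime(2018, 4, 30, 10), datetime(2018, 5, 1, 10)),
    ("UserA", datetime(2018, 4, 30, 18), datetime(2018, 5, 7, 18)),  # overlaps: modification
    ("UserB", datetime(2018, 5, 1, 9), datetime(2018, 5, 2, 9)),
    ("UserB", datetime(2018, 5, 3, 9), datetime(2018, 5, 4, 9)),     # after expiry: new block
    ("UserC", datetime(2018, 5, 1, 12), None),
    ("UserC", datetime(2018, 5, 4, 12), datetime(2018, 5, 5, 12)),   # replaces indefinite
]
modified = count_modifications(events)
print(modified)
```

Dividing the result by the number of hours in the window (168 for 7 days) would give the per-hour average.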

This data could be retrieved by querying the database, the logging table, API:Blocks, or whatever was used in T180071: Generate report of admins who perform many blocks so we have a list to invite to blocking consultation.
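For the API:Blocks route, a request URL might be assembled as below. This is only a sketch: it builds the URL without sending it, the endpoint and date range are illustrative, build_blocks_query is a hypothetical helper, and the parameter choices (bkprop fields, bkdir=newer so that bkstart is the earlier timestamp) should be checked against the API documentation before use.

```python
from urllib.parse import urlencode


def build_blocks_query(api_url, start, end, limit=50):
    """Build a list=blocks query URL for blocks set between start and end."""
    params = {
        "action": "query",
        "list": "blocks",
        "bkdir": "newer",   # assumed: enumerate oldest first, so start < end
        "bkstart": start,
        "bkend": end,
        "bkprop": "id|user|by|timestamp|expiry|reason|range|flags",
        "bklimit": limit,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


# Illustrative endpoint and window; adjust per wiki and reporting period.
url = build_blocks_query(
    "https://en.wikipedia.org/w/api.php",
    "2018-04-30T00:00:00Z",
    "2018-05-06T23:59:59Z",
)
print(url)
```

Note that this endpoint lists blocks that are currently in the block list, so blocks that were set and removed inside the window would have to come from the log instead.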

Maybe for later:

  • What percentage of blocks naturally expire vs. are manually unblocked?

Ticket source code is here: https://github.com/dayllanmaza/wikireplicas-reports

Restricted Application added subscribers: MGChecker, Aklapper. Mar 21 2018, 7:22 PM
dmaza added a subscriber: dmaza. Mar 21 2018, 8:11 PM

Would any of the following provide useful data?

  • How often are blocks created on a wiki? (average rate, e.g. 5 blocks per hour)
  • How often are blocks removed?
  • Common reasons for "unblocking"
  • Frequency of autoblocks?

What is the distribution of the most common block length options? (e.g. 500 wikis have 'indefinite', 500 have '1 day', 300 have '1 week', etc.)

Will this be in the past year? 7 days?

Would any of the following provide useful data?

  • How often are blocks created on a wiki? (average rate, e.g. 5 blocks per hour)
  • How often are blocks removed?
  • Common reasons for "unblocking"
  • Frequency of autoblocks?

While we're in there — yes these would be good to have to paint a complete picture.

What is the distribution of the most common block length options? (e.g. 500 wikis have 'indefinite', 500 have '1 day', 300 have '1 week', etc.)

Will this be in the past year? 7 days?

This is specifically about what's defined in MediaWiki:Ipboptions, so it would be a one-time snapshot.

TBolliger updated the task description. Mar 21 2018, 8:48 PM
TBolliger updated the task description.

One suggestion:

  • How often is a block modified? (e.g. duration, talk page access on/off, updated summary)

Don't know how easy this is, and whether each type of modification needs its own query.

How often is a block modified/lifted by an admin other than the original blocking admin?

TBolliger updated the task description. Mar 22 2018, 10:13 PM

Great points. I've updated the ticket.

@dmaza I'm happy to help come up with mockup data tables to illustrate what type of results we're anticipating. "Per wiki, per hour, on average" can be interpreted in a few different ways.
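To make the ambiguity concrete, here are two possible readings of "per hour, on average" computed from the same timestamps. Both readings are illustrative assumptions rather than definitions agreed in this task, the timestamps are invented, and hourly_averages is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime


def hourly_averages(timestamps, window_hours):
    """Return (blocks / all hours in window, blocks / hours with at least one block)."""
    # Truncate each timestamp to the start of its hour to bucket the blocks.
    per_hour = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    over_window = len(timestamps) / window_hours
    over_active_hours = len(timestamps) / len(per_hour) if per_hour else 0.0
    return over_window, over_active_hours


# Hypothetical sample: 6 blocks falling in 3 distinct hours of one day.
stamps = [
    datetime(2018, 4, 30, 9, 5),
    datetime(2018, 4, 30, 9, 40),
    datetime(2018, 4, 30, 9, 55),
    datetime(2018, 4, 30, 14, 10),
    datetime(2018, 4, 30, 14, 30),
    datetime(2018, 4, 30, 22, 0),
]
over_window, over_active_hours = hourly_averages(stamps, window_hours=24)
print(over_window, over_active_hours)
```

On a small wiki the two numbers can differ a lot, which is why agreeing on one reading before running the queries matters.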

TBolliger updated the task description. Mar 23 2018, 6:17 PM
TBolliger set the point value for this task to 8.
dmaza added a comment. Mar 23 2018, 9:07 PM

@TBolliger Sure, if you want we can wait 'til we pick it up in a sprint and talk it through. Looks straightforward to me, though.

dmaza claimed this task. Apr 20 2018, 6:01 PM
dmaza moved this task from Ready to In progress on the Anti-Harassment (AHT Sprint 19) board.

From WMCON: @Matanya thinks the average block will be 2 hours long with 'vandal' as the reason. :)

We should also look into (probably in another ticket) determining the block rate for new users. What percentage of new editors are blocked? Does this value change over time?

SPoore added a subscriber: SPoore. Apr 26 2018, 6:09 PM
dmaza added a comment. Apr 30 2018, 9:52 PM

What percentage of blocks naturally expire vs. are manually unblocked?

@TBolliger, this means that I'd need to get blocks created in the last 7 days that have already expired or that were manually unblocked before NOW. It can be done, but we'd exclude any block with a lifespan longer than 7 days. I'm not sure how useful this is.

Or we can do dumb math (unblocks ÷ blocks). Let's drop it for now; I'll update the ticket.

TBolliger updated the task description. Apr 30 2018, 10:41 PM
dmaza updated the task description. Apr 30 2018, 11:41 PM
dmaza added a comment. May 2 2018, 6:09 PM
This comment was removed by dmaza.
dmaza updated the task description. May 2 2018, 6:12 PM
TBolliger updated the task description. May 2 2018, 6:30 PM
Stryn added a subscriber: Stryn. May 3 2018, 9:45 PM
dmaza updated the task description. May 3 2018, 11:48 PM

I have collated all the data (sent via email) into this Google Sheets file (access is restricted to certain WMF staff members).

Dayllan — can you please provide the exact date range(s) for each of these datasets? Thank you!

dmaza added a comment. May 9 2018, 12:55 AM

Dayllan — can you please provide the exact date range(s) for each of these datasets? Thank you!

April 30 - May 7 inclusive.

dmaza added a comment. May 9 2018, 4:21 PM

Below is the raw data generated from the queries.

TBolliger closed this task as Resolved. May 9 2018, 6:41 PM

@dmaza, I'm looking at this again and I'm skeptical that there have only ever been ~2,000 indefinite blocks on English Wikipedia. According to Special:BlockList, 255 infinite blocks were set on May 9, 2018 alone. Extrapolating that to one year with basic math gives ~93,000 infinite blocks.

So, my understanding is the number in this data is for blocks set during the date range of April 30 to May 6, correct?
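The extrapolation above is simple arithmetic and can be checked directly. It assumes every day of the year looks like May 9, 2018, which is a rough assumption made only for a sanity check.

```python
# Infinite blocks seen on Special:BlockList for a single day (May 9, 2018).
blocks_on_one_day = 255
days_per_year = 365

# Naive extrapolation: assume the same daily count all year.
yearly_estimate = blocks_on_one_day * days_per_year
print(yearly_estimate)  # 93075, i.e. roughly 93,000
```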

dmaza added a comment. May 10 2018, 9:51 PM

@TBolliger All the data is from April 30 to May 6.
If you need a different date range, the queries can easily be adjusted and I can run them again.

OK, thank you for confirming. I think I was confused by the word "active" because it could mean either set within the date range or all-time.

No need to generate anything else now. Thank you!