
Generate statistics on blocking usage
Closed, Resolved · Public · 8 Estimated Story Points

Authored By TBolliger, Mar 21 2018, 7:22 PM
Referenced Files
  • F18101053: all_wiki.csv (May 9 2018, 4:21 PM)
  • F18101055: unblocks.csv (May 9 2018, 4:21 PM)
  • F18101057: blocks_per_wiki.csv (May 9 2018, 4:21 PM)
  • F18101056: block_modifications.csv (May 9 2018, 4:21 PM)
  • F18101054: ipboptions.csv (May 9 2018, 4:21 PM)
  • F17601491: unblocks.csv (May 2 2018, 6:09 PM)

Description

About

We know some things about blocking on Wikimedia sites, but we don't have firm data to ground our decision-making or understanding. None of the data listed below will lead directly to a decision. Rather, it will inform our decision-making as we continue to change how blocks work.


Data to generate


For all wikis right now...

  • How many blocks are active right now? (IP + range + user, all wikis, regardless of expiration, regardless of duplication)

Per wiki, how is MediaWiki:Ipboptions used...

  • How many wikis have customized MediaWiki:Ipboptions?
  • On the wikis that have customized MediaWiki:Ipboptions, what is the distribution of the most common block length options? (e.g. 500 wikis have 'indefinite', 500 have '1 day', 300 have '1 week', etc.)
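The Ipboptions tally could be computed roughly as in the minimal Python sketch below. It assumes the message text is a comma-separated list of "display label:expiry value" pairs, like the default "2 hours:2 hours,1 day:1 day,…,indefinite:infinite". The function names and the two wiki names are hypothetical. Each wiki is counted at most once per expiry value. The sketch only parses messages already fetched; it does not decide which wikis count as "customized".

```python
from collections import Counter


def parse_ipboptions(text: str) -> list[str]:
    """Return the expiry values listed in an Ipboptions message.

    Assumes comma-separated "display label:expiry value" pairs.
    """
    values = []
    for option in text.split(","):
        option = option.strip()
        if not option:
            continue
        # The label before the colon may be localized, so keep only the value after it.
        values.append(option.rpartition(":")[2].strip())
    return values


def tally_options(messages_by_wiki: dict[str, str]) -> Counter:
    """Count how many wikis offer each expiry value (each wiki counted once per value)."""
    counts: Counter = Counter()
    for text in messages_by_wiki.values():
        counts.update(set(parse_ipboptions(text)))
    return counts


# Hypothetical message contents for two wikis.
example = {
    "examplewiki": "2 hours:2 hours,1 day:1 day,indefinite:infinite",
    "otherwiki": "1 day:1 day,1 week:1 week,indefinite:infinite",
}
counts = tally_options(example)
print(counts["infinite"], counts["1 day"], counts["1 week"])  # 2 2 1
```

Keying on the value rather than the label keeps results comparable across languages, since labels are usually translated while values like "infinite" are not.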

Blocking. Per wiki, over the past 7 days...

  • How many blocks were set per hour, on average?
  • How many autoblocks are currently active?
  • How many blocks are currently active?
  • What is the distribution of block lengths?
  • What percentage of blocks are for IP address vs. IP range vs. usernames?
  • What are the most common block reasons?
  • How many blocks toggle 'Prevent this user from editing his own talk page while blocked' to checked (default is unchecked)?
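The IP vs. IP range vs. username split could be derived from block targets with Python's standard `ipaddress` module, as in the minimal sketch below. It assumes targets are plain strings as they appear in block log titles, with ranges in CIDR form such as "203.0.113.0/24". The function names and example targets are hypothetical.

```python
import ipaddress
from collections import Counter


def target_kind(target: str) -> str:
    """Classify a block target as 'ip', 'range' or 'user'."""
    try:
        ipaddress.ip_address(target)
        return "ip"
    except ValueError:
        pass
    if "/" in target:
        try:
            # strict=False accepts ranges written with host bits set, e.g. 203.0.113.5/24
            ipaddress.ip_network(target, strict=False)
            return "range"
        except ValueError:
            pass  # e.g. a username that happens to contain a slash
    return "user"


def kind_percentages(targets: list[str]) -> dict[str, float]:
    """Share of blocks per target kind, as percentages."""
    if not targets:
        return {}
    counts = Counter(target_kind(t) for t in targets)
    return {kind: 100.0 * counts[kind] / len(targets) for kind in ("ip", "range", "user")}


# Hypothetical block targets using documentation address ranges.
targets = ["198.51.100.7", "2001:db8::/32", "ExampleUser", "203.0.113.0/24"]
print(kind_percentages(targets))  # {'ip': 25.0, 'range': 50.0, 'user': 25.0}
```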

Manual unblocking. Per wiki, over the past 7 days...

  • Total manual unblocks?
  • What are the most common unblock reasons?
  • What percentage of user unblocks were made by a different admin vs. by the original blocking admin?
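The "different admin vs. original blocking admin" question could be answered by matching each unblock to the most recent block on the same target. The minimal Python sketch below does this. The `LogEvent` shape is hypothetical; its action names follow MediaWiki's block log ("block", "unblock"). Other actions, such as changes to existing blocks, are ignored. Unblocks whose block was set before the data window are skipped, because their blocking admin is unknown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogEvent:
    timestamp: int   # hypothetical: seconds since epoch
    action: str      # e.g. "block" or "unblock"; other actions are ignored
    target: str      # blocked user name, IP or range
    performer: str   # admin who performed the action


def share_unblocked_by_original_admin(events: list[LogEvent]) -> Optional[float]:
    """Percentage of unblocks performed by the admin who set the block.

    Returns None when no unblock can be matched to a block in the data.
    """
    blocker: dict[str, str] = {}
    same = other = 0
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.action == "block":
            blocker[ev.target] = ev.performer
        elif ev.action == "unblock":
            original = blocker.pop(ev.target, None)
            if original is None:
                continue  # block predates the window: original admin unknown
            if original == ev.performer:
                same += 1
            else:
                other += 1
    total = same + other
    return 100.0 * same / total if total else None


events = [
    LogEvent(1, "block", "UserA", "Admin1"),
    LogEvent(2, "unblock", "UserA", "Admin1"),   # same admin
    LogEvent(3, "block", "UserB", "Admin1"),
    LogEvent(4, "unblock", "UserB", "Admin2"),   # different admin
    LogEvent(5, "unblock", "UserC", "Admin3"),   # block predates the window: skipped
]
print(share_unblocked_by_original_admin(events))  # 50.0
```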

Block modification. Per wiki, over the past 7 days...

  • How many blocks were modified per hour, on average? (A modification is detected when a new block is set on a user before that user's previous block expires.)
  • What percentage of block modifications were made by a different admin vs. by the original blocking admin?
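The detection rule above, a new block set on a target before its previous block expires, can be expressed as the minimal Python sketch below. The `Block` record and the function names are hypothetical. `None` stands for an indefinite expiry. Manual unblocks are not considered, so an unblock followed by a fresh block inside the old expiry would also be counted as a modification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    target: str
    performer: str
    start: int                # hypothetical: seconds since epoch
    expiry: Optional[int]     # None means indefinite


def find_modifications(blocks: list[Block]) -> list[tuple[Block, Block]]:
    """Pairs (earlier, later) where `later` was set on the same target before `earlier` expired."""
    latest: dict[str, Block] = {}
    pairs = []
    for b in sorted(blocks, key=lambda x: x.start):
        prev = latest.get(b.target)
        if prev is not None and (prev.expiry is None or b.start < prev.expiry):
            pairs.append((prev, b))
        latest[b.target] = b
    return pairs


def modification_stats(blocks: list[Block], window_hours: float) -> tuple[float, Optional[float]]:
    """Return (modifications per hour, % of modifications made by a different admin)."""
    pairs = find_modifications(blocks)
    by_other = sum(1 for old, new in pairs if old.performer != new.performer)
    pct_other = 100.0 * by_other / len(pairs) if pairs else None
    return len(pairs) / window_hours, pct_other


HOUR = 3600
blocks = [
    Block("UserA", "Admin1", 0, 24 * HOUR),
    Block("UserA", "Admin2", 1 * HOUR, 48 * HOUR),   # set while the first was active: another admin
    Block("UserB", "Admin1", 0, 1 * HOUR),
    Block("UserB", "Admin1", 2 * HOUR, None),        # first block had already expired: not counted
    Block("UserC", "Admin3", 0, None),
    Block("UserC", "Admin3", 100, None),             # indefinite block changed by the same admin
]
per_hour, pct_other = modification_stats(blocks, window_hours=7 * 24)
print(len(find_modifications(blocks)), pct_other)  # 2 50.0
```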

This could be retrieved by querying the database (e.g. the logging table), API:Blocks, or whatever was used in T180071: Generate report of admins who perform many blocks so we have a list to invite to blocking consultation.

Maybe for later:

  • What percentage of blocks naturally expire vs. are manually unblocked?

Ticket source code is here: https://github.com/dayllanmaza/wikireplicas-reports

Event Timeline

Do any of the following provide useful data?

  • How often are blocks created on a wiki? (average time, e.g. 5 blocks per hour)
  • How often are blocks removed?
  • Common reasons for "unblocking"
  • Frequency of autoblocks?

What is the distribution of the most common block length options? (e.g. 500 wikis have 'indefinite', 500 have '1 day', 300 have '1 week', etc.)

Will this be in the past year? 7 days?

Do any of the following provide useful data?

  • How often are blocks created on a wiki? (average time, e.g. 5 blocks per hour)
  • How often are blocks removed?
  • Common reasons for "unblocking"
  • Frequency of autoblocks?

While we're in there — yes these would be good to have to paint a complete picture.

What is the distribution of the most common block length options? (e.g. 500 wikis have 'indefinite', 500 have '1 day', 300 have '1 week', etc.)

Will this be in the past year? 7 days?

This is specifically about what's defined in MediaWiki:Ipboptions, so it would be a one-time snapshot.

One suggestion:

  • How often is a block modified? (e.g. duration, talk page access on/off, updated summary)

I don't know how easy this is, or whether each type of modification needs its own query.

How often is a block modified/lifted by an admin other than the original blocking admin?

Great points. I've updated the ticket.

@dmaza I'm happy to help come up with mockup data tables to illustrate what type of results we're anticipating. "Per wiki, per hour, on average" can be interpreted in a few different ways.
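As an illustration of how "per wiki, per hour, on average" can diverge, the minimal Python sketch below computes two plausible readings for one wiki. One reading divides by every hour in the window. The other divides only by hours in which at least one block was set. The function name and input format (block times in seconds since epoch) are hypothetical.

```python
from collections import Counter

SECONDS_PER_HOUR = 3600


def per_hour_averages(timestamps: list[int], window_start: int, window_hours: int) -> dict[str, float]:
    """Two readings of "blocks per hour, on average" for a single wiki."""
    window_end = window_start + window_hours * SECONDS_PER_HOUR
    # Bucket each block into the hour of the window it falls in.
    hourly = Counter((t - window_start) // SECONDS_PER_HOUR
                     for t in timestamps if window_start <= t < window_end)
    total = sum(hourly.values())
    return {
        # Every hour in the window counts, including hours with no blocks.
        "over_all_hours": total / window_hours,
        # Only hours in which at least one block was set count.
        "over_active_hours": total / len(hourly) if hourly else 0.0,
    }


# Hypothetical: three blocks in the first hour and one in the sixth, over a 7-day window.
rates = per_hour_averages([0, 10, 20, 5 * SECONDS_PER_HOUR + 1], window_start=0, window_hours=7 * 24)
print(rates["over_active_hours"])  # 2.0
```

With the same four blocks, the first reading gives about 0.024 blocks per hour and the second gives 2.0, which is why agreeing on the definition up front matters.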

TBolliger set the point value for this task to 8.

@TBolliger Sure. If you want, we can wait until we pick it up in a sprint and talk it through. It looks straightforward to me, though.

From WMCON: @Matanya thinks the average block will be 2 hours, with 'vandal' as the reason. :)

We should also look into (probably in another ticket) determining the block rate for new users. What percentage of new editors are blocked? Does this value change over time?

What percentage of blocks naturally expire vs. are manually unblocked?

@TBolliger, this means I'd need to find blocks created in the last 7 days that have either already expired or been manually unblocked as of now. It can be done, but it would exclude any block with a lifespan longer than 7 days, so I'm not sure how useful it is.

Alternatively, we could do simple math (unblocks ÷ blocks). Let's drop it for now; I'll update the ticket.
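The two options discussed above could be compared roughly as in the minimal Python sketch below. The first function classifies blocks set in the window by how they ended. Long and indefinite blocks fall into "still_active", which is the bias noted above. The second function is the simple unblocks ÷ blocks ratio. All names and the input shapes, including one unblock time per target, are hypothetical simplifications.

```python
from typing import Optional


def classify_block_endings(
    blocks: list[tuple[str, int, Optional[int]]],
    unblock_times: dict[str, int],
    now: int,
) -> dict[str, int]:
    """Count blocks that expired, were manually unblocked, or are still active at `now`.

    blocks: (target, start, expiry) tuples; expiry None means indefinite.
    unblock_times: target -> time of a manual unblock, if any.
    """
    counts = {"expired": 0, "unblocked": 0, "still_active": 0}
    for target, start, expiry in blocks:
        unblocked_at = unblock_times.get(target)
        if (unblocked_at is not None and start <= unblocked_at <= now
                and (expiry is None or unblocked_at < expiry)):
            counts["unblocked"] += 1
        elif expiry is not None and expiry <= now:
            counts["expired"] += 1
        else:
            # Long or indefinite blocks that have not ended yet land here.
            counts["still_active"] += 1
    return counts


def naive_unblock_ratio(num_unblocks: int, num_blocks: int) -> Optional[float]:
    """The simple unblocks / blocks approximation for the same window."""
    return num_unblocks / num_blocks if num_blocks else None


WEEK = 7 * 24 * 3600
blocks = [("UserA", 0, 3600), ("UserB", 0, None), ("UserC", 0, 10 * WEEK)]
print(classify_block_endings(blocks, {"UserB": 1800}, now=WEEK))
# {'expired': 1, 'unblocked': 1, 'still_active': 1}
```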


I have collated all the data (sent via email) into a Google Sheets file (which is permissioned to certain WMF staff members).

Dayllan — can you please provide the exact date range(s) for each of these datasets? Thank you!


April 30 - May 7 inclusive.

Below is the raw data generated from the queries.

@dmaza, I'm looking at this again, and I'm skeptical that there are only ~2,000 indefinite blocks of all time on English Wikipedia. According to Special:BlockList, there were 255 infinite blocks on May 9, 2018 alone. Extrapolating that daily rate to one year gives roughly 255 × 365 ≈ 93,000 infinite blocks.

So my understanding is that the numbers in this data cover only blocks set during the date range of April 30 to May 6. Is that correct?

@TBolliger All the data is from April 30 to May 6.
If you need a different date range, it can easily be adjusted and I can run the queries again.

OK, thank you for confirming. I think I was confused by the word "active", because it could mean either set within the date range or set at any time.

No need to generate anything else now. Thank you!