👩‍👧‍👦 Measure the effectiveness of blocks
Closed, Resolved (Public)

Description

This is a parent task for the Anti-Harassment Tools team (including @nettrom_WMF) to measure the effectiveness of sitewide and partial blocks at stopping harm to Wikimedia wikis.

More information can be found at https://meta.wikimedia.org/wiki/Community_health_initiative/Measuring_the_effectiveness_of_blocks


Data points to gather

Number of users who

  • receive a sitewide block
  • receive a sitewide, non-indefinite block
  • have a sitewide block which expires
  • have a sitewide block which expires, then do not receive another sitewide block
  • have a sitewide block which expires, then do not receive another sitewide block, who make 1+ edit
  • receive a partial block
  • have a partial block who make 1+ edit
  • have a partial block who do not receive a sitewide block OR have pages added to their partial block OR have their expiration date extended

Number of pages which

  • are protected OR have their protection level escalated

This data should be available monthly in the Data Lake's logging tables. We'll also have T209549 (Add ipblocks_restrictions table to Data Lake) if needed.
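As a rough illustration of how the sitewide-block counts listed above nest into a funnel, here is a minimal Python sketch. It does not query the Data Lake: the `Block` record, the `sitewide_funnel` function, and the `edits` mapping are hypothetical names invented for this example, and the input is assumed to be block events already extracted from the logging tables. "Do not receive another sitewide block" is interpreted here as "no sitewide block starts after the user's earliest expired block", which is one possible reading, not a confirmed definition.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Block:
    """Hypothetical record for one block event (not a real Data Lake schema)."""
    user: str
    sitewide: bool               # False means a partial block
    start: datetime
    expiry: Optional[datetime]   # None means an indefinite block


def sitewide_funnel(blocks, edits, as_of):
    """Count users at each step of the sitewide-block funnel.

    blocks: iterable of Block
    edits:  dict mapping user -> list of edit datetimes
    as_of:  datetime; a block counts as expired if its expiry is <= as_of
    """
    sitewide = [b for b in blocks if b.sitewide]
    received = {b.user for b in sitewide}
    temporary = {b.user for b in sitewide if b.expiry is not None}

    # Earliest expiry per user, among sitewide blocks that have already expired.
    first_expiry = {}
    for b in sitewide:
        if b.expiry is not None and b.expiry <= as_of:
            prev = first_expiry.get(b.user)
            if prev is None or b.expiry < prev:
                first_expiry[b.user] = b.expiry

    # Assumption: "another sitewide block" means one starting after that expiry.
    not_reblocked = {
        user
        for user, expiry in first_expiry.items()
        if not any(b.user == user and b.start > expiry for b in sitewide)
    }

    # Of those, users with at least one edit after the block expired.
    edited_after = {
        user
        for user in not_reblocked
        if any(t > first_expiry[user] for t in edits.get(user, []))
    }

    return {
        "received_sitewide": len(received),
        "received_temporary_sitewide": len(temporary),
        "sitewide_expired": len(first_expiry),
        "expired_not_reblocked": len(not_reblocked),
        "expired_not_reblocked_edited": len(edited_after),
    }
```

Each count is computed from a subset of the previous one, so the numbers can only shrink from step to step. The partial-block counts would follow the same pattern once restriction data is available.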

Event Timeline

TBolliger created this task.

I've created a GitHub repo where I'll put notebooks and graphs for analysis: https://github.com/nettrom/AHT-block-effectiveness-2018

@nettrom_WMF Are you using IPython alone or within Jupyter?

@Niharika : I picked this up again last week. At this point, I'd like to wait until partial block data is in the Data Lake to continue the work, because then I'll get block duration and edit revert detection for free rather than handle those myself. It would also be great to have IP blocks in the Data Lake, because so far a lot of the partial blocks are of IPs.

In other words, I'd prefer to wait until T211950 and T211627 are completed. Let me know if that's a problem.

That sounds good to me, @nettrom_WMF. Do you know who's responsible for those tasks and what the ETA on those being completed might look like?

@Niharika : I'm not sure who on the Analytics Engineering team is responsible, and I noticed that neither of the tasks is assigned to anyone. My current understanding is that these changes are likely to arrive with the next snapshot, which should be available in a few days, or the one after that (in early May).

@Niharika can we call this particular task resolved, since you've reported on results at Wikimania & in the Year in Review?

Yes, thank you!