
Generate reports on page protection statistics
Closed, Resolved · Public · 2 Story Points

Description

Please generate reports containing the following 6 data points for English, German, French, Spanish, and Italian Wikipedias.

  • #1 How many pages are currently edit protected? — single numerical values
    • regardless of date set or expiration date
    • namespace 0 only
    • one data point for full protection — so only admins can edit
    • one data point for semi-protection — so only confirmed or auto-confirmed users can edit
    • one data point for extended protection — so only extendedconfirmed users can edit (English Wikipedia only)
  • #2 How many edit protections were set this week? — single numerical values
    • namespace 0 only
    • can be any 7-day period, please provide the exact dates used.
    • one data point for full protection
    • one data point for semi-protection
    • one data point for extended protection (English Wikipedia only)
  • #3 What is a distribution of the expiration dates set this week? — distributions, can be capped
    • namespace 0 only
    • can be any 7-day period, please provide the exact dates used.
    • one distribution table for full protection
    • one distribution table for semi-protection
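
A minimal sketch of how data point #1 could be counted, assuming the column names documented for the page_restrictions table (pr_page, pr_type, pr_level, pr_expiry) and the usual level names sysop, autoconfirmed, and extendedconfirmed. It runs against a toy in-memory SQLite database rather than a real wiki replica, and it assumes that rows whose expiry has already passed should not count as "currently" protected; both of those choices are illustrative, not part of the task.

```python
import sqlite3

# Toy in-memory stand-in for a wiki database; only the columns used here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER);
CREATE TABLE page_restrictions (
    pr_page INTEGER, pr_type TEXT, pr_level TEXT, pr_expiry TEXT
);
INSERT INTO page VALUES (1, 0), (2, 0), (3, 0), (4, 1), (5, 0);
INSERT INTO page_restrictions VALUES
    (1, 'edit', 'sysop', 'infinity'),
    (2, 'edit', 'autoconfirmed', '20181201000000'),
    (3, 'edit', 'extendedconfirmed', NULL),
    (3, 'move', 'sysop', 'infinity'),
    (4, 'edit', 'sysop', 'infinity'),
    (5, 'edit', 'autoconfirmed', '20180101000000');
""")

# Mapping from restriction level to the report's wording.
LEVEL_NAMES = {
    "sysop": "full",
    "autoconfirmed": "semi",
    "extendedconfirmed": "extended",
}


def count_edit_protected(conn, now):
    """Count namespace-0 pages with an active edit restriction, per level.

    `now` is a MediaWiki-style timestamp string (YYYYMMDDHHMMSS); such
    strings sort chronologically, so a plain string comparison is enough.
    """
    rows = conn.execute(
        """
        SELECT pr_level, COUNT(DISTINCT pr_page)
        FROM page_restrictions
        JOIN page ON page_id = pr_page
        WHERE page_namespace = 0
          AND pr_type = 'edit'
          AND (pr_expiry IS NULL OR pr_expiry = 'infinity' OR pr_expiry > ?)
        GROUP BY pr_level
        """,
        (now,),
    ).fetchall()
    counts = {name: 0 for name in LEVEL_NAMES.values()}
    for level, n in rows:
        if level in LEVEL_NAMES:
            counts[LEVEL_NAMES[level]] = n
    return counts


counts = count_edit_protected(conn, "20181018000000")
print(counts)
```

In the toy data, the move restriction on page 3, the namespace-1 page 4, and the expired restriction on page 5 are all excluded, which are the same filters the bullet points above ask for.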

Tech-Ref: https://www.mediawiki.org/wiki/Manual:Page_restrictions_table
See also: https://en.wikipedia.org/wiki/Special:ProtectedPages
And also: https://en.wikipedia.org/wiki/Special:Log?type=protect

Event Timeline

Restricted Application added subscribers: MGChecker, Aklapper. · May 20 2018, 1:34 PM
SPoore added a subscriber: SPoore. · May 21 2018, 3:10 PM

@TBolliger Probably only want to look at full protection for the information we are trying to obtain. https://meta.wikimedia.org/wiki/Help:Protection

TBolliger updated the task description. · Jun 6 2018, 5:55 PM
TBolliger updated the task description. · Jun 6 2018, 6:23 PM
dmaza updated the task description. · Jun 6 2018, 6:25 PM
TBolliger updated the task description. · Jun 6 2018, 6:27 PM
TBolliger updated the task description. · Jun 6 2018, 6:29 PM
dbarratt set the point value for this task to 2. · Jun 6 2018, 6:30 PM
Vvjjkkii renamed this task from Generate reports on page protection statistics to 1kcaaaaaaa. · Jul 1 2018, 1:09 AM
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description.
Vvjjkkii removed the point value for this task.
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot renamed this task from 1kcaaaaaaa to Generate reports on page protection statistics. · Jul 1 2018, 4:46 PM
CommunityTechBot raised the priority of this task from High to Needs Triage.
CommunityTechBot set the point value for this task to 2.
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: Aklapper.
TBolliger updated the task description. · Sep 24 2018, 10:12 PM
TBolliger updated the task description. · Sep 24 2018, 10:17 PM

Something else to think about (probably for a different task) is how we can measure protections that are modified. How often does page protection escalate from none to semi and then from semi to full? On English Wikipedia there is also extended between semi and full.

We could also determine the amount of time between these escalations, the number of edits, the topics of these pages, etc. There's a lot to explore.
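
As a rough illustration of the escalation idea, the sketch below walks a list of protection events and reports each step up in level, together with the time since the previous event on that page. The event tuples, the `RANK` ordering, and the assumption that a page with no earlier event starts out unprotected are all hypothetical; real data would have to come from the protection log.

```python
from datetime import datetime

# Assumed ordering of protection levels, weakest to strongest.
RANK = {"none": 0, "autoconfirmed": 1, "extendedconfirmed": 2, "sysop": 3}


def escalations(events):
    """Return (title, old_level, new_level, elapsed) for each step up.

    `events` is an iterable of (title, timestamp, level) tuples.
    `elapsed` is None when the page has no earlier event in the data,
    in which case the page is assumed to have been unprotected.
    """
    last = {}  # title -> (timestamp, level) of the most recent event
    found = []
    for title, ts, level in sorted(events, key=lambda e: (e[0], e[1])):
        prev_ts, prev_level = last.get(title, (None, "none"))
        if RANK[level] > RANK[prev_level]:
            elapsed = ts - prev_ts if prev_ts is not None else None
            found.append((title, prev_level, level, elapsed))
        last[title] = (ts, level)
    return found


sample_events = [
    ("A", datetime(2018, 1, 1), "autoconfirmed"),
    ("A", datetime(2018, 1, 11), "sysop"),
    ("B", datetime(2018, 2, 1), "sysop"),
    ("B", datetime(2018, 2, 5), "autoconfirmed"),  # a step down, not counted
]
steps = escalations(sample_events)
for step in steps:
    print(step)
```

Counting edits between two escalations or grouping by topic would need extra inputs that this sketch does not model.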

aezell added a subscriber: aezell. · Oct 16 2018, 2:56 PM (edited)

Way out of scope for this task but this kind of data and the slices you want to look at would be PERFECT for an ELK stack. You could even have a dashboard that showed these numbers in real-time.

Specific technology aside, it does look like we could do this in a way that is repeatable and mildly configurable by an end user of the data: https://wikitech.wikimedia.org/wiki/Analytics/Tutorials/Dashboards

aezell claimed this task. · Oct 16 2018, 3:33 PM
aezell moved this task from Ready to In progress on the Anti-Harassment (AHT Sprint 31) board.

Way out of scope for this task but this kind of data and the slices you want to look at would be PERFECT for an ELK[1] stack. You could even have a dashboard that showed these numbers in real-time.
Specific technology aside, it does look like we could do this in a way that is repeatable and mildly configurable by an end user of the data: https://wikitech.wikimedia.org/wiki/Analytics/Tutorials/Dashboards

That would probably be in scope for T206023: AHT: Implement weekly reports about Page Protection usage!

Yes, this is exactly the presentation I was expecting. I was anticipating more weekly protections, but it sounds reasonable. Would appreciate @SPoore to double check.

I looked at the protection log on Italian Wikipedia and made this rudimentary spreadsheet to count protection actions by namespace. It shows 93 mainspace actions and your data shows 70. I'll do a little more digging to see why there's a difference of 23 and post another comment.

I can't quite figure out exactly what's wrong, but my best guess is that your script is missing full protections (admin) with indefinite expiration. Your spreadsheet only shows 2 full protections and 68 semi, while mine shows 42/49. Different namespaces could also be at play.

Here are the other potential factors:

  • create vs. edit/protect
  • admin vs autoconfirmed (full/semi)
  • expiration
  • duplicate pagenames
  • namespace
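
One way to see how these factors move the totals is to make each one an explicit filter. The sketch below is only an illustration: the dictionary keys (`timestamp`, `namespace`, `title`, `type`, `level`, `expiry`) are hypothetical field names, not the actual log schema, and the entries are made-up sample data.

```python
from collections import Counter
from datetime import datetime


def count_protections(entries, start, end, dedupe_titles=False):
    """Count edit protections set in [start, end) in namespace 0, per level.

    Entries with no expiry (indefinite) are counted like any other.
    With dedupe_titles=True, repeated (title, level) pairs count once.
    """
    counts = Counter()
    seen = set()
    for e in entries:
        if not (start <= e["timestamp"] < end):
            continue
        if e["namespace"] != 0:  # namespace factor
            continue
        if e["type"] != "edit":  # create vs. edit factor
            continue
        key = (e["title"], e["level"])
        if dedupe_titles and key in seen:  # duplicate pagenames factor
            continue
        seen.add(key)
        counts[e["level"]] += 1  # admin vs. autoconfirmed factor
    return counts


entries = [
    {"timestamp": datetime(2018, 10, 12), "namespace": 0, "title": "A",
     "type": "edit", "level": "sysop", "expiry": None},
    {"timestamp": datetime(2018, 10, 13), "namespace": 0, "title": "A",
     "type": "edit", "level": "sysop", "expiry": None},
    {"timestamp": datetime(2018, 10, 14), "namespace": 0, "title": "B",
     "type": "create", "level": "sysop", "expiry": None},
    {"timestamp": datetime(2018, 10, 15), "namespace": 2, "title": "C",
     "type": "edit", "level": "autoconfirmed", "expiry": None},
    {"timestamp": datetime(2018, 10, 16), "namespace": 0, "title": "D",
     "type": "edit", "level": "autoconfirmed",
     "expiry": datetime(2018, 11, 16)},
]
start, end = datetime(2018, 10, 11), datetime(2018, 10, 18)
raw = count_protections(entries, start, end)
deduped = count_protections(entries, start, end, dedupe_titles=True)
print(dict(raw), dict(deduped))
```

Running the same data with and without deduplication gives different full-protection counts, which is the kind of gap that could explain two people getting different totals from the same log.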

I just looked at some of the raw data instead of the counts and I need to redo these queries. It's definitely not correct.

Let me work on it a bit more.

Protections set this week

From October 11 to October 18, this many page protections were set:

WikiFullSemiExtendedTime Frame
dewiki121010This Week
enwiki2241718This Week
eswiki1700This Week
frwiki1490This Week
itwiki1310This Week

Active protections

As of today, this many pages are currently protected:

WikiFullSemiExtendedTime Frame
dewiki12922570Total
enwiki2012113051194Total
eswiki1339100Total
frwiki706260Total
itwiki272330Total

Distributions of protection expirations

For all the wikis we generated data on, there was a large cluster of protections lasting a month or less, then clusters at 6, 12, and 24 months. Not many were of indefinite length.

On English Wikipedia most page edit protections (67%) are for 30 days or shorter, but you can see small patches at 6 months, then 12 months, then 2 years, then 3 years. Only 8% are of indefinite length.
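
A distribution like this can be produced by bucketing the time between when a protection was set and when it expires. The sketch below assumes MediaWiki-style timestamp strings (YYYYMMDDHHMMSS) and the value `infinity` for protections that never expire; the bucket edges and sample rows are hypothetical and are not the ones used for the figures above.

```python
from collections import Counter
from datetime import datetime

MW_FORMAT = "%Y%m%d%H%M%S"

# Hypothetical bucket edges, in days, checked in order.
BUCKETS = [
    (30, "<= 30 days"),
    (183, "<= 6 months"),
    (366, "<= 12 months"),
    (731, "<= 2 years"),
    (1096, "<= 3 years"),
]


def bucket_for(set_at, expiry):
    """Return the bucket label for one protection."""
    if expiry is None or expiry == "infinity":
        return "indefinite"
    start = datetime.strptime(set_at, MW_FORMAT)
    stop = datetime.strptime(expiry, MW_FORMAT)
    days = (stop - start).days
    for limit, label in BUCKETS:
        if days <= limit:
            return label
    return "> 3 years"


def expiry_distribution(protections):
    """Map bucket label to percentage for (set_at, expiry) pairs."""
    counts = Counter(bucket_for(s, e) for s, e in protections)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


sample = [
    ("20181011000000", "20181018000000"),  # 7 days
    ("20181012000000", "20181111000000"),  # 30 days
    ("20181013000000", "20190413000000"),  # 182 days
    ("20181014000000", "infinity"),
]
distribution = expiry_distribution(sample)
print(distribution)
```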


Other data:

  • only 0.25% of all content pages on ENWP are protected right now, 0.11% for German Wikipedia
TBolliger closed this task as Resolved. · Oct 18 2018, 7:49 PM
TBolliger moved this task from In progress to Done on the Anti-Harassment (AHT Sprint 31) board.