
Analyse Nuke usage across Wikimedia projects
Closed, Resolved · Public · Spike

Description

Nuke can be used on both registered and unregistered users. Its behaviour for unregistered users may need to change. To help us make a decision we want to better understand how often it is used on unregistered users.

Our qualitative research so far has focused primarily on English Wikipedia editors, where Nuking unregistered users is uncommon because unregistered users there can't create main namespace pages. We'd like to know whether there are other communities where we should target our research.

Questions

  • What percentage of Nuke actions are currently taken on unregistered vs registered users?
  • Are there any Wikimedia projects with a particularly high rate of unregistered Nuke deletions?

Event Timeline

KCVelaga_WMF changed the task status from Open to In Progress. Jul 17 2023, 7:12 AM
KCVelaga_WMF moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.

@Samwalton9 Sorry, this took a bit longer than expected. Although I gathered logs for all time, given the scope of the task (analysing Nuke usage to inform the changes required for IP masking), I only considered recent records from the last 3 years, which include more than 240,000 Nuke actions across wikis. My reasoning is that if a wiki has not used the feature at all during the last three years, it won't help us identify wikis to talk to, and 3 years is a reasonable window for understanding the distributions.
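A minimal sketch of the three-year window described above, assuming each Nuke action is available as a record with a timezone-aware `timestamp` field. The record shape and the `recent_actions` helper are hypothetical (the notebook's actual schema may differ), and a year is approximated as 365 days:

```python
from datetime import datetime, timedelta, timezone


def recent_actions(actions, now, years=3):
    """Keep only actions from the last `years` years (365-day approximation).

    Hypothetical helper: each action is assumed to be a dict with a
    timezone-aware "timestamp" value.
    """
    cutoff = now - timedelta(days=365 * years)
    return [a for a in actions if a["timestamp"] >= cutoff]


now = datetime(2023, 7, 17, tzinfo=timezone.utc)
actions = [  # hypothetical sample records
    {"wiki": "jawiki", "timestamp": datetime(2019, 1, 5, tzinfo=timezone.utc)},
    {"wiki": "jawiki", "timestamp": datetime(2022, 3, 1, tzinfo=timezone.utc)},
    {"wiki": "kowiki", "timestamp": datetime(2023, 6, 30, tzinfo=timezone.utc)},
]
print(len(recent_actions(actions, now)))  # 2 (the 2019 record falls outside the window)
```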

Here are the findings:

Across all wikis, a considerable share of Nuke actions (~33%) are taken on pages created by unregistered users (note: this includes pages in both content and non-content namespaces).
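One way to split actions into unregistered vs registered targets, assuming each Nuke action's target is recorded as a username string and relying on the pre-IP-masking convention that unregistered editors appear under their IP address. The helper names are hypothetical, and the notebook may classify users differently (for example by user ID):

```python
import ipaddress


def is_unregistered(username: str) -> bool:
    """True if the name parses as an IPv4 or IPv6 address.

    Assumes unregistered editors are recorded under their IP address,
    as was the case before IP masking.
    """
    try:
        ipaddress.ip_address(username)
        return True
    except ValueError:
        return False


def ip_share(target_usernames) -> float:
    """Percentage of Nuke actions whose target is an unregistered user."""
    names = list(target_usernames)
    if not names:
        return 0.0
    return 100 * sum(is_unregistered(n) for n in names) / len(names)


targets = ["192.0.2.10", "ExampleUser", "Another user"]  # hypothetical sample
print(round(ip_share(targets), 2))  # 33.33
```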

Among the bigger wikis (by overall size rank from the wiki comparison data), these are the top 10 wikis where more than 25% of Nuke actions were taken against unregistered users (with at least 300 actions in total during the last three years):

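| database code | IP | user | total | IP_percentage | user_percentage | overall size rank |
| --- | --- | --- | --- | --- | --- | --- |
| jawiki | 5763 | 9107 | 14870 | 38.76 | 61.24 | 3 |
| dewiki | 679 | 782 | 1461 | 46.48 | 53.52 | 4 |
| ruwiki | 925 | 2469 | 3394 | 27.25 | 72.75 | 6 |
| itwiki | 1060 | 1664 | 2724 | 38.91 | 61.09 | 9 |
| wikidatawiki | 9261 | 17325 | 26586 | 34.83 | 65.17 | 11 |
| nlwiki | 361 | 162 | 523 | 69.02 | 30.98 | 16 |
| trwiki | 793 | 706 | 1499 | 52.9 | 47.1 | 17 |
| ukwiki | 410 | 515 | 925 | 44.32 | 55.68 | 18 |
| enwiktionary | 1996 | 2144 | 4140 | 48.21 | 51.79 | 19 |
| kowiki | 7580 | 2545 | 10125 | 74.86 | 25.14 | 20 |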
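The selection rule for this table can be sketched as follows, using a few rows from the table. `WikiStats` and `big_wikis_with_high_ip_share` are hypothetical names, and `examplewiki` is an invented row included only to show the 25% threshold excluding a wiki:

```python
from dataclasses import dataclass


@dataclass
class WikiStats:
    """Nuke action counts for one wiki (hypothetical helper type)."""
    db: str         # database code, e.g. "jawiki"
    ip: int         # actions against unregistered (IP) users
    user: int       # actions against registered users
    size_rank: int  # overall size rank (1 = largest)

    @property
    def total(self) -> int:
        return self.ip + self.user

    @property
    def ip_percentage(self) -> float:
        return round(100 * self.ip / self.total, 2)


def big_wikis_with_high_ip_share(rows, min_total=300, min_ip_pct=25.0, top_n=10):
    """Wikis with at least `min_total` actions and an IP share above `min_ip_pct`%,
    ordered by size rank (largest wiki first) and truncated to `top_n`."""
    # `and` short-circuits, so ip_percentage is only computed for wikis with enough actions.
    eligible = [r for r in rows if r.total >= min_total and r.ip_percentage > min_ip_pct]
    return sorted(eligible, key=lambda r: r.size_rank)[:top_n]


rows = [
    WikiStats("thwikiquote", 419, 0, 442),
    WikiStats("ruwiki", 925, 2469, 6),
    WikiStats("jawiki", 5763, 9107, 3),
    WikiStats("examplewiki", 20, 500, 1),  # hypothetical row: 3.85% IP share, excluded
    WikiStats("dewiki", 679, 782, 4),
]
print([r.db for r in big_wikis_with_high_ip_share(rows, top_n=3)])
# ['jawiki', 'dewiki', 'ruwiki']
```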

Overall, these are the top 20 wikis with the highest percentage of Nuke actions against unregistered users (with at least 300 actions in total during the last three years):

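| database code | IP | user | total | IP_percentage | user_percentage | overall size rank |
| --- | --- | --- | --- | --- | --- | --- |
| thwikiquote | 419 | 0 | 419 | 100.0 | 0.0 | 442 |
| tnwiki | 471 | 2 | 473 | 99.58 | 0.42 | 420 |
| gotwiki | 447 | 4 | 451 | 99.11 | 0.89 | 580 |
| stwiki | 668 | 7 | 675 | 98.96 | 1.04 | 474 |
| satwiki | 1113 | 25 | 1138 | 97.8 | 2.2 | 348 |
| sowiki | 617 | 52 | 669 | 92.23 | 7.77 | 169 |
| fiwiktionary | 931 | 97 | 1028 | 90.56 | 9.44 | 120 |
| ltwiki | 335 | 38 | 373 | 89.81 | 10.19 | 50 |
| jawikiversity | 531 | 67 | 598 | 88.8 | 11.2 | 541 |
| jawikibooks | 2663 | 362 | 3025 | 88.03 | 11.97 | 95 |
| scowiki | 313 | 50 | 363 | 86.23 | 13.77 | 135 |
| aswiki | 515 | 104 | 619 | 83.2 | 16.8 | 121 |
| huwiki | 1446 | 394 | 1840 | 78.59 | 21.41 | 26 |
| jawiktionary | 3202 | 882 | 4084 | 78.4 | 21.6 | 85 |
| bgwiki | 358 | 101 | 459 | 78.0 | 22.0 | 38 |
| enwikivoyage | 432 | 130 | 562 | 76.87 | 23.13 | 55 |
| fiwiki | 730 | 221 | 951 | 76.76 | 23.24 | 28 |
| kowiki | 7580 | 2545 | 10125 | 74.86 | 25.14 | 20 |
| jawikisource | 530 | 209 | 739 | 71.72 | 28.28 | 177 |
| simplewiki | 3758 | 1513 | 5271 | 71.3 | 28.7 | 31 |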
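The ranking behind this table can be sketched as below, using a few of its rows. `top_by_ip_share` is a hypothetical helper, and `tinywiki` is an invented row included only to show that a wiki with 100% IP share but fewer than 300 actions is left out:

```python
def top_by_ip_share(rows, min_total=300, top_n=20):
    """Rank wikis by the share of Nuke actions against unregistered users,
    ignoring wikis with fewer than `min_total` actions (hypothetical helper)."""
    eligible = [r for r in rows if r["ip"] + r["user"] >= min_total]
    return sorted(
        eligible,
        key=lambda r: r["ip"] / (r["ip"] + r["user"]),  # IP share as a fraction
        reverse=True,  # highest share first
    )[:top_n]


rows = [
    {"db": "simplewiki", "ip": 3758, "user": 1513},
    {"db": "gotwiki", "ip": 447, "user": 4},
    {"db": "thwikiquote", "ip": 419, "user": 0},
    {"db": "tinywiki", "ip": 50, "user": 0},  # hypothetical: 100% IP but only 50 actions
    {"db": "tnwiki", "ip": 471, "user": 2},
]
print([r["db"] for r in top_by_ip_share(rows, top_n=3)])
# ['thwikiquote', 'tnwiki', 'gotwiki']
```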

I don't have direct answers to the questions in the task description, but I believe this data should help you answer them.


  • If you'd like the data for all wikis, I have published the file on GitHub here
  • For the approach, the code and the analysis, the notebook is here

Thanks @KCVelaga_WMF, this is great.

It looks like the parent task is still worth considering, since usage of Nuke on IP addresses is relatively common.

Additionally, the following wikis, chosen for their mix of size and Nuke usage, seem like good candidates for follow-up investigation:

  • jawiki
  • wikidata
  • enwiktionary
  • kowiki

Yes, I agree that there is substantial usage of Nuke on unregistered users.

The wikis chosen for follow-up look good to me.