
Query on anonymous editing in the User: namespace on
Closed, Resolved, Public


I'm looking for some data on how frequently anonymous edits are made to the User: namespace on the English Wikipedia, and how often they are reverted. Outside of the occasional edit made when logged out by accident, these edits are generally not productive, and I'd like to know what kind of magnitude we are looking at. This is also relevant to issues of vandalism and harassment, which are known to occur on user pages when anonymous editors look to provoke another person.

Not sure if it's easier to break these numbers down by year, but I don't think a month-to-month comparative analysis is necessary. Just looking to get a sense of the general frequency. Thanks!

Event Timeline

OK so I think this can be done relatively easily using the productivity measurements I'm working on. We'll be able to include edits that don't really stick with the reverted edits. I think we should split edits to User by "obviously good" and "obviously bad" and spot-check the obviously good edits for accidental logged-out edits. Then we'll have a ballpark answer.

I_JethroBT triaged this task as Medium priority. Aug 12 2016, 2:33 AM

@Halfak - Any idea on when this query might be able to be done? An RfC related to this data is being developed and the proposer is interested in getting this discussion started soon:

I commented at

The task description stating "these edits [by IP users] are generally not productive" is unsubstantiated and assumes bad faith. :-(

@I_JethroBT, regretfully, I've found that my productivity measurements are limited to namespace zero. I think I'll be retrying this analysis -- looking at revert rates instead. I should have an update by the end of the day.

OK, new proposal re: methods. I think that we should gather a random sample of edits to NS=2 by anons. Then we should run them through the revert checker in mwreverts to see if they were reverted for reasons that are likely to be vandalism or damage. Generally, I limit the time to revert to 48 hours, exclude self-reverts, and exclude edits that are reverted back to by someone other than the original author.
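For readers unfamiliar with this kind of check, here is a minimal sketch of the three rules described above (48-hour window, no self-reverts, no edits that someone else later restored). It is plain Python and does not call mwreverts; the function name, the dict keys `user`, `sha1` and `timestamp`, and the 15-revision look-back radius are all assumptions made for illustration, not the analysis code used in this task.

```python
from datetime import datetime, timedelta

REVERT_WINDOW = timedelta(hours=48)  # time-to-revert limit from the comment above
RADIUS = 15  # assumption: how many revisions back a revert may reach


def reverted_for_damage(history, index, window=REVERT_WINDOW, radius=RADIUS):
    """Return True if history[index] was reverted in a way that suggests damage.

    `history` is one page's revisions, oldest first; each is a dict with
    hypothetical keys 'user', 'sha1' (content checksum) and 'timestamp'.
    """
    edit = history[index]
    for j in range(index + 1, len(history)):
        reverting = history[j]
        if reverting["timestamp"] - edit["timestamp"] > window:
            break  # too late to count as a revert of this edit
        # An identity revert restores the exact content of a revision that
        # predates the edit and lies within `radius` revisions of the revert.
        oldest = max(0, j - radius)
        restores_earlier_state = any(
            history[k]["sha1"] == reverting["sha1"] for k in range(oldest, index)
        )
        if not restores_earlier_state:
            continue
        if reverting["user"] == edit["user"]:
            return False  # self-revert: excluded
        # Exclude edits that someone other than the author later restored.
        restored_by_other = any(
            later["sha1"] == edit["sha1"] and later["user"] != edit["user"]
            for later in history[j + 1:]
        )
        return not restored_by_other
    return False
```

Calling `reverted_for_damage(history, i)` for each sampled anonymous edit `i` would give the reverted / not-reverted split that the manual review then draws from.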

Then I think we should manually review a random sample of reverted/not-reverted edits to look for evidence for or against:

  1. Hypothesis 1: Most good anon user space edits are accidental logged-out edits
  2. Hypothesis 2: Most anon user space edits are vandalism

Sound OK?

The age of the owner of the user page would be an interesting point to note, too.

I foresee three cases:

  • Page of experienced editor. All anon edits are vandalism, there may be an occasional logged-out edit by the owner (for which getting a you-are-logged-out error wouldn't be an issue).
  • Page of "newbie editor". He doesn't care too much about logging into his account. Maybe he isn't using his userpage properly, perhaps preparing an article there.
  • Page of a nonexistent user being edited.

Also note the difference between User pages, and User subpages. There is no reason to vandalize a subpage (unless it's transcluded by the user page). But certain subpages may expect anon edits...

@Platonides, good point! I'm glad you brought up the issue of sub-pages. It looks like 1584 out of the 2921 anon, user space edits in the recentchanges table for enwiki are subpage edits. Here's a quick sample of those.

It seems to me that these edits are all non-vandalism and they are likely to be accidental logged out edits.

So, I think I'll be focusing on edits to non-subpages. See my work in Quarry here: -- "Exploring anonymous activity in User space (enwiki)"
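As a rough illustration of the subpage split (not the Quarry query itself), the sketch below treats any User-namespace title containing a slash as a subpage and recomputes the share from the counts quoted above. The helper names are hypothetical, and the slash rule assumes that usernames themselves cannot contain "/".

```python
def is_subpage(title):
    """True if a User-namespace title (without the "User:" prefix) is a subpage.

    Assumption: usernames cannot contain "/", so any slash marks a subpage.
    """
    return "/" in title


def share(part, whole):
    """Percentage of `whole` made up by `part`, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Counts quoted in the comment above (anon edits to user space in recentchanges).
SUBPAGE_EDITS = 1584
ALL_ANON_USERSPACE_EDITS = 2921

subpage_share = share(SUBPAGE_EDITS, ALL_ANON_USERSPACE_EDITS)
root_page_edits = ALL_ANON_USERSPACE_EDITS - SUBPAGE_EDITS
print(f"subpage edits: {subpage_share}% of the total; root user page edits: {root_page_edits}")
```

By these counts, slightly more than half of the anonymous user-space edits are to subpages, which leaves 1337 edits to root user pages for the remaining analysis.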

Thanks for all this work so far, @Halfak : )

@Platonides brings up a good point above about cases regarding subpages, and it looks like the sample supports the notion that there is not a big problem on user subpages.

I also agree with @Platonides that it's reasonable to predict that newer / older / nonexistent userpages may be different in degrees of productive and non-productive editing. @Halfak, I think the hypotheses you are looking to test make sense. If it is too difficult to gather productivity measurements, I think reverts will provide enough of an indication.

I've gathered a random sample of recent edits within the "User" namespace on English Wikipedia that were not to sub-pages and whose IP address was not associated with any logged-in changes (using the checkuser system). This is a random sub-sample of the roughly 60% of anon user-space edits that didn't match a registered user. @I_JethroBT and I manually reviewed these edits and sorted them into three classes: good, good-faith (but damaging) and vandalism.

It looks like the majority of the edits were vandalism (54). There were a few good-faith mistakes (6). The remaining good edits look like they were mostly edits by a logged out user that snuck through my filters (43) and a couple that were counter-vandalism (2).
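To put the counts above in proportion, here is a small tally using only the numbers reported in this comment; the variable names are illustrative. Note that the sample is small (105 edits), so the percentages are a ballpark only.

```python
# Hand-coded class counts from the manual review described above.
counts = {
    "vandalism": 54,
    "good-faith damage": 6,
    "good (likely logged-out user)": 43,
    "good (counter-vandalism)": 2,
}

total = sum(counts.values())
shares = {label: round(100.0 * n / total, 1) for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label}: {n} of {total} ({shares[label]}%)")
```

Vandalism comes to a little over half of the reviewed sample, and the two "good" rows together match the 45 good edits listed in the notes.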

See our notes below:

Good edits (45):

Good faith damage (6):

Vandalism (54):