
Analyze effect of huwiki FlaggedRevs configuration change on problematic edits and new user retention
Closed, ResolvedPublic

Description

On 2018-04-09 FlaggedRevs on Hungarian Wikipedia was reconfigured: before, articles were displayed to readers in their last reviewed version, afterwards articles were always shown in their latest version and FlaggedRevs was only used to coordinate patrolling activity. Now that the test period for the change is over, we need to evaluate the effect and decide whether to keep the change.

The main questions to answer:

  • did the ratio of problematic edits increase after the change?
  • was new editor retention affected?

The secondary question would be to quantify how often readers saw bad changes due to the new configuration, and how often they didn't see new changes due to the old configuration.

Event Timeline

Tgr created this task.Nov 11 2018, 4:10 AM
Restricted Application added a subscriber: Aklapper.Nov 11 2018, 4:10 AM
Zache added a subscriber: Zache.Nov 11 2018, 11:58 AM

@Tgr: Who is going to analyze this?

Tgr added a comment.Nov 19 2018, 8:26 AM

@Tgr: Who is going to analyze this?

The huwiki community (help is of course welcome if anyone is interested :).

Tgr added a comment.Nov 19 2018, 8:42 AM

Registrations:

(graphs: new registrations, daily | running average, 45 days | diff in running average)

(with apologies to everyone who actually knows how to do statistics)
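
For reproducibility, a minimal pandas sketch of the three curves above, assuming a hypothetical registrations.csv export (one row per account creation, e.g. pulled from the logging table); the "diff" is interpreted here as the day-over-day change in the running average:

```
import pandas as pd

# Hypothetical export: one row per account registration with a "timestamp" column;
# the file name is a placeholder.
regs = pd.read_csv("registrations.csv", parse_dates=["timestamp"])

# Daily registration counts.
daily = regs.set_index("timestamp").resample("D").size().rename("registrations")

# 45-day running average and its day-over-day change, mirroring the three curves above.
running = daily.rolling(window=45).mean().rename("running_avg_45d")
diff = running.diff().rename("diff_running_avg")

print(pd.concat([daily, running, diff], axis=1).tail())
```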

Zache added a comment.Nov 19 2018, 8:27 PM

The 12-month running median is very stable for huwiki. There is a slight increase in the monthly 5+ trend after the configuration change, but otherwise it is almost flat.

Source
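
For reference, a minimal sketch of the rolling-median computation, assuming a hypothetical active_editors_monthly.csv with one row per month (the actual data sits behind the link above):

```
import pandas as pd

# Hypothetical input: one row per month with the count of editors making 5+ edits.
monthly = pd.read_csv("active_editors_monthly.csv",
                      parse_dates=["month"], index_col="month")["active_editors"]

# 12-month running median, the "very stable" curve described above.
running_median = monthly.rolling(window=12).median()
print(running_median.loc["2017":"2019"])
```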

Samat added a subscriber: Samat.Nov 24 2018, 11:17 PM
Tgr added a subscriber: Halfak.May 20 2019, 11:33 AM

@Halfak said he has some ideas for this (thanks!)

Tgr added a comment.May 20 2019, 11:34 AM

One thing I would like to do here is an analysis of the number of vandal edits (as estimated by ORES), but I'd like to have more confidence in ORES prediction quality first.
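
For anyone who wants to try this once the models look trustworthy, a rough sketch of counting likely-vandal edits via the public ORES v3 scores endpoint; the 0.5 cut-off and the revision IDs are placeholders, and the response field names are from memory, so worth double-checking:

```
import requests

ORES = "https://ores.wikimedia.org/v3/scores/huwiki/"

def count_likely_damaging(rev_ids, threshold=0.5):
    """Rough count of edits ORES scores as likely damaging, batched 50 revisions at a time."""
    likely = 0
    for i in range(0, len(rev_ids), 50):
        batch = rev_ids[i:i + 50]
        resp = requests.get(ORES, params={
            "models": "damaging",
            "revids": "|".join(str(r) for r in batch),
        })
        resp.raise_for_status()
        scores = resp.json()["huwiki"]["scores"]
        for rev_id in batch:
            score = scores[str(rev_id)]["damaging"].get("score")
            if score and score["probability"]["true"] >= threshold:
                likely += 1
    return likely

# rev_ids would come from the revision/recentchanges tables for the period under study, e.g.:
# print(count_likely_damaging([21519999, 21520001]))
```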

Zache added a comment.May 20 2019, 1:17 PM

I would like to understand what I am actually looking at here: in one year, huwiki's "active anonymous editors" count rose to the same level as fiwiki's. Were these people editing with accounts before, and are they now making edits without logging in?

https://stats.wikimedia.org/v2/#/hu.wikipedia.org/contributing/active-editors/normal|line|all|editor_type~anonymous*group-bot*name-bot*user|monthly
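
The numbers behind that Wikistats graph can also be pulled from the public Wikimedia REST API; a rough sketch below, assuming that "active" (5+ edits) corresponds to summing the 5..24, 25..99 and 100.. activity levels (endpoint and field names from memory, so verify against the API docs):

```
import requests

# "Active" (5+ edits) is assumed here to be the sum of the 5..24, 25..99 and 100..
# activity levels exposed by the API.
BASE = ("https://wikimedia.org/api/rest_v1/metrics/editors/aggregate/"
        "hu.wikipedia.org/anonymous/all-page-types/{level}/monthly/20170101/20190901")

active = {}
for level in ("5..24-edits", "25..99-edits", "100..-edits"):
    resp = requests.get(BASE.format(level=level),
                        headers={"User-Agent": "huwiki-flaggedrevs-analysis"})
    resp.raise_for_status()
    for row in resp.json()["items"][0]["results"]:
        month = row["timestamp"][:7]          # e.g. "2018-04"
        active[month] = active.get(month, 0) + row["editors"]

for month in sorted(active):
    print(month, active[month])
```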

Tgr added a comment.May 21 2019, 9:04 PM

I'd expect a drop in active logged-in editors in that case, so this does seem like a real increase in contributors. On the other hand, if you look at editors (https://stats.wikimedia.org/v2/#/hu.wikipedia.org/contributing/editors/normal|line|all|editor_type~anonymous*user|monthly|editors - "active editor" means 5+ edits in the last month in Wikistats vocabulary, "editor" means 1+ edits in the last month), they started growing significantly sooner (although there seems to be an uptick when flagrev was disabled? hard to tell by the naked eye), so something else seems to be going on there, and it's not obvious to what extent the flagrev switch is the cause.

That said, I would mainly be interested in changes in the number of new users (more anon edits is not trivially a good thing, although ORES might help in evaluating it better), which shows no obvious change. New registrations tend to be dominated by users who never edit (and probably never meant to edit in the first place), though, which might drown out the signal, so we probably need some filtering there to get a meaningful graph.

Zache added a comment.EditedMay 24 2019, 7:15 AM

I updated the FlaggedRevs review stats graphs, with fiwiki as a reference.

Note: for fiwiki, bot reviews aren't filtered out, and ~10% of the manual reviews from 2018 onwards are actually made by SeulojaBot (based on ORES and other rules).

Tgr added a comment.EditedJun 20 2019, 7:50 PM

(Note to self because someone asked and I had to search for it: the original enabling was in November 2008, not June 2009 as I first wrote: T17568#198689.)

Tgr added a comment.EditedJun 20 2019, 7:52 PM

Users with 5-24 edits are up too, which is encouraging (unless that stat somehow includes anons): https://stats.wikimedia.org/v2/#/hu.wikipedia.org/contributing/editors/normal|line|all|activity_level~5..24-edits|monthly

Tgr added a comment.Jul 1 2019, 8:41 AM

I checked the registration year of users with 5+/25+ edits in the last 30 days to get an estimate of how many of these new editors are converted into long-term productive (5+, 25+) editors, but didn't see any clear trend. (The numbers are too small and noisy for a graph to be useful.)
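
In case someone wants to reproduce this, a rough sketch of one way to compute it on the public Wiki Replicas from a Toolforge account; the host/database naming, the credentials file and the exact table/view names are assumptions about that environment, and the query actually used may have differed:

```
import pymysql

# Assumes a Toolforge account with Wiki Replicas access; host/db naming and the
# ~/replica.my.cnf credentials file are conventions of that environment.
conn = pymysql.connect(host="huwiki.analytics.db.svc.wikimedia.cloud",
                       database="huwiki_p",
                       read_default_file="~/replica.my.cnf")

# Registration year of logged-in users with 5+ edits in the last 30 days.
QUERY = """
SELECT LEFT(CONVERT(u.user_registration USING utf8mb4), 4) AS reg_year,
       COUNT(*) AS editors
FROM (
    SELECT a.actor_user AS user_id
    FROM revision r
    JOIN actor a ON a.actor_id = r.rev_actor
    WHERE r.rev_timestamp >= DATE_FORMAT(NOW() - INTERVAL 30 DAY, '%Y%m%d%H%i%S')
      AND a.actor_user IS NOT NULL
    GROUP BY a.actor_user
    HAVING COUNT(*) >= 5            -- use 25 for the 25+ cohort
) active
JOIN user u ON u.user_id = active.user_id
GROUP BY reg_year
ORDER BY reg_year
"""

with conn.cursor() as cur:
    cur.execute(QUERY)
    for reg_year, editors in cur.fetchall():
        print(reg_year, editors)
```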

Samat added a comment.Jul 15 2019, 9:57 PM

I started to prepare a page with a statistical analysis here (in Hungarian only for now, but mainly with easy-to-understand figures):
https://hu.wikipedia.org/wiki/Wikip%C3%A9dia:Jel%C3%B6lt_lapv%C3%A1ltozatok/Statisztik%C3%A1k
I will fill it in with the main conclusions in the following days (most of them are obvious from the graphs anyway).

Zache added a comment.Aug 4 2019, 10:03 AM

@Samat Are there any comments from the community on the results?

Zache awarded a token.Aug 4 2019, 10:04 AM
Samat added a comment.EditedAug 5 2019, 11:11 PM

@Zache not really yet...

In addition to the editor & edit numbers, we tried to analyze the vandal/damaging ratio within the increased number of edits from casual and anonymous editors in order to get a better picture, but the results are not clear (based on the old ORES models). We are waiting for the ORES update (T228078), hoping it gives us more information about this aspect of the change.

After that, I will try to close the open issue with a community vote.

Tgr added a comment.Sat, Sep 14, 1:09 PM

Here are the ORES stats:

                  totals    damaging ratio    goodfaith ratio
anon edits        source    source            source
non-anon edits    source    source            source

(Non-anon edits only show maybebad ratios due to huge scale differences; the source has the rest. Dotted lines show the data from the previous year.)

In short:

  • ~33% increase in number of anon edits
  • no change in number of non-anon edits
  • slight increase in ratio of bad anon edits (e.g. goodfaith/likelybad up from ~11% to ~13%, damaging/likelybad up from ~32% to ~35% - those two are roughly the same ratio as our anecdotal estimates of vandalism and edits needing fixup, respectively)
  • a similar increase in the ratio of bad non-anon edits, surprisingly
  • for anon edits, the total extra load on patrollers (from more anon edits plus a higher bad-edit ratio) is something like 500 more vandal edits and 800 more well-intentioned problematic edits per month
  • conversely, useful anonymous edits have increased by about 2000 per month (a back-of-the-envelope sketch of this arithmetic follows below)
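
A back-of-the-envelope sketch of that arithmetic; the baseline edit volume below is a made-up placeholder rather than the real huwiki figure, only the ratios come from the list above, and the actual numbers were computed from the real monthly counts:

```
# Hypothetical baseline; the ~33% increase and the likelybad ratios are from the bullets above.
anon_edits_before = 10_000                      # placeholder monthly anon edit volume
anon_edits_after = anon_edits_before * 1.33     # ~33% increase after the change

vandal_ratio_before, vandal_ratio_after = 0.11, 0.13       # goodfaith/likelybad (vandalism proxy)
damaging_ratio_before, damaging_ratio_after = 0.32, 0.35   # damaging/likelybad (needs-fixup proxy)

extra_vandal = anon_edits_after * vandal_ratio_after - anon_edits_before * vandal_ratio_before
extra_damaging = anon_edits_after * damaging_ratio_after - anon_edits_before * damaging_ratio_before
# Treating the damaging flag as covering both vandalism and fixup-needed edits (an assumption),
# the rest of the new volume counts as useful edits.
extra_useful = (anon_edits_after - anon_edits_before) - extra_damaging

print(f"extra likely-vandal edits/month:   {extra_vandal:,.0f}")
print(f"extra likely-damaging edits/month: {extra_damaging:,.0f}")
print(f"extra useful edits/month:          {extra_useful:,.0f}")
```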
Tgr added a comment.Sun, Sep 15, 8:50 AM

I checked the registration year of users with 5+/25+ edits in the last 30 days to get an estimate of how many of these new editors are converted into long-term productive (5+, 25+) editors, but didn't see any clear trend. (The numbers are too small and noisy for a graph to be useful.)

Quarterly numbers graph more nicely:

5+        25+
source    source

So no significant change. (The big spike is in 2018 February, well before the config change, and seems to have something to do with a large number of global bots being registered, probably due to some SUL-related change. The spike at the very end is an artifact of the metric - there are many more freshly registered 5+ editors than 5+ editors who have survived multiple months.)

Users with 5-24 edits are up too, which is encouraging (unless that stat somehow includes anons): https://stats.wikimedia.org/v2/#/hu.wikipedia.org/contributing/editors/normal|line|all|activity_level~5..24-edits|monthly

It does include anons, confusingly: https://meta.wikimedia.org/wiki/Research_talk:Wikistats_metrics/Editors
So either the FlagRev change significantly increased the number of anons who return regularly to make edits (as many as 25+ per month), or, more plausibly, this is an artifact of shared IPs (not really used for residential internet connections in Hungary, I think, but they could be school or workplace IPs).

Tgr added a comment.Sun, Sep 15, 2:53 PM

Number of registrations per month where the user reached a certain number of edits within 30 days (a proxy for productive new editors: we know that new editors who make a larger number of edits in their first month are much more likely to become long-term editors):

5+        25+       100+
source    source    source

(trendlines are for 12 months; dotted lines / thin trendlines are the same curves shifted by 12 months)

Again no obvious effect (although given the noisiness of the data, it would have to be a very large effect to jump out in the graph). A rough sketch of how this cohort metric can be computed is below.
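
A minimal pandas sketch of the cohort metric, assuming a hypothetical per-user table of registration timestamps and edit counts within 30 days of registration (however that table was produced upstream):

```
import pandas as pd

# Hypothetical per-user input: registration timestamp and the number of edits the
# user made within 30 days of registering.
users = pd.read_csv("new_user_edits.csv", parse_dates=["registration"])

users["reg_month"] = users["registration"].dt.to_period("M")
cohorts = users.groupby("reg_month")["edits_first_30_days"].agg(
    reached_5=lambda s: (s >= 5).sum(),
    reached_25=lambda s: (s >= 25).sum(),
    reached_100=lambda s: (s >= 100).sum(),
)
print(cohorts.tail(24))
```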

Tgr closed this task as Resolved.Sun, Sep 15, 2:59 PM
Tgr claimed this task.

The main questions to answer:

  • did the ratio of problematic edits increase after the change?

Yes, but very little. (The number of problematic edits did increase, though, since we have a lot more anonymous edits now.)
Interestingly, for non-anonymous edits the increase was much larger, but those edits are far too few in absolute numbers to matter.

  • was new editor retention affected?

As far as we can tell, not at all.

This concludes the analysis (although I'd still be interested in what other statistics would be worth looking at).

One note I want to make is about patroller workload. When considering how much work a patroller needs to do, the % of vandalism is kind of irrelevant. We should instead look at the number of edits that need to be reviewed in order to catch most of the vandalism. Without something like ORES in place, this is really just a function of the total number of edits. With ORES in place, it would be the % of edits that cross some useful threshold on the probability of being damaging. E.g. https://ores.wikimedia.org/v3/scores/huwiki/?models=damaging&model_info=statistics.thresholds.true."maximum%20filter_rate%20@%20recall%20>=%200.90" suggests that any edit scoring above 0.061 should be reviewed, and that using this threshold would reduce patrollers' overall workload by 92%. It looks like you're getting at this in your plots from T209224#5493029. It would be nice to see that as a raw number. I can help run a statistical test to see if the differences we see are significant.
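
For reference, a short sketch of pulling that optimized threshold from the linked ORES endpoint; the JSON field names are from memory, so worth double-checking against a live response:

```
import requests

url = "https://ores.wikimedia.org/v3/scores/huwiki/"
params = {
    "models": "damaging",
    "model_info": 'statistics.thresholds.true."maximum filter_rate @ recall >= 0.90"',
}
resp = requests.get(url, params=params)
resp.raise_for_status()

# The optimized operating point: review everything scoring at or above "threshold".
info = resp.json()["huwiki"]["models"]["damaging"]["statistics"]["thresholds"]["true"][0]
print("review edits with P(damaging) >=", info["threshold"])  # ~0.061 per the comment above
print("expected filter rate:", info["filter_rate"])            # share of edits patrollers can skip
```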

I need p, the proportion of edits that would be flagged as needing review before and after the change, and N, the total number of edits before and after the change. If it's a pain to apply the threshold I came up with, the "May have problems" threshold is pretty close to what I have in mind.
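
A minimal sketch of the significance test being offered here, as a two-sample proportion z-test with placeholder counts (k = edits crossing the review threshold, n = all edits in the period):

```
from statsmodels.stats.proportion import proportions_ztest

# Placeholder counts: k = edits crossing the review threshold, n = all edits,
# for the periods before and after the configuration change.
k_before, n_before = 1_200, 20_000
k_after, n_after = 1_900, 26_600

stat, p_value = proportions_ztest(count=[k_before, k_after], nobs=[n_before, n_after])
print(f"z = {stat:.2f}, p = {p_value:.4f}")
```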

Zache added a comment.Tue, Sep 17, 5:24 AM

... We should instead look at the number of edits that need to be reviewed in order to catch most of the vandalism. Without something like ORES in place, this is really just a function of the total number of edits. ...

With FlaggedRevs this is balanced via the autoreview user right. Generally, most edits from regular users are autoreviewed anyway, so their edits should be excluded from the effect.