Revert rate user profiling
Closed, Resolved, Public

Description

@Charlotte analysis project

Revert rate user profiling:

  1. How many and what % of users who have edited on SE using app versions released 25 November 2019 and later are in each percentage bucket of revert rate (e.g., 0-1%, 1-2%, etc., up to 7+%)?

a. How many and what % of users have ever been temporarily suspended?
b. How many and what % of users are now blocked from the feature?

  2. We see from this report that the baseline revert rate for image caption edits has increased since late November. What % of that increase is driven by people who are new to using SE - meaning, those whose first SE edit is after 25 November 2019, using a version of the app released after that date? (Basically, are the n00bs we're attracting qualitatively worse than the old people?)
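
The bucketing asked for in the first question could be sketched roughly as below. This is only an illustration: the function names, the input shape (a mapping of user to edit and revert counts) and the sample numbers are all made up here, and the choice to put a rate of exactly 1% into the 1-2% bucket is an assumption, not something the task specifies.

```python
from collections import Counter


def bucket_label(rate_pct):
    """Map a revert rate (in percent) to a 1-point-wide bucket, capped at 7+%."""
    if rate_pct >= 7:
        return "7+%"
    low = int(rate_pct)  # floor, valid because rates are never negative
    return f"{low}-{low + 1}%"


def bucket_distribution(user_stats):
    """user_stats: {user: (edits, reverts)} -> {bucket: (user_count, pct_of_users)}."""
    counts = Counter()
    for edits, reverts in user_stats.values():
        if edits == 0:
            continue  # no SE edits means no defined revert rate
        counts[bucket_label(100.0 * reverts / edits)] += 1
    total = sum(counts.values())
    return {b: (n, round(100.0 * n / total, 1)) for b, n in counts.items()}


# Hypothetical sample data, not taken from the report.
sample = {
    "user_a": (200, 1),  # 0.5% revert rate
    "user_b": (50, 1),   # 2.0%
    "user_c": (10, 1),   # 10.0%
    "user_d": (1, 1),    # 100.0%
}
print(bucket_distribution(sample))
```

Each bucket carries both the user count and its share of all users with at least one edit, which matches the "how many and what %" phrasing of the question.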

The ultimate idea here is to try to figure out whether to reset the quality thresholds lower - and if so, to what level. We know from the Commons folks that it takes a lot for an edit to actually get reverted, so our thresholds probably need to be more sensitive. Since we didn't set them using very robust data in the first place (for reasons I explained on our call), now's the opportunity to recalibrate.

Event Timeline

SNowick_WMF moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
SNowick_WMF added subscribers: mpopov, kzimmerman.
SNowick_WMF added a subscriber: Dbrant.

Issues with incomplete data:
We can approximate with available data to get a better understanding of revert rate distribution among users (by using registration date) but after investigating I have found that we don't have a way to explicitly track revisions or reverts made on Commons by app version. In order to do this we would need to add tracking of the revision id generated by Commons to the same MobileWikiAppEdit table that tracks app_install_id, app version number and revision ids for edits outside of Commons. See (1.) below for a more detailed explanation.

Analyzing Commons reverts based on a registration date later than 11/26/2019 doesn't guarantee that the user is on the new version. We can reasonably guess that the further out they are from 11/26 the more likely these are new app users but it's a much smaller subset of the overall reverts and would not include older users who have upgraded and may still encounter the non-gatekept SuggestedEdits for the first time.

Reporting for this newly registered cohort, plus additional reporting on the rest of the reverted users before and after the upgrade/change, will be entered here, but I wanted to let you know about the issues I encountered before that is submitted. Because I pulled the data for all reverts before I found that Commons was missing, I do have data on new-app-version users' revert rates on other wikis, and I can include topline numbers on those as well.

Additionally:
For these questions I am not able to report:
a. How many and what % of users have ever been temporarily suspended?
b. How many and what % of users are now blocked from the feature?
I have confirmed w @Dbrant that the suspensions, blocks and steps prior to blocks being implemented are not tracked in events, the blocks are applied based on user's revert rate but it's done in-app and there is no trail to indicate that event on our dbs. We should discuss how and if we want to track this going forward.

  1. For regular edits the revision id (rev_id) is written in the MobileWikiAppEdit event log along with app install_id, app version, time of event, etc. Revision ids are then checked against mediawiki_history to see if they have been reverted, as indicated by revision_is_identity_revert="True". The revision ids for Commons are stored in commonswiki and do not include an app_install_id or app version. They do include event_user_id and event_user_text, which we can cross-check in mediawiki_history to get event_user_registration_timestamp and event_user_revision_count.
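
As a rough sketch of the lookup described above, the snippet below joins app edit events to a revert flag by rev_id and aggregates per app version. The in-memory lists and dicts are stand-ins for the MobileWikiAppEdit and mediawiki_history tables, and the function name, version strings and sample rows are hypothetical. It also shows why Commons edits drop out: a rev_id that was never logged with an app version cannot be matched.

```python
def revert_rates_by_version(app_edits, reverted_by_rev_id):
    """
    app_edits: iterable of dicts with rev_id, app_install_id, app_version
               (stand-in for MobileWikiAppEdit rows).
    reverted_by_rev_id: dict of rev_id -> bool, True when the revision was
               reverted (stand-in for the revert flag in mediawiki_history).
    Returns {app_version: (edits, reverts, revert_rate_pct)}.
    """
    totals = {}
    for row in app_edits:
        rev_id = row["rev_id"]
        if rev_id not in reverted_by_rev_id:
            continue  # no matching history row, so the edit cannot be classified
        edits, reverts = totals.get(row["app_version"], (0, 0))
        was_reverted = int(reverted_by_rev_id[rev_id])
        totals[row["app_version"]] = (edits + 1, reverts + was_reverted)
    return {v: (e, r, round(100.0 * r / e, 1)) for v, (e, r) in totals.items()}


# Hypothetical sample rows; version strings are placeholders.
app_edits = [
    {"rev_id": 1, "app_install_id": "a", "app_version": "v_old"},
    {"rev_id": 2, "app_install_id": "a", "app_version": "v_new"},
    {"rev_id": 3, "app_install_id": "b", "app_version": "v_new"},
    {"rev_id": 4, "app_install_id": "c", "app_version": "v_new"},
]
reverted_by_rev_id = {1: False, 2: True, 3: False}  # rev 4 has no history row

print(revert_rates_by_version(app_edits, reverted_by_rev_id))
```

If Commons revision ids were logged alongside app_install_id and app version, the same join would cover Commons edits as well.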

Thanks, @SNowick_WMF, for surfacing these issues.

Issues with incomplete data:
We can approximate with available data to get a better understanding of revert rate distribution among users (by using registration date) but after investigating I have found that we don't have a way to explicitly track revisions or reverts made on Commons by app version. In order to do this we would need to add tracking of the revision id generated by Commons to the same MobileWikiAppEdit table that tracks app_install_id, app version number and revision ids for edits outside of Commons. See (1.) below for a more detailed explanation.

Let's do this.

Analyzing Commons reverts based on a registration date later than 11/26/2019 doesn't guarantee that the user is on the new version. We can reasonably guess that the further out they are from 11/26 the more likely these are new app users but it's a much smaller subset of the overall reverts and would not include older users who have upgraded and may still encounter the non-gatekept SuggestedEdits for the first time.

Yes, which is unfortunate since it muddies the picture somewhat. Nevertheless, it's probably better than nothing, if we acknowledge the data's limitations whilst drawing conclusions.

Additionally:
For these questions I am not able to report:
a. How many and what % of users have ever been temporarily suspended?
b. How many and what % of users are now blocked from the feature?
I have confirmed w @Dbrant that the suspensions, blocks and steps prior to blocks being implemented are not tracked in events, the blocks are applied based on user's revert rate but it's done in-app and there is no trail to indicate that event on our dbs. We should discuss how and if we want to track this going forward.

Yes please, otherwise we're essentially flying blind regarding whether the safeguards we have put in place are actually effective or not.

Link to report

Initial reporting compares new registered users vs all users in app, editing on Commons.

Decision on how to measure blocks and suspensions needs to be discussed w/engineering. Retention rate child ticket will be next, after data for Quarterly Insights is ready.

SNowick_WMF changed the task status from Open to Stalled. Mar 3 2020, 12:13 AM

Thanks @SNowick_WMF - but I have to confess I'm not at all sure how to interpret the report. If essentially 100% of users who have registered accounts and edited using SE after 26 November 2019 are in the 8%+ revert rate bucket, that means they are all likely to be locked out of the feature. That seems... strange to me. Can you explain?

Likewise, I'm interested in seeing those with a very high revert rate as a percentage of *all* editors, not just of the new editors. Perhaps the report really does say that it is completely useless to attract brand new editors using SE because they're all spammers... but somehow I'm having a hard time squaring that with the usage data for the feature.

My initial response to this data was much the same as yours, which is why this took extra time: I re-queried several different ways just to make sure that this was right. I think looking at Commons users separately from all editors on the app distorts this somewhat, and I'm not sure these users are actually locked out, since we can't verify it (and the users may not even know themselves). I will add the rest of the 'whole user' data I have. Until we can verify by version number, I would consider this a first glance that indicates that some gatekeeping may be helpful to keep users from getting blocked out with one mistake (56% of the new users had one edit and one revert, and another 10% had 2 edits and one revert, making them high revert raters right when they start). More info to follow.
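
To illustrate the small-denominator effect mentioned above, here is a minimal sketch. The 8% cut-off mirrors the 8%+ bucket discussed in this thread. The `min_edits` parameter is purely hypothetical, added to show how a minimum edit count would change who gets flagged, and the sample users are made up.

```python
def high_revert_users(user_stats, threshold_pct=8.0, min_edits=1):
    """Return users at or above threshold_pct, skipping users with fewer than min_edits edits."""
    flagged = []
    for user, (edits, reverts) in user_stats.items():
        if edits == 0 or edits < min_edits:
            continue  # too few edits to judge
        if 100.0 * reverts / edits >= threshold_pct:
            flagged.append(user)
    return sorted(flagged)


# Hypothetical users: (edits, reverts)
sample_users = {
    "one_edit_one_revert": (1, 1),    # 100% revert rate
    "two_edits_one_revert": (2, 1),   # 50%
    "veteran": (100, 3),              # 3%
    "heavy_reverted": (40, 8),        # 20%
}

print(high_revert_users(sample_users))                # all three users above 8%
print(high_revert_users(sample_users, min_edits=10))  # only "heavy_reverted"
```

With no minimum, a single reverted edit is enough to put a brand new user at 100%, which is the pattern behind the 56% figure above.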

This is resolved. We will revisit these findings when we recalibrate revert rate blocks and suspensions in upcoming versions.