
Prototype new models to facilitate sockpuppet detection
Open, High, Public

Description

Prototype new models to facilitate sockpuppet detection.

This task is scheduled to start in Q2. The preparation for it starts in Q1.

Event Timeline

leila claimed this task. Jul 25 2017, 6:14 PM
leila created this task.
leila removed a project: Epic. Jul 25 2017, 6:31 PM
leila added a subscriber: TBolliger.

Requested a list of attendees and a meeting to sync up and align directions/expectations among Research, Scoring Platform, and Community Tech before reaching out to potential external collaborators.

leila edited projects, added Research, Epic; removed Research-Programs. Jul 25 2017, 6:33 PM
DarTar added a comment (edited). Aug 4 2017, 8:33 PM

Notes from our exploratory call with Srijan and multiple WMF teams on August 4: https://etherpad.wikimedia.org/p/sockpuppetdetection

leila added a comment. Aug 8 2017, 1:01 AM

Summary of results from the meeting: there's general support for this research. Next steps:

  • Srijan and I will follow up to plan for the start of the research in Q2 (September-December). I'm not sure if this is possible on Srijan's end but we will figure it out in the coming weeks.
  • We will start by understanding the current workflow for detecting sockpuppet accounts.
leila added a comment. Aug 8 2017, 3:49 PM

Srijan says:
"I am already working with Tilen, a visiting PhD student (just like I once myself was :)), on an algorithm to identify bad users in any platform, including Wikipedia. Initial experiments show that the algorithm performs well, also on a Wikipedia vandal identification dataset. The idea is to use it to find any type of bad user, including sockpuppets. I will send you some slides tomorrow so that you get a high level overview.
The plan is to get the basic framework of the algorithm done before Tilen leaves, which is in late Sept, and then tune it specifically for Wikipedia after that."

He has also asked whether the tool will work on private data (I communicated that he should assume that's the case) and whether we can learn the details of the current process by which sockpuppets are detected. I created tasks for documenting the process on Meta (T172796) and for figuring out the procedures (T172795).

leila edited projects, added Research-2017-18-Q2; removed Epic. Sep 5 2017, 9:27 PM
leila updated the task description. Sep 5 2017, 9:29 PM

Not the same thing, but one may want to be aware of T139810: RFC: Overhaul the CheckUser extension too.

DarTar added a subscriber: srijan. Sep 21 2017, 5:17 PM
Stryn added a subscriber: Stryn. Oct 17 2017, 6:30 PM
Huji added a subscriber: Huji. Jan 5 2018, 11:59 PM
leila reassigned this task from leila to srijan. Jan 6 2018, 6:59 PM

@srijan Happy 2018! :)

I'm assigning this task to you as you're in charge of it. :) On our end, Dario will remain the point of contact. If you need my help at any point, just ping.

srijan added a comment. Jan 7 2018, 6:27 AM

@leila Happy new year to you too!
Definitely, thanks!

SPoore added a subscriber: SPoore. Jan 10 2018, 7:08 PM
Restricted Application added a subscriber: MGChecker. Mar 9 2018, 5:10 PM
leila moved this task from Staged to In Progress on the Research board. Mar 9 2018, 5:12 PM
leila added a comment. Mar 9 2018, 5:22 PM

Update (No action needed):

Srijan and I met today (meeting notes) and we discussed the state of this task. The task is on a very good track given its complexity; detecting sockpuppets is not an easy task. In the past months, the researchers have tried 3 models (A, B, and C under Model 1) and managed to bring the AUC from almost random (~0.5) to 0.72. Right now they're working on Model 2. The biggest challenge at the moment is to improve the speed of Model 2 for Wikipedia (because the model relies on every single edit, considerable work is needed to speed it up). Given the state of the model and the work left, the current estimate is that we'd be able to test the new model (hopefully with a much higher AUC) in May or June. This estimate may shift if the results come out earlier or later; let's not treat those dates as final in our operations, but that's the target.
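
For context on the numbers above: AUC (area under the ROC curve) is ~0.5 for a random classifier and 1.0 for a perfect one. Below is a minimal, hypothetical sketch of how a pairwise sockpuppet classifier would be scored with this metric, assuming scikit-learn; the features and labels are synthetic placeholders, not the researchers' data or code.

```
# Minimal sketch (not the project's actual code): scoring a pairwise
# sockpuppet classifier with ROC AUC. Features and labels are synthetic
# placeholders; with random data the AUC stays near the 0.5 chance level.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))      # one row of pairwise features per account pair
y = rng.integers(0, 2, size=1000)   # 1 = confirmed sockpuppet pair, 0 = unrelated

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, scores))  # 0.5 ≈ random, 1.0 = perfect
```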

Thank you for the update, @leila ! Excited to see the results of the new model.

Let me, @SPoore, and/or @PEarleyWMF know if you need help getting feedback from CheckUsers or other sockpuppet hunters. As a reminder, the Anti-Harassment team has committed to building a simple UI to help our users interface with the model, if needed.

leila added a comment. Mar 9 2018, 6:54 PM

@TBolliger thanks! How do you recommend we set aside time for your team to help? On the one hand, being able to test via the simple UI you refer to in May/June is very plausible given the state of the research; on the other hand, it's not 100% clear that the research will be ready by then. Is there a way you can set aside some time on your team's end for this without locking in resources completely until we know more?

I've created T189324: Build UI to validate sockpuppet model with users to track this work.

Our team works in 2-week sprints, so this task can interrupt us at any point. If the model is determined to be ineffective, we can close this ticket as declined. But (more likely and hopefully!) when the model is ready, we can set up a call or email to discuss further details (implementation & what we actually want users to validate).

leila added a comment. Mar 9 2018, 7:24 PM

@TBolliger this is great! Thank you!

@leila

I also wanted to mention that I've included this work in our Q4 goals: https://www.mediawiki.org/wiki/Wikimedia_Audiences/2017-18_Q4_Goals

Do you think Q4 is a reasonable timeframe? Or should it be Q1?

@TBolliger It's hard to commit to Q4 from the research perspective, as we may not be able to make it. It really depends on how the research goes (and I know this is deeply uncertain :/). You can call it out in Q4 goals as a stretch goal, or leave it out and plan it for Q1; if you get to do it in Q4, we can still report it. Does this work for your workflow?

OK, we'll drop it for Q4. If things get way ahead of schedule we can still work on it :)

Sounds good, @TBolliger . :)

TBolliger closed this task as Declined. Jan 30 2019, 11:00 PM
leila reopened this task as Open (edited). Mar 28 2019, 8:41 PM

Update time. I have sent the following status update and recommendation for next steps to a few folks via email. Posting it here as well for visibility.

Summary: we have a feature-based model built on public edit logs that can predict whether two usernames belong to the same user with ~65% performance. We will talk with checkusers through Trust and Safety to see how we can move forward with implementation, and also to see whether they're interested in the model including features based on private data, which could enhance model performance.

Longer version:

  • The focus of the work has been on public edit data only. Access to private data would most likely improve the results since, for example, we would be able to tell whether the IP address or user agent of two accounts is the same. We intentionally decided to start with public data only.
  • Using only public edit data, we have two models: a simple feature-based model that can be easily scaled (~65% performance), and a deep learning model that is much more resource-intensive (~73% performance). The feature-based model is not great in terms of performance, but it is better than random (50%) and can be a good starting point, as it's simple and scalable (a hypothetical sketch of such a pairwise feature model follows this list). We expect that adding private data to it (if checkusers are interested) would enhance performance significantly.
  • What we're currently predicting is the probability that two usernames belong to the same user. We can extend the model to return, for a given username x, a ranked list of all usernames predicted to match x (with some condition to shrink the search space of pairs; otherwise x has to be checked against millions of usernames, and computing probabilities for all of them is not scalable).
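
As referenced in the second bullet, here is a minimal, hypothetical sketch of what a simple feature-based pairwise model built on public edit logs could look like. The specific features (page overlap, hour-of-day overlap, edit-summary length difference) and the choice of classifier are illustrative assumptions, not the model described above.

```
# Illustrative sketch only: a simple feature-based pairwise model using
# public edit data. Feature definitions and classifier choice are assumptions.
from dataclasses import dataclass
from sklearn.ensemble import GradientBoostingClassifier

@dataclass
class EditHistory:
    pages: set             # titles of pages the account edited
    hours_active: set      # hours of day (0-23) with at least one edit
    summary_lengths: list  # lengths of the account's edit summaries

def pair_features(a, b):
    """Turn two public edit histories into one feature vector for the pair."""
    page_overlap = len(a.pages & b.pages) / max(1, len(a.pages | b.pages))
    hour_overlap = len(a.hours_active & b.hours_active) / 24
    mean = lambda xs: sum(xs) / max(1, len(xs))
    summary_len_diff = abs(mean(a.summary_lengths) - mean(b.summary_lengths))
    return [page_overlap, hour_overlap, summary_len_diff]

def train_pairwise_model(pairs, labels):
    """pairs: list of (EditHistory, EditHistory); labels: 1 = same user, 0 = not."""
    X = [pair_features(a, b) for a, b in pairs]
    return GradientBoostingClassifier().fit(X, labels)
```

The same pairwise scorer could back the ranked-list extension in the last bullet: restrict candidates to accounts that satisfy some blocking condition (e.g., sharing at least one edited page with x), score each remaining pair, and sort by predicted probability.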

Recommendation: we work with checkusers to implement the feature-based model for them, taking their workflows into account. We then add the private data to the model as a set of features if checkusers are interested (see the hypothetical sketch below).
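
To make the recommendation concrete, here is a hypothetical illustration of how private signals could be appended to the public feature vector. The field names (ips, user_agent) are assumptions for illustration only and do not reflect the actual CheckUser data model.

```
# Hypothetical only: augmenting the public pairwise features with
# private CheckUser-style signals. Field names are illustrative assumptions.
def private_pair_features(a_meta, b_meta):
    shared_ip = bool(set(a_meta.get("ips", [])) & set(b_meta.get("ips", [])))
    same_user_agent = (
        a_meta.get("user_agent") is not None
        and a_meta.get("user_agent") == b_meta.get("user_agent")
    )
    return [float(shared_ip), float(same_user_agent)]

# Combined vector = public behavioural features + private-signal features, e.g.:
# features = pair_features(a_hist, b_hist) + private_pair_features(a_meta, b_meta)
```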

I'll follow up with Trust and Safety about this now, as I'll need to talk with checkusers to see what they think is the best way to move forward.