
Prototype new models to facilitate sockpuppet detection
Open, High, Public

Description

Prototype new models to facilitate sockpuppet detection.

This task is scheduled to start in Q2. The preparation for it starts in Q1.

Event Timeline

leila claimed this task. Jul 25 2017, 6:14 PM
leila created this task.
leila removed a project: Epic. Jul 25 2017, 6:31 PM
leila added a subscriber: TBolliger.

Requested a list of attendees and a meeting to sync up and align directions/expectations among Research, Scoring Platform, and Community Tech before reaching out to potential external collaborators.

leila edited projects, added Research, Epic; removed Research-Programs. Jul 25 2017, 6:33 PM
DarTar added a comment (edited). Aug 4 2017, 8:33 PM

Notes from our exploratory call with Srijan and multiple WMF teams on August 4: https://etherpad.wikimedia.org/p/sockpuppetdetection

leila added a comment. Aug 8 2017, 1:01 AM

Summary of results from the meeting: there's general support for this research. Next steps:

  • Srijan and I will follow up to plan for the start of the research in Q2 (September-December). I'm not sure if this is possible on Srijan's end but we will figure it out in the coming weeks.
  • We will start by understanding the current workflow for detecting sockpuppet accounts.
leila added a comment. Aug 8 2017, 3:49 PM

Srijan says:
"I am already working with Tilen, a visiting PhD student (just like I once myself was :)), on an algorithm to identify bad users in any platform, including Wikipedia. Initial experiments show that the algorithm performs well, also on a Wikipedia vandal identification dataset. The idea is to use it to find any type of bad user, including sockpuppets. I will send you some slides tomorrow so that you get a high level overview.
The plan is to get the basic framework of the algorithm done before Tilen leaves, which is in late Sept, and then tune it specifically for Wikipedia after that."

He has also asked whether the tool will work on private data (I communicated that he should assume that's the case) and whether we can learn the details of the current process by which sockpuppets are detected. I created tasks for documenting the process on Meta (T172796) and for figuring out the procedures (T172795).

leila edited projects, added Research-2017-18-Q2; removed Epic. Sep 5 2017, 9:27 PM
leila updated the task description. Sep 5 2017, 9:29 PM

Not the same thing, but one may want to be aware of T139810: RFC: Overhaul the CheckUser extension too.

DarTar added a subscriber: srijan. Sep 21 2017, 5:17 PM
Stryn added a subscriber: Stryn. Oct 17 2017, 6:30 PM
Huji added a subscriber: Huji. Jan 5 2018, 11:59 PM
leila reassigned this task from leila to srijan. Jan 6 2018, 6:59 PM

@srijan Happy 2018! :)

I'm assigning this task to you as you're in charge of it. :) On our end, Dario will remain the point of contact. If you need my help at any point, just ping.

srijan added a comment. Jan 7 2018, 6:27 AM

@leila Happy new year to you too!
Definitely, thanks!

SPoore added a subscriber: SPoore. Jan 10 2018, 7:08 PM
Restricted Application added a subscriber: MGChecker. Mar 9 2018, 5:10 PM
leila moved this task from Staged to In Progress on the Research board. Mar 9 2018, 5:12 PM
leila added a comment. Mar 9 2018, 5:22 PM

Update (No action needed):

Srijan and I met today (meeting notes) and we discussed the state of this task. The task is on a very good track given its complexity; detecting sockpuppets is not an easy task. In the past months, the researchers have tried 3 models (A, B, and C under Model 1) and managed to bring the AUC from almost random (~0.5) to 0.72. Right now they're working on Model 2. The biggest challenge at the moment is to improve the speed of Model 2 for Wikipedia (because the model relies on every single edit, considerable work is needed to speed it up). Given the state of the model and the work left, the current estimate is that we'd be able to test the new model (hopefully with a much higher AUC) in May or June. This estimate may shift if the results come out earlier or later; let's not treat those dates as final in our operations, but that's the target.
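
For context on the numbers above: AUC (area under the ROC curve) is ~0.5 for a random classifier and 1.0 for a perfect one. Below is a minimal, hypothetical sketch of how a pairwise sockpuppet classifier would be scored with this metric, assuming scikit-learn; the features and labels are synthetic placeholders, not the researchers' data or code.

```
# Minimal sketch (not the project's actual code): scoring a pairwise
# sockpuppet classifier with ROC AUC. Features and labels are synthetic
# placeholders; with random data the AUC stays near the 0.5 chance level.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))      # one row of pairwise features per account pair
y = rng.integers(0, 2, size=1000)   # 1 = confirmed sockpuppet pair, 0 = unrelated

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, scores))  # 0.5 ≈ random, 1.0 = perfect
```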

Thank you for the update, @leila ! Excited to see the results of the new model.

Let me, @SPoore, and/or @PEarleyWMF know if you need help getting feedback from CheckUsers or other sockpuppet hunters. As a reminder, the Anti-Harassment team has committed to building a simple UI to help our users interface with the model, if needed.

leila added a comment. Mar 9 2018, 6:54 PM

@TBolliger thanks! How do you recommend we set aside time for your team to help? On the one hand, being able to test via the simple UI you refer to in May/June is very plausible given the state of the research; on the other hand, it's not 100% clear that the research will be ready by then. Is there a way you can set aside some time on your team's end for this without locking in resources completely until we know more?

I've created T189324: Build UI to validate sockpuppet model with users to track this work.

Our team works in 2-week sprints, so this task can interrupt us at any point. If the model is determined to be ineffective, we can close this ticket as declined. But (more likely and hopefully!) when the model is ready, we can set up a call or email to discuss further details (implementation & what we actually want users to validate).

leila added a comment. Mar 9 2018, 7:24 PM

@TBolliger this is great! Thank you!

@leila

I also wanted to mention that I've included this work in our Q4 goals: https://www.mediawiki.org/wiki/Wikimedia_Audiences/2017-18_Q4_Goals

Do you think Q4 is a reasonable timeframe? Or should it be Q1?

@TBolliger It's hard to commit to Q4 from the research perspective, as we may not be able to make it. It really depends on how the research goes (and I know this is deeply uncertain :/). You can call it out in Q4 goals as a stretch goal, or leave it out and plan it for Q1; if you get to do it in Q4, we can still report it. Does this work for your workflow?

OK, we'll drop it for Q4. If things get way ahead of schedule we can still work on it :)

Sounds good, @TBolliger . :)

TBolliger closed this task as Declined. Jan 30 2019, 11:00 PM
leila reopened this task as Open (edited). Mar 28 2019, 8:41 PM

Update time. I have sent the following status update and recommendation for next steps to a few folks via email. Posting it here as well for visibility.

Summary: we have a feature-based model built on public edit logs that can predict whether two usernames belong to the same user with ~65% performance. We will talk with checkusers through Trust and Safety to see how we can move forward with implementation, and also to see whether they're interested in the model including features based on private data, which could enhance model performance.

Longer version:

  • The focus of the work has been on public edit data only. Access to private data would most likely improve the results since, for example, we would be able to tell whether the IP address or user agent of two accounts is the same. We intentionally decided to start with public data only.
  • Using only public edit data, we have two models: a simple feature-based model that can be easily scaled (~65% performance), and a deep learning model that is much more resource-intensive (~73% performance). The feature-based model is not great in terms of performance, but it is better than random (50%) and can be a good starting point, as it's simple and scalable (a hypothetical sketch of such a pairwise feature model follows this list). We expect that adding private data to it (if checkusers are interested) would enhance performance significantly.
  • What we're currently predicting is the probability that two usernames belong to the same user. We can extend the model to return, for a given username x, a ranked list of all usernames predicted to match x (with some condition to shrink the search space of pairs; otherwise x has to be checked against millions of usernames, and computing probabilities for all of them is not scalable).
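
As referenced in the second bullet, here is a minimal, hypothetical sketch of what a simple feature-based pairwise model built on public edit logs could look like. The specific features (page overlap, hour-of-day overlap, edit-summary length difference) and the choice of classifier are illustrative assumptions, not the model described above.

```
# Illustrative sketch only: a simple feature-based pairwise model using
# public edit data. Feature definitions and classifier choice are assumptions.
from dataclasses import dataclass
from sklearn.ensemble import GradientBoostingClassifier

@dataclass
class EditHistory:
    pages: set             # titles of pages the account edited
    hours_active: set      # hours of day (0-23) with at least one edit
    summary_lengths: list  # lengths of the account's edit summaries

def pair_features(a, b):
    """Turn two public edit histories into one feature vector for the pair."""
    page_overlap = len(a.pages & b.pages) / max(1, len(a.pages | b.pages))
    hour_overlap = len(a.hours_active & b.hours_active) / 24
    mean = lambda xs: sum(xs) / max(1, len(xs))
    summary_len_diff = abs(mean(a.summary_lengths) - mean(b.summary_lengths))
    return [page_overlap, hour_overlap, summary_len_diff]

def train_pairwise_model(pairs, labels):
    """pairs: list of (EditHistory, EditHistory); labels: 1 = same user, 0 = not."""
    X = [pair_features(a, b) for a, b in pairs]
    return GradientBoostingClassifier().fit(X, labels)
```

The same pairwise scorer could back the ranked-list extension in the last bullet: restrict candidates to accounts that satisfy some blocking condition (e.g., sharing at least one edited page with x), score each remaining pair, and sort by predicted probability.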

Recommendation: we work with checkusers to implement the feature-based model for them, taking their workflows into account. We then add the private data to the model as a set of features if checkusers are interested (see the hypothetical sketch below).
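
To make the recommendation concrete, here is a hypothetical illustration of how private signals could be appended to the public feature vector. The field names (ips, user_agent) are assumptions for illustration only and do not reflect the actual CheckUser data model.

```
# Hypothetical only: augmenting the public pairwise features with
# private CheckUser-style signals. Field names are illustrative assumptions.
def private_pair_features(a_meta, b_meta):
    shared_ip = bool(set(a_meta.get("ips", [])) & set(b_meta.get("ips", [])))
    same_user_agent = (
        a_meta.get("user_agent") is not None
        and a_meta.get("user_agent") == b_meta.get("user_agent")
    )
    return [float(shared_ip), float(same_user_agent)]

# Combined vector = public behavioural features + private-signal features, e.g.:
# features = pair_features(a_hist, b_hist) + private_pair_features(a_meta, b_meta)
```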

I'll follow up with Trust and Safety about this now, as I'll need to talk with checkusers to see what they think is the best way to move forward.