Sockpuppet detection API
Open, Medium · Public · 1 Estimated Story Points

Description

The Anti-Harassment Tools team is going to be working on a tool to assist users in finding sockpuppets on Wikimedia projects. It is still quite early in the planning phase.
The planned features are roughly as follows:

  1. User enters two usernames/IPs and the model returns a similarity score for the users with information about which features were a high match
  2. User enters a single username/IP and the model returns other users with a high similarity score and information about which features were a high match
  3. User enters a sequence of usernames/IPs and the model returns groups of users that seem to be similar, with information about which features were a high match
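
The three ideas above can be sketched as three small functions. This is only an illustration, not the model from T236299: the feature names, the toy user vectors, the thresholds, and the use of cosine similarity as the score are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt

# Hypothetical features and per-user values; the real model's features are
# not described in this task, so everything here is made up for illustration.
FEATURES = ["edit_hour_overlap", "shared_pages", "summary_style"]
USERS = {
    "UserA": [0.9, 0.8, 0.7],
    "UserB": [0.85, 0.75, 0.8],
    "UserC": [0.1, 0.2, 0.9],
}


def similarity(a, b):
    """Cosine similarity between two users' feature vectors (a stand-in score)."""
    va, vb = USERS[a], USERS[b]
    dot = sum(x * y for x, y in zip(va, vb))
    return dot / (sqrt(sum(x * x for x in va)) * sqrt(sum(y * y for y in vb)))


def matching_features(a, b, tolerance=0.15):
    """Features whose values differ by at most `tolerance` for the two users."""
    return [f for f, x, y in zip(FEATURES, USERS[a], USERS[b]) if abs(x - y) <= tolerance]


def compare(a, b):
    """Idea 1: score a pair and report which features matched closely."""
    return {"score": similarity(a, b), "features": matching_features(a, b)}


def similar_to(user, threshold=0.95):
    """Idea 2: other users whose score against `user` meets the threshold."""
    return sorted(u for u in USERS if u != user and similarity(user, u) >= threshold)


def group(users, threshold=0.95):
    """Idea 3: group users linked by pairwise scores at or above the threshold."""
    groups = {u: {u} for u in users}
    for a, b in combinations(users, 2):
        if similarity(a, b) >= threshold and groups[a] is not groups[b]:
            merged = groups[a] | groups[b]
            for u in merged:
                groups[u] = merged
    unique = {frozenset(g) for g in groups.values()}
    return sorted(sorted(g) for g in unique)
```

With the toy data above, `similar_to("UserA")` returns `["UserB"]`, and `group(["UserA", "UserB", "UserC"])` puts UserA and UserB in one group and UserC in its own. A real API would read scores from the model rather than compute them from a hard-coded table.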

This feature set will likely be built into a new special page in MediaWiki or be made a part of the CheckUser extension.
The work for building the machine learning model is being done in T236299: Port sock-puppet detection model in-house. Once that work is complete, we will need an API for accessing this data via a MediaWiki interface. The API will need to be restricted to a specific user group (likely checkusers).

Use machine learning to detect multiple accounts controlled by the same person used for sockpuppetry

https://www.mediawiki.org/wiki/Core_Platform_Team/Initiative/Sockpuppet_Detection_API/Initiative_Vision

Event Timeline

Naike created this task. Aug 3 2020, 9:41 AM
Restricted Application added a subscriber: Aklapper. Aug 3 2020, 9:41 AM
kaldari renamed this task from Sockpuppet detection model productized to Sockpuppet detection API. Aug 4 2020, 4:25 PM
kaldari added a project: Anti-Harassment.
kaldari updated the task description.
kaldari updated the task description. Aug 4 2020, 4:31 PM
leila added a subscriber: leila. Aug 5 2020, 6:51 PM
Niharika triaged this task as Medium priority. Aug 7 2020, 6:32 PM
Niharika updated the task description.
Niharika added a subscriber: kaldari.
leila added a subscriber: Niharika. Aug 7 2020, 6:48 PM

@Niharika thanks for updating this task's description. As a follow-up to our last meeting: I suggest explicitly opening a task for checkusers to evaluate the current output of the sockpuppet detection model (both its quality and the type of output). My concern is that we could finish the code handover in T236299 and then start building the API captured in this task, and at the end, when the model is made available to checkusers, they may come back with major feedback that requires redoing a lot of work. Before (the collective) we spend time on the API, I'd love to see their feedback. (Sorry if that work is captured somewhere and I missed it.)

leila added a subscriber: Isaac. Aug 7 2020, 6:49 PM

Good point. I haven't captured that work in a task yet. I will do so. It will be 2-3 weeks, if not more, before we can have that feedback from checkusers, mainly because we are engaging them on some more pressing things at the moment, which is taking up a bit of their time.
Ahead of initiating their feedback on the usefulness of the features being used, I would appreciate it if we can document, on a high level, an overview of the model and the features being used. This would be extremely helpful in getting the checkusers on the same page when facilitating feedback discussions. I also mentioned this to Isaac and Djellel when I met them previously.

leila added a subscriber: DED. Aug 19 2020, 3:45 PM

2-3 weeks is completely fine. If you can make it clearer when we should expect to hear back from them (at least the worst-case scenario), that would be helpful for our planning.

Ahead of initiating their feedback on the usefulness of the features being used, I would appreciate it if we can document, on a high level, an overview of the model and the features being used. This would be extremely helpful in getting the checkusers on the same page when facilitating feedback discussions. I also mentioned this to Isaac and Djellel when I met them previously.

@DED and @Isaac: is the above something you are planning to do in the coming weeks?

@Niharika I do recommend engaging DED and Isaac to work with you to put together a list of questions to ask while we're waiting for checkusers' time to open.

Yeah, I believe at the last meeting we discussed setting up a MediaWiki page with documentation. DED will be the best person for this right now, but I can try to get it started. A few questions though for @Niharika:

  • Do you have a specific MediaWiki page in mind or somewhere else you'd want us to start?
  • Do you have an example of what you're thinking format-wise? If not, that's fine, but if there's a format you or checkusers would be familiar with, it'd be best to try to match that.

Yeah, I believe at the last meeting we discussed setting up a MediaWiki page with documentation. DED will be the best person for this right now, but I can try to get it started. A few questions though for @Niharika:

  • Do you have a specific MediaWiki page in mind or somewhere else you'd want us to start?

I think that's a question for y'all. It does not matter a whole lot from my perspective. A page under the Research namespace would probably be best, given that the model is being developed by Research. I don't know if we should amend the existing page about this project that Srijan started.
Once we have it up, we can link to it from this project's page when that goes up.

  • Do you have an example of what you're thinking format-wise? If not, that's fine, but if there's a format you or checkusers would be familiar with, it'd be best to try to match that.

I'm thinking this would be a fairly generic page describing the model and its internals, written in language that is easy for everyone to understand. I don't have a format in mind, except that an overview and a section describing the features used in the model would be helpful. It will be helpful for my team as well, not just checkusers.

leila added a comment. Aug 20 2020, 4:16 AM

@Isaac please update the page Srijan started.

@Niharika are you sure you want the model and features to be documented publicly? That's fine on our end if you want it; however, I can imagine that in this case a private document shared with checkusers would work as a fine start, too.

Naike set the point value for this task to 1. Sep 11 2020, 10:48 AM

I missed this, sorry. I think it's good to start with a private document like Isaac did.

eprodromou updated the task description. Thu, Oct 8, 6:35 PM