
[Open Question] Automatically detecting accounts that do paid editing activity
Open, Normal, Public

Description

Investigate whether reliable prediction models can be built to detect accounts with paid editing activity. From some offline discussions, we know:

Current detection steps
The following steps are taken by users (in group XX) to identify accounts that do paid editing:

How prevalent are the paid edit accounts?
Let's define paid edit accounts (loosely, for now) as accounts that have made at least one edit associated with paid editing since their creation. Do we have a sense of how prevalent such accounts are?

Event Timeline

leila created this task. Feb 18 2017, 5:42 PM
Restricted Application added a subscriber: Aklapper. Feb 18 2017, 5:42 PM

The patterns that a clueful New Page Reviewer can recognise fairly easily could indeed be recognised by a bot. Matching similar evidence across multiple pages and multiple accounts is generally beyond the scope of the average patroller, but a bot using some form of semantic searching or syntax recognition could probably do it.
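As a rough illustration of what matching evidence across pages and accounts might look like, here is a minimal sketch. It swaps in a much simpler technique than semantic search: plain word-shingle overlap (Jaccard similarity) between pages created by different accounts. The function names, the 5-word shingle size, the 0.5 threshold and the sample texts are all assumptions made up for the example, not anything a bot currently uses.

```python
from itertools import combinations


def shingles(text, k=5):
    """Return the set of k-word shingles in lower-cased text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_pairs(pages, threshold=0.5, k=5):
    """pages maps (account, title) -> page text.

    Returns (key_a, key_b, score) for pages created by different
    accounts whose shingle overlap is at least `threshold`.
    """
    sets = {key: shingles(text, k) for key, text in pages.items()}
    hits = []
    for (key_a, sa), (key_b, sb) in combinations(sets.items(), 2):
        if key_a[0] == key_b[0]:
            continue  # same account: not cross-account evidence
        score = jaccard(sa, sb)
        if score >= threshold:
            hits.append((key_a, key_b, score))
    return hits


# Made-up example data: two accounts reusing the same promotional wording.
pages = {
    ("AcctA", "Foo Corp"): "Foo Corp is a leading global provider of "
                           "innovative solutions for enterprise customers worldwide",
    ("AcctB", "Bar Ltd"): "Bar Ltd is a leading global provider of "
                          "innovative solutions for enterprise customers worldwide",
    ("AcctC", "River"): "The river rises in the northern hills and flows "
                        "south through three valleys to the sea",
}
for key_a, key_b, score in similar_pairs(pages):
    print(key_a, key_b, round(score, 2))
```

Comparing every pair of pages is quadratic, so anything beyond a small batch of new pages would probably need an index (for example MinHash) rather than this brute-force loop.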

Thanks, @Kudpung, for chiming in. It would be helpful to list those recognizable clues as soon as we have a research page up.

@srijan would you be interested in having a chat about this project? I had a chat with @Cervisiarius and we think it's very much aligned with your expertise and interests. (For others reading this comment, Srijan has a strong research interest and background in using machine learning to identify malicious users and activities on the web. Last year, for example, he did research on hoax detection on English Wikipedia.)

@leila @Cervisiarius this sounds super interesting! It would be nice to discuss more.

leila updated the task description. Mar 7 2017, 6:33 PM
leila added a comment. Mar 7 2017, 6:38 PM

@Kudpung @Doc_James @Jytdog Can you help with expanding the "Current detection steps" and "How prevalent are the paid edit accounts?" sections under Description? If the answer to either section is not known, feel free to say we don't know or to provide best estimates (estimates in specific fields are also welcome, especially for the second question). I'll be adding a couple more sections over time; feel free to add content to them as well. Thank you! :)

On a separate note, both Srijan and Cervisiarius are excited to help on the research front. I'm happy that we have enough people with different expertise on board to look into this further. :)

leila edited projects, added Research; removed Research-Backlog. Mar 7 2017, 6:40 PM

With respect to how prevalent undisclosed paid editing is, that is an excellent research question. How does one accurately measure an activity that those carrying it out are trying to keep secret? Especially in an environment where even research on the topic is looked at negatively by many members of arbcom.

We have maybe 50 individuals listed here involved in paid editing https://www.upwork.com/o/profiles/browse/?q=Wikipedia
We have a number of companies involved in paid editing here https://en.wikipedia.org/wiki/User:Doc_James/Paid_Editing_Companies which I will expand
We have a huge list of concerns here https://en.wikipedia.org/wiki/Wikipedia:Conflict_of_interest/Noticeboard

I guess one could take a random selection of articles within a specific topic area and analyse them. I would estimate 20% of articles on corporations and on living people are paid for. But that is just a ballpark figure. The volunteer community may address half of the concerns.
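To make the random-selection idea concrete, here is a minimal sketch of how a manually reviewed sample could be turned into a prevalence estimate with an uncertainty range. It assumes a uniform random sample and uses the Wilson score interval. The article titles, the sample size of 100 and the count of 20 flagged articles are placeholders invented for the example, not real data.

```python
import math
import random


def sample_articles(titles, k, seed=0):
    """Draw k titles uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(titles), k)


def wilson_interval(flagged, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        raise ValueError("sample must not be empty")
    p = flagged / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Placeholder data: in practice `titles` would be the articles in one
# topic area, and `flagged` the number a human reviewer judged paid-for.
titles = [f"Article_{i}" for i in range(1000)]
sample = sample_articles(titles, 100)
flagged = 20  # made-up review outcome, for illustration only
low, high = wilson_interval(flagged, len(sample))
print(f"estimated prevalence: {flagged / len(sample):.0%} "
      f"(interval {low:.1%} to {high:.1%})")
```

The interval only captures sampling error. Since undisclosed paid editing is hidden by design, reviewers will miss some cases, so an estimate like this is better read as a lower bound.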

Liridon added a subscriber: Liridon. Aug 2 2017, 6:05 PM
leila added a comment. Oct 24 2017, 8:50 AM

For when we come back to this open question: Article Wizard is now redesigned to help editors disclose COI and paid editing. (Check this comment for more details.) The link to the page that helps the editor navigate through reporting COI or paid editing is here.

The signs of paid editing:

A document is currently being drafted which will serve as a tutorial for New Page Reviewers. Almost complete, it probably contains all that is needed to feed an AI system.

See: Identifying PR