In this task, we want to find answers to questions like the following:
- Given a set of blocked users on a wiki (e.g. enwiki in the month of July 2024), using publicly available data, are there patterns that these accounts have in common?
- Can we further segment these users based on block reason (e.g. vandalism, spam, long term abuse, sock puppetry, etc.) and see different common patterns in the public data points known about these accounts?
- Do the distributions change if we look at permanently blocked versus temporarily blocked accounts?
- Hypothetical example: we might find that, of the accounts permanently blocked in July 2024, a high percentage do not have a confirmed email address, a high percentage received no Thanks from any user, and the accounts have a high ratio of reverted to unreverted edits.
- Given a set of data points that are associated with blocked accounts (e.g. no confirmed email, no Thanks, high ratio of reverts to unreverted edits), can we say anything about the relative importance of each one of those data points, to know how to weight them in calculating an overall reputation score for an account?
- Given a data point, does its importance change if we look at it over time or in combination with other data points? For example, the number of Thanks received might not indicate anything on its own, but combined with the age of the account and the number of edits, it might be a strong signal of whether an account is acting in good or bad faith.
- Hypothetical example: we might find that having fewer than N Thanks, for accounts with more than X edits and older than Y days, is associated with all permanently blocked accounts in the dataset, and based on this, we would say that number of Thanks should be weighted by Z% when calculating an overall reputation score.
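The first questions above can be sketched in code. The following is a minimal illustration, not a proposed method: the field names, the toy records, and the 0.5 revert threshold are all hypothetical, and comparing against a baseline group of non-blocked accounts is an added assumption (the questions above only mention blocked accounts). It computes, per block type, the share of accounts showing each signal, then a naive "lift" (the rate among permanently blocked accounts divided by the rate in the baseline) as one rough way to compare the relative importance of signals.

```python
from collections import defaultdict

# Hypothetical toy records; field names and values are illustrative only,
# not real account data.
ACCOUNTS = [
    {"block": "permanent", "email_confirmed": False, "thanks": 0, "reverted": 8, "unreverted": 2},
    {"block": "permanent", "email_confirmed": False, "thanks": 0, "reverted": 5, "unreverted": 5},
    {"block": "temporary", "email_confirmed": True, "thanks": 0, "reverted": 3, "unreverted": 7},
    {"block": "temporary", "email_confirmed": False, "thanks": 1, "reverted": 6, "unreverted": 4},
    {"block": "none", "email_confirmed": True, "thanks": 4, "reverted": 1, "unreverted": 9},
    {"block": "none", "email_confirmed": True, "thanks": 0, "reverted": 0, "unreverted": 10},
    {"block": "none", "email_confirmed": False, "thanks": 2, "reverted": 2, "unreverted": 8},
    {"block": "none", "email_confirmed": True, "thanks": 1, "reverted": 1, "unreverted": 9},
]


def revert_ratio(account):
    """Share of an account's edits that were reverted (0.0 if it has no edits)."""
    total = account["reverted"] + account["unreverted"]
    return account["reverted"] / total if total else 0.0


# Each signal maps an account record to True/False.
# The 0.5 threshold for "mostly reverted" is arbitrary.
SIGNALS = {
    "no_confirmed_email": lambda a: not a["email_confirmed"],
    "no_thanks": lambda a: a["thanks"] == 0,
    "mostly_reverted": lambda a: revert_ratio(a) >= 0.5,
}


def signal_rates(accounts):
    """Return {block_type: {signal: share of accounts with that signal}}."""
    groups = defaultdict(list)
    for account in accounts:
        groups[account["block"]].append(account)
    return {
        block: {
            name: sum(test(a) for a in members) / len(members)
            for name, test in SIGNALS.items()
        }
        for block, members in groups.items()
    }


def lift(rates, group, baseline="none"):
    """Ratio of each signal's rate in `group` to its rate in the baseline group."""
    return {
        name: rates[group][name] / rates[baseline][name]
        if rates[baseline][name]
        else float("inf")
        for name in SIGNALS
    }


rates = signal_rates(ACCOUNTS)
permanent_lift = lift(rates, "permanent")
print(rates["permanent"])
print(permanent_lift)
```

Lift treats each signal in isolation, so it cannot capture the combined effects described above (for example, Thanks together with account age and edit count); a model fitted on several signals at once would be needed for that.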
Background
- We want to provide an account reputation score to functionaries/administrators/moderators on the wikis to assist in anti-abuse work.
- The project overview page is located here
- The project overview contains a table with data points we have identified as being of interest. There may be other data points to look at; this list is not exhaustive.
- The primary goal of the reputation score is to make it easier for humans to quickly identify accounts that may need further review by surfacing a score based on various data points known about the account.
- A secondary, longer-term goal is to make this score available to software (like AbuseFilter), where it could be used for mitigations.
- We are primarily interested in accounts that are < 90 days old, on the assumption that older accounts with persistent bad faith activity will already have been blocked.
- The algorithm to work out a reputation score needs to be "good enough", not perfect. We should be able to help patrollers and functionaries identify accounts that are likely problematic while their edits are being reviewed, and reduce the time that patrollers/functionaries need to spend manually building a mental model of an account's reputation.
- The reputation score should be transparent in that those with access to it should be able to understand how various components contributed to an overall score
- Looking at the score in three tiers ("low risk", "neutral", "high risk"), we are most interested in the "high risk" category
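To make the transparency and tiering requirements above concrete, here is a minimal sketch. It assumes boolean signals combined with additive integer weights; the signal names, the weights, and the tier cut-offs are all hypothetical placeholders, and nothing in this task fixes the scoring method. The point is only that the function returns each component's contribution alongside the total, so a reviewer can see why an account landed in a given tier.

```python
# Hypothetical weights (points out of 100) and tier cut-offs; real values
# would have to come out of the analysis described in this task.
WEIGHTS = {"no_confirmed_email": 20, "no_thanks": 30, "mostly_reverted": 50}
HIGH_RISK_AT = 70
LOW_RISK_BELOW = 30


def score_account(signals):
    """Return (score, tier, contributions) for a dict of boolean signals.

    `contributions` records how many points each signal added, so the
    overall score can be explained component by component.
    """
    contributions = {
        name: (weight if signals.get(name, False) else 0)
        for name, weight in WEIGHTS.items()
    }
    score = sum(contributions.values())
    if score >= HIGH_RISK_AT:
        tier = "high risk"
    elif score < LOW_RISK_BELOW:
        tier = "low risk"
    else:
        tier = "neutral"
    return score, tier, contributions


score, tier, contributions = score_account(
    {"no_confirmed_email": True, "no_thanks": False, "mostly_reverted": True}
)
print(score, tier, contributions)
```

An additive score like this is easy to explain but cannot express interactions between signals; whether that trade-off is acceptable is an open question for the analysis.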
Related tasks
- T371876: Make a labelled dataset for analysing account reputation score. That task is about building a labelled dataset, which is somewhat related to the proposal here to pick a set of blocked users, so whatever we end up doing in this task, it would be good to share the outcomes in T371876 to inform that work.
- T371880: Make a pipeline for optimising account reputation score. That task is about establishing a pipeline based on the dataset. Whatever work is done in this task will likely help inform later work in T371880.