(Please set yourself as task assignee of this session)
- Title of session: Anti-abuse work on wikis
- Session description: A workshop on anti-abuse work on WMF wikis, giving detail on the current systems, recent work, and new tools being created.
- Username for contact: @Dreamy_Jazz
- Session duration (25 or 50 min): 50 mins
- Session type (presentation, workshop, discussion, etc.): Workshop
- Language of session (English, Arabic, etc.): English
- Prerequisites (some Python, etc.): None
- Any other details to share?:
- Interested? Add your username below:
- @Rina.sl
Notes from session:
Anti-abuse work on the wikis
Date and time:
Relevant links
- Phabricator task: https://phabricator.wikimedia.org/T359343
- Session slides: https://docs.google.com/presentation/d/10dOnsGbF2C3InBUpbp9lCPRSv23XWA9Etd9DhFYrlJk/edit?usp=sharing
Survey - https://wikimedia.qualtrics.com/jfe/form/SV_emruGo866DIuxa6
Agenda
Brief overview of existing anti-abuse efforts across various teams at WMF (10 minutes)
[see slides linked above]
Incident Reporting System - can test this on Beta
Exercise: Identify common themes & areas of work
Exercise: Identify weak spots, places for improvement, project ideas
Review as a group
Hack on projects this weekend!
Goals
Know about different areas of work, why they’re happening
Understand how to contribute
Hear your feedback about where we should improve and what we’re doing well
Connect people
Presenter
- William Brown - WBrown (WMF) / Dreamy Jazz
- Kosta Harlan - KHarlan (WMF)
Participants
- Magioladitis
- SocialKnowledge
Notes
Temporary accounts
- protect contributors in some environments
- Q: potential for abuse?
Global User
MediaModeration
Blocking Improvements
IP info and IPoid
- look up geolocation and ISP info
- IP addresses are being hidden, but their info can still be displayed for temp accounts
Incident reporting system
- for users to report harmful content in appropriate place
- currently in MVP stage and likely to change - hopefully to production later this year
User-Agent header
- deprecated and replaced with User-Agent Client Hints
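For context, a sketch of how User-Agent Client Hints work (the header names are real; the values below are made up): the server opts in via `Accept-CH`, and the browser then sends the hints on subsequent requests instead of a full User-Agent string.

```http
# Server response header opting into extra (high-entropy) hints:
Accept-CH: Sec-CH-UA-Platform-Version, Sec-CH-UA-Model

# Low-entropy hints the browser sends by default on later requests:
Sec-CH-UA: "Chromium";v="124", "Not-A.Brand";v="99"
Sec-CH-UA-Mobile: ?0
Sec-CH-UA-Platform: "Linux"
```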
Other things that are being worked on:
- SRE: DDoS attack mitigation, upgrades to CAPTCHA
- AutoModerator: automatic reversion of edits deemed to be bad edits by a machine learning model
- AbuseFilter: support for the temporary account system
Any examples of anti-abuse projects that we should know about?
- About half of the room works on an anti-abuse tool/product
what's interesting to you about this session?
- Here to learn and listen - team works on these topics
- Wikibase Cloud - relevant as this gets more popular
- Admin on Greek Wikipedia & thesis on ethics/abuse on Wikipedia
- From the Security team
- Steward - cross-wiki anti-abuse work
- Admin on enwiki, particularly interested in mobile patrolling
- Steward - tried to work on abuse tools like AbuseFilter but it's hard
Big group exercise (5 minutes) -- what themes, categories, and problem areas do the projects we mentioned relate to?
Themes/Categories:
- Vandalism
- Scale
- conflict of interest editors
- ad hominem attacks
- spam bots
- cross-wiki abuse
- harassment prevention / mitigation
- detection
- sock puppetry
- improved mitigations
- collateral damage (esp in particular countries)
- mass creation of articles by a particular user (e.g. with LLMs)
- vandalism - measuring how long a bad-faith edit stayed in place before being reverted (just for fun/research purposes)
- user privacy
- copyright violations? content moderation?
- Long term abuse by individuals
Small Group Exercise (20 minutes)
Discuss what projects, capabilities, critiques, and weaknesses are present in each area identified in the previous slide.
- Discuss one (or a group of) themes identified above.
- Groups: (red) vandalism, spam bots, sock puppetry - Content issues
Most obvious issues - anyone can address them
Long running discussion topic - lots of tools. ORES etc.
Machine Learning can support this, but not solve it completely.
Spambots - fighting against technical people who know how to circumvent the system.
Captcha helps with spambots. Works to some extent but not fully as expected.
For vandalism it can be easy - for example a blacklist of words. Lots of low-hanging fruit here.
The hard part is subtle vandalism - for example, the capital of a country being wrong. Someone has to manually check whether it is correct; AI can't cross-check information like this. Misinformation/disinformation is harder.
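The "low-hanging fruit" case above can be sketched as a simple word-blocklist check (the words here are placeholders; real wikis maintain their own lists and regexes, e.g. via AbuseFilter):

```python
import re

# Hypothetical blocklist; the actual words are an assumption for illustration.
BLOCKLIST = {"badword1", "badword2"}

def is_obvious_vandalism(added_text: str) -> bool:
    """Flag an edit if any blocklisted word appears in the added text."""
    words = re.findall(r"\w+", added_text.lower())
    return any(word in BLOCKLIST for word in words)

print(is_obvious_vandalism("This page is badword1!"))            # True
print(is_obvious_vandalism("Paris is the capital of Germany."))  # False
```

Note that the second example (subtle factual vandalism) sails straight through, which is exactly the gap described above.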
SWViewer for cross-wiki monitoring.
Sometimes abusers use scripts to make rapid abuse. AbuseFilters help but it's not always easy. Can be hard to find people who know how the software works.
Lots of volunteer/third-party tools. Can be hard to know what exists.
Bulgarian Wikipedia has an anti-vandalism bot. Turkish Wikipedia doesn't have one. There should be a centralised tool for this. Especially helpful for smaller Wikipedias.
DDoS - IP reputation. Can this be implemented in anti-vandalism/anti-spam tools to foresee abuse from users?
KH: We're starting an experiment in the next weeks to look at this data to see if it maps to bad content/accounts. Based on that, we might be able to make decisions based on the information. We would like to make this available to other tools. There are some Phab tasks for this, it would just need some work to be done.
AbuseFilter - how would it be helpful? We have an AbuseFilter for new account creating a profile page "I am ...". If we knew it was from a bad reputation IP, we could prevent them from taking the action.
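A sketch of what such a rule could look like in AbuseFilter's rule language (the variables `page_namespace`, `user_age`, and `added_lines` and the operator `irlike` exist in AbuseFilter; the thresholds and pattern are made up, and an IP-reputation variable is not currently available to filters):

```
/* New account writing an "I am ..." user page (namespace 2) */
page_namespace == 2 &
user_age < 86400 &          /* account younger than one day */
added_lines irlike "^\s*I am "
```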
Using AI for abuse detection. Falsified images, for example.
(green) ad hominem, protecting privacy, cross-wiki abuse, long term abuse, sockpuppetry - Interpersonal issues
- how do you identify bad actors? esp with temp accounts?
- don't want to ban people who aren't doing vandalism
- privacy violations - if we build a fingerprinting solution, where does that data go? Someone could create a tool and then use it to share the data collected (in violation of privacy)
- analytically oriented?
- device fingerprinting solution in particular - captures device information, i.e. specific info about the computer that identifies a particular person
- especially where individual people use a shared computer
- how different would that be to sharing your same IP address?
- more devices than humans are in this room right now
- public library - provide access to computers, people will log into their accounts on that computer and could be hacked
- false positives?
- usually not the people that are good people ;)
- identification by some means (but not perfect)
- abuse of wikis - bad behaviour by certain users - how do we find that?
- we have two options - deny or block - is there a third option?
- maybe, for minor issues, allow suspicious editors to do some things but not all
- this is kind of already there but not easy to use
- need more documentation
- are there large projects that we can work on?
- can we block only certain aspects that make sense to block? to avoid false positives?
- device fingerprinting
- easier to maintain
- abuse filters (lua)
- are there large projects that we can work on at the Hackathon?
- consequences (other than block edit)
- maybe, for minor issues, allow suspicious editors to do some things but not all
- what do we need to improve?
- catalog of anti-abuse tools
- have anti abuse histories
- Reliance on individuals to remember things
(purple) scaled abuse, people testing defenses, collateral damage - Large-scale/deliberate attacks on our projects
Large project ideas
Anti-Abuse LLM
Early detection
Detecting early is important
Hackathon ideas
New capabilities
Ability to thank temporary accounts
Where we need to improve
Make AbuseFilter easier to use, and better documented
Themes/summaries
- AbuseFilter is a common thread - powerful tool
- Improving global abuse filters?
- A tool that could make an AbuseFilter based on a series of edits? Send to an LLM or something.
Survey! Take the survey! (QR code link in presentation)
https://wikimedia.qualtrics.com/jfe/form/SV_emruGo866DIuxa6