
[Request] Identify examples of genAI assisted abuse
Closed, Resolved · Public

Description


Context
The Trust and Safety Product team has a KR (WE4.5) that assesses the risks and opportunities of ML/AI on Trust and Safety aspects of Wikimedia’s ecosystem. The recent "Human Rights Impact Assessment of the WMF’s AI/ML Products" highlights the potential marginal human rights risks of generative AI, including areas relating to Trust and Safety, such as harassment: “The primary threat is that GenAIs could be used at scale to generate harassing content targeting Wikimedia volunteers, readers and staff.” (p. 31)

In Q1, we would like to gain qualitative insights into how bad actors are using generative AI for the purpose of abuse on the wikis. These insights will allow us to assess the risk to the wikis and to generate ideas for mitigating genAI-assisted abuse, enabling the team to run experiments on mitigating identified risks in Q2.

Description
Run qualitative research to identify on-wiki examples of generative-AI-assisted abuse by bad actors. This might include spam, harassment, long-term abuse, undisclosed paid editing, or disinformation campaigns. The outcome can be used to:

  1. Assess the risk of generative-AI-assisted abuse to our community models.
  2. Generate ideas to mitigate different types of generative-AI-assisted abuse.

Expected Deliverable

  • This is a qualitative research project, e.g. talking to CheckUsers and functionaries to learn whether highly motivated LTAs (long-term abusers) are using genAI to try to defeat administrative sanctions, or are incorporating genAI into their tactics.
  • Output could be case studies, such as examples of LTAs using genAI. Ideally the case studies would include a high-level assessment of the risk to the wikis and recommendations on where to focus our mitigation ideas.

Estimated Effort
We can scope this work according to available resources; it could be a 3-4 week project. This is a lower-priority task and could be scoped down to include only descriptions of existing cases where genAI is negatively affecting community health, without a detailed risk assessment.

Priority

I need this task resolved in:

  • 1 month
  • 3 months (Q1 Jul-Sep)
  • 6 months
  • Whenever you get to it :-)

Ideally this task would be done in the first half of Q1, to allow us to act on the insights for the remainder of Q1 (i.e. to formulate experiments testing genAI-abuse mitigation ideas).


For use by WMF Research team; please leave everything below as it is:

  1. Does the request serve one of the Research team's existing audiences? If yes, choose the primary audience (1 of 4).
  2. What is the type of work requested?
  3. What is the impact of responding to this request?
    • Support a technology or policy need of one or more WM projects.
    • Advance the understanding of the WM projects.
    • Something else. If you choose this option, please explain briefly the impact below.

Details

Due Date
Sep 29 2025, 5:00 AM

Event Timeline

Assigning to Claudia to discuss further with the team to scope and confirm approach.

Kate Z and Isaac reviewed and confirmed this seems like a project for Design Research.

Isaac does say, though: even though it's framed as qualitative research, and I think that's reasonable, I have some thoughts about whether it's the right scope and whether it's feasible. In short, GenAI is hard to detect, and the easy-to-identify aspects are the least concerning. GenAI is also just one way in which abuse might be scaling up and becoming harder to detect; we should be careful not to focus solely on that aspect, which might lead us to miss other concerns that are more important but not GenAI-mediated.

After further discussions with the team and with Isaac, we have settled on these additional points:

  • The primary purpose of this research is to describe already-occurring phenomena rather than trying to (quantitatively) estimate their prevalence
  • We are particularly interested in the ways in which generative AI is negatively impacting the social experience of editors, for instance, accusations of LLM use based on a mismatch between an editor's assumed English fluency and the style of their contributions

I will also edit the ticket with further details about estimated effort, priority, and the use cases of the research outcomes.

Weekly update: given the prioritization of the incoming request tasks assigned to me this quarter, this is likely going to be picked up and started later in Q1 (Aug through Sep).

cwylo set Due Date to Aug 29 2025, 5:00 AM. (Jul 9 2025, 9:30 PM)
cwylo changed Due Date from Aug 29 2025, 5:00 AM to Sep 29 2025, 5:00 AM. (Jul 25 2025, 6:58 PM)

We've begun drafting the discussion guide for this project, with a kick-off meeting hopefully scheduled for next week.

DKumar-WMF triaged this task as Medium priority. (Sep 2 2025, 3:39 PM)

After consulting with the team and broader KR owners, we have decided to close this hypothesis.

  • Discussions with highly active community members engaged in AI cleanup, pilot interviews, and a review of existing examples of genAI use on English Wikipedia indicate that, at this moment, genAI does not seem to elevate or enable unique types of trust and safety risks that rise to the level of a systemic issue
    • Instead, the most pressing current issues with genAI are in the realm of content or knowledge integrity (e.g. the use of genAI to create bogus or misleading articles or citations), with the potential to overwhelm the existing community processes that ensure quality and accuracy in article content
  • We are aware that genAI may still be a T&S risk in the future, so we are not ruling out the possibility of future work on this specific topic
  • This line of research is still promising and should be continued as part of WE1, as we want to better understand how genAI changes the dynamics around editing, moderation, and related issues

Instead of the original deliverable, I will create a writeup explaining the decision we made for WE 4.5.1, including findings from English Wikipedia that support the points above. This document should be ready the week of 22 Sep.

Belated close: the final report explaining our findings and the decision to close the KR is complete (link for internal use). It contains a summary of findings, a longer writeup, and examples of generative AI misuse on English Wikipedia with annotations and explanations of context.

Key summary

  • Currently, the most common forms of generative AI misuse on English Wikipedia are problematic article creation and AI-written comments.
  • Generative AI does not yet pose a significant trust and safety risk on English Wikipedia.
  • Threats (e.g. legal threats or threats of violence) and abuse (e.g. harassment, bullying) involving generative AI are so far handled using existing policies and practices, without the need to develop specific measures aimed at AI-enabled abuse.
  • By contrast, to tackle AI risks to content integrity, editors have developed new policies, new guidelines (e.g. for hiding AI-written comments), and new tools such as edit filters; a sketch of the kind of heuristic such filters rely on follows below.
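
To illustrate the mechanism the last point refers to: edit filters pattern-match incoming edits against rules and can tag, warn about, or block matching edits. Below is a minimal Python sketch of that style of heuristic. It is a hypothetical illustration only; the phrase list and function name are assumptions made for this example and are not taken from any actual English Wikipedia filter.

```python
import re

# Hypothetical sketch of the pattern-matching style an edit filter uses:
# flag newly added text containing boilerplate phrases that LLMs commonly
# emit. The phrase list below is illustrative, not an actual enwiki filter.
LLM_BOILERPLATE = re.compile(
    r"as an ai language model"
    r"|as a large language model"
    r"|i hope this helps",
    re.IGNORECASE,
)

def flag_possible_llm_text(added_text: str) -> bool:
    """Return True if the added text contains a common LLM boilerplate phrase."""
    return LLM_BOILERPLATE.search(added_text) is not None
```

Note that such surface-pattern heuristics catch only the most careless misuse; as noted earlier in this ticket, the easy-to-identify uses of genAI are the least concerning, and fluent AI-generated text carries no such markers.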

In conclusion, generative AI misuse currently poses a limited trust and safety threat to English Wikipedia, though it may enable new forms of abuse in the future.