
Detection and flagging of articles that are AI/LLM-generated
Open, Needs Triage · Public · Feature

Description

LLM stands for Large Language Model, the fancy term for ChatGPT-like AIs: you ask a question or give a topic, and the model outputs a fluent-sounding but often factually incorrect answer.

Newer users who don't know our rules have been creating articles using AI. One AFC reviewer reports having seen 60 drafts that were LLM-generated.

There are open-source detectors for LLM output that are very accurate. They'll give a probability, and many of the matches are above 99%. We should look into the feasibility of adding a feature to PageTriage that tags articles containing LLM-generated text.

Technically, there are several approaches:

  • We could create something similar to the pagetriagetagcopyvio API
  • We could find an open-source detector and incorporate its code, assuming it is not resource-intensive, i.e. it doesn't require a farm of AIs or something.
  • We could create a third-party tool similar to Earwig's copyvio detector
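Whichever approach is taken, the core of the PageTriage integration would be turning a detector's probability score into a tag decision. A minimal sketch of that step, assuming a hypothetical helper (the function name, result structure, and the 0.99 threshold are illustrative assumptions taken from the ">99%" figure above, not existing PageTriage code):

```python
# Minimal sketch: turn a detector's AI-probability score into a tag
# decision. Names, threshold, and result shape are illustrative
# assumptions, not an existing PageTriage interface.

FLAG_THRESHOLD = 0.99  # many matches reportedly score above 99%


def llm_tag_decision(ai_probability: float, threshold: float = FLAG_THRESHOLD) -> dict:
    """Return a tag decision given a detector's probability that text is AI-generated."""
    if not 0.0 <= ai_probability <= 1.0:
        raise ValueError("ai_probability must be in [0, 1]")
    return {
        "flag": ai_probability >= threshold,
        "ai_probability": ai_probability,
        "threshold": threshold,
    }
```

For example, `llm_tag_decision(0.995)["flag"]` would be `True`, while a borderline score like 0.5 would not trigger a tag; the threshold would need tuning against real drafts given the false-positive concerns raised below.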

Credit to S0091 for the idea.

Event Timeline

Restricted Application added a subscriber: Aklapper.

I like the idea, but suspect this will be a pretty large undertaking.

There are open-source detectors for LLM output that are very accurate. They'll give a probability, and many of the matches are above 99%.

Could you list a few? OpenAI's classifier for detecting AI-written text has a 26% true positive rate.

Sure. The following page appears to be a good summary.

https://en.wikipedia.org/wiki/Wikipedia:Using_neural_network_language_models_on_Wikipedia#Countermeasures

Algorithms
In a demo by Hugging Face at [1] (based on RoBERTa), even with a heavily edited paragraph (such as those in § Copyediting paragraphs), the detector can recognize AI text and real text with extremely high confidence (>99%); make sure to remove the reference notes "[1], [2]" beforehand. Such a model can be extremely useful for ORES, a MediaWiki machine learning API primarily used to detect vandalism in Special:RecentChanges. Over time however, these models will have a harder time finding "abnormalities" as AI text generation becomes more sophisticated.

Websites offering detection services
https://gptzero.me/
https://www.zerogpt.com/
https://openai-openai-detector.hf.space/
https://detector.dng.ai/
https://contentatscale.ai/ai-content-detector/
https://corrector.app/ai-content-detector/
https://writer.com/ai-content-detector/
https://etedward-gptzero-main-zqgfwb.streamlit.app/

https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Articles_for_creation#ChatGPT_and_other_AI_generated_drafts has some additional discussion. @Qwerfjkl offers to write a tool or bot to run this on new drafts, and it is mentioned that GPTZero has an API. See bottom of https://gptzero.me/
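If a bot were to use GPTZero's API to scan new drafts, the request it builds might look roughly like the sketch below. The endpoint URL, header name, and JSON field names are assumptions for illustration; the provider's current API documentation would need to be consulted before building on them.

```python
# Hypothetical sketch of how a draft-scanning bot might query a hosted
# detector such as GPTZero. The endpoint URL, auth header, and JSON
# field names below are assumptions, not a verified API contract.
import json
import urllib.request


def build_detection_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP request asking a detector to score `text`."""
    payload = json.dumps({"document": text}).encode("utf-8")
    return urllib.request.Request(
        "https://api.gptzero.me/v2/predict/text",  # assumed endpoint
        data=payload,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
```

The bot would send this request, read a probability out of the JSON response, and then decide whether to tag the draft; rate limits and API costs would matter at AFC scale.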

Tagging Machine-Learning-Team for awareness; or maybe they already have something like this on their roadmap?

Tgr added subscribers: HaeB, Tgr.

FWIW that page just got updated to say the opposite (by @HaeB, whose opinion I'd trust on this topic). My vague recollection from examples I have seen is also that these tools aren't really reliable (and while ChatGPT has been RLHF-ed into a recognizable style, and it takes some effort to make it abandon that, other LLMs are all over the place).