
[SPIKE] Investigate how approach Santhosh proposed could enable us to detect presence/absence of policy violations in new content edits
Open, Medium, Public

Description

Prompted by the Wikipedia Edit Review Experiment @calbon conducted, @santhosh generated an LLM prompt to:

  • Systematically evaluate proposed edits to Wikipedia articles
  • Identify potential violations of Wikipedia content policies
  • Provide objective, concise assessments of edit changes
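A prompt along these lines could be assembled programmatically. The template below is a hypothetical sketch, not Santhosh's actual prompt; the wording and the policy list are illustrative assumptions.

```python
# Hypothetical prompt template for policy-violation review of an edit.
# The policy selection and instructions are assumptions, not the POC's
# actual prompt text.
POLICIES = [
    "Neutral point of view (https://en.wikipedia.org/wiki/Wikipedia:NPOV)",
    "Verifiability (https://en.wikipedia.org/wiki/Wikipedia:V)",
    "No original research (https://en.wikipedia.org/wiki/Wikipedia:NOR)",
    "Copyright violations (https://en.wikipedia.org/wiki/Wikipedia:COPYVIO)",
]

def build_review_prompt(before: str, after: str) -> str:
    """Assemble an LLM prompt asking for a concise, policy-linked review."""
    policy_list = "\n".join(f"- {p}" for p in POLICIES)
    return (
        "You are reviewing a proposed edit to a Wikipedia article.\n"
        "Systematically evaluate the change against these content policies:\n"
        f"{policy_list}\n\n"
        f"TEXT BEFORE THE EDIT:\n{before}\n\n"
        f"TEXT AFTER THE EDIT:\n{after}\n\n"
        "List each potential policy violation the edit introduces, with a "
        "link to the relevant policy, or state that no violations were "
        "found. Be objective and concise."
    )
```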

The outcome of this investigation seems promising insofar as Santhosh has demonstrated that an LLM can identify the specific policy violations (with en.wiki links) that content edits introduce:

image.png (87 KB)

Among other things, the above is leading the Editing Team to immediately wonder: Might the approach Santhosh piloted be a reliable and scalable way to detect presence/absence of policy violations in new content edits?

The Editing Team asks this question because it is most immediately curious to learn whether this approach could enable us to evaluate the impacts of Tone (T365301) and Paste Check (T359107).

Requirements

  • Review the approach Santhosh piloted and document the extent to which we think it could be effective at reliably detecting the presence of copyright violations and non-neutral tone within new content edits.

Event Timeline

Thanks for creating a ticket for further investigation of this approach.

In my POC, I made some simplifications. I would consider removing those simplifications to bring the setup closer to a real-world edit situation, and then evaluate it to get a better sense of reliability:

  1. I hand-crafted the 'before' and 'after' texts as plain text. However, wikitext might be the actual format we want to use.
    • A plain-text version of the edit difference might miss vandalism in non-renderable parts, for example adding a link to an external website while the link text itself is unproblematic.
  2. In a real edit-diff situation, how effectively can we construct these 'before' and 'after' contents?
    • We need to evaluate how effectively an LLM can 'summarize' the diff when it is complicated (as in table edits, template edits, and markup corruption), and also for long-range diffs, where small edits are made in various parts of a large article.
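One way to isolate the changed regions, so the LLM sees focused 'before'/'after' snippets rather than the whole article, is a standard diff pass. A minimal plain-text sketch using Python's `difflib` (a wikitext-aware segmentation would be needed for real edits):

```python
import difflib

def changed_regions(before: str, after: str, context: int = 1):
    """Yield (before_snippet, after_snippet) pairs for each changed run,
    with a little surrounding context. Operates on plain text split into
    lines; this is a sketch, not a wikitext-aware diff."""
    a, b = before.splitlines(), after.splitlines()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Widen each changed run by `context` lines on both sides.
        lo_a, hi_a = max(0, i1 - context), min(len(a), i2 + context)
        lo_b, hi_b = max(0, j1 - context), min(len(b), j2 + context)
        yield "\n".join(a[lo_a:hi_a]), "\n".join(b[lo_b:hi_b])
```

With this, two small edits far apart in a large article produce two focused snippet pairs instead of one long-range diff, which may help with the long-article case described above.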

I believe these two improvements require more work on data preparation. We might also need to expand the POC to let us quickly check edits: for example, have it accept a revision ID and then evaluate that edit. Standalone (or sandboxed) evaluations like that can help us iterate quickly.
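A revision-ID entry point could lean on the MediaWiki `action=compare` API, which can diff a revision against its parent. A minimal sketch that only builds the request URL (the endpoint choice and the step of feeding the fetched diff to the LLM evaluator are assumptions left out here):

```python
from urllib.parse import urlencode

# Per-wiki endpoint; en.wiki is used here as an example.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def compare_request_url(rev_id: int) -> str:
    """Build a MediaWiki action=compare request URL that diffs the given
    revision against its immediate parent (torelative=prev)."""
    params = {
        "action": "compare",
        "fromrev": rev_id,
        "torelative": "prev",
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"
```

Fetching this URL returns the rendered diff, from which 'before'/'after' content could be reconstructed and passed to the evaluator.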

Known limitations: any LLM approach is subject to limited language coverage. But that should not prevent us from using its potential in the languages it does support.

MNeisler triaged this task as Medium priority.Dec 19 2024, 7:05 PM