Hypothesis
If we develop a task generation engine for the Revise Tone structured task, integrate our recent learnings about which content to include or filter out, and provide pipelines that automatically refresh the task list, we will enable a qualitative evaluation of the generated tasks and an A/B experiment that tests whether this type of task helps newcomer editors make more constructive edits.
Scoping details
Use case:
This model will support a new Suggested Edit task that invites contributors, especially newcomers, to improve the neutrality of existing Wikipedia articles by identifying and rewriting biased, promotional, or peacock language. The intended audience includes users engaging with Suggested Edits via the Newcomer Homepage. The model's outputs will be surfaced as highlighted sentences or paragraphs within articles, accompanied by calls to action encouraging users to revise them to align with Wikipedia's neutral point of view (NPOV) policy.
This task explores the broader hypothesis that Edit Checks and Suggested Edits can share underlying detection logic. If successful, this approach could improve efficiency, consistency, and scalability across structured editing workflows.
Related tasks:
Model purpose:
The model should analyze article content and detect instances of biased tone or peacock language at a sentence or paragraph level. These detections will inform Suggested Edits, guiding contributors to revise non-neutral phrasing.
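As an illustration only, sentence-level detection could be prototyped with a simple wordlist baseline before a learned model is available. The `PEACOCK_TERMS` list and `detect_peacock_sentences` helper below are hypothetical; a production model would be trained, not rule-based.

```python
import re

# Illustrative peacock terms; a real model would learn these signals from data.
PEACOCK_TERMS = {"legendary", "world-class", "renowned", "iconic", "acclaimed"}

def detect_peacock_sentences(text: str) -> list[str]:
    """Return sentences that contain peacock terms (sentence-level detection)."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    flagged = []
    for sentence in sentences:
        words = {w.lower().strip(".,!?") for w in sentence.split()}
        if words & PEACOCK_TERMS:
            flagged.append(sentence)
    return flagged
```

A baseline like this also makes the precision/recall trade-off concrete: a broad wordlist catches more non-neutral phrasing but raises the false-positive rate that the requirements below warn against.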
Goal:
This project aims to improve article quality by encouraging neutral, policy-aligned contributions. Specific goals include:
- Increasing the number of constructive Suggested Edits
- Reducing the burden on moderators by proactively addressing biased language
- Supporting newcomers in learning and applying Wikipedia’s NPOV guidelines
Key success metrics include:
- Accuracy of model detections (precision/recall)
- Revert rate and/or qualitative review of resulting edits
- Completion rate of "neutral tone" Suggested Edits
Prior art:
This project builds on prior work by adapting the UX of existing Suggested Edits and Edit Checks:
Prioritization details
Timing:
We are hoping to run an experiment in November 2025.
KR impact:
FY25/26 WE1.1 KR:
Increasing newcomer constructive activation and retention:
Increase constructive edits [i] by X% for editors with fewer than 100 cumulative edits, as measured by experiments, by the end of Q2.
i. "Constructive edits" = edits that are not reverted within 48 hours of being published
Other comments
Model requirements:
- Detection should be precise enough (sentence or paragraph level) to support actionable user suggestions
- Low false positive rate is essential to maintain user trust and minimize disruption
- Ideally, the suggestion queue should support Community Configuration (e.g., allowing admins to define rules that exclude certain pages, sections, or words); this would improve usefulness and community adoption
- The model should be efficient and scalable for use across many articles and languages
- The model should ideally exclude suggestions that target direct quotes, as peacock language or non-neutral tone may be appropriate in these contexts (e.g., when quoting historical texts, public statements, or notable quotations).
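Two of the requirements above (community-configured exclusions and skipping direct quotes) amount to filtering the suggestion queue before surfacing tasks. The sketch below is hypothetical; the suggestion shape, rule format, and quote heuristic are all assumptions for illustration:

```python
import re

def filter_suggestions(suggestions: list[dict],
                       excluded_pages: set[str],
                       excluded_words: list[str]) -> list[dict]:
    """Drop suggestions on excluded pages, inside apparent direct
    quotes, or containing admin-excluded words.

    Each suggestion is assumed to be a dict with "page" and "text" keys.
    """
    kept = []
    for s in suggestions:
        if s["page"] in excluded_pages:
            continue
        # Heuristic: skip text wrapped in straight or curly quotes,
        # since non-neutral tone may be appropriate in quotations.
        if re.search(r'["\u201c].*["\u201d]', s["text"]):
            continue
        if any(w.lower() in s["text"].lower() for w in excluded_words):
            continue
        kept.append(s)
    return kept
```

Keeping this filter separate from the detection model means admins can tune exclusions via Community Configuration without retraining or redeploying the model.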
Reporting format
Progress update on the hypothesis for the week, including if something has shipped:
Any updates on metrics related to this hypothesis (including baseline, target, or actuals, if applicable):
Any emerging blockers or risks:
Any unresolved dependencies:
New lessons from the hypothesis:
Changes to the hypothesis scope or timeline: