
Evaluate efficacy of Tone Check model output (internal review)
Closed, Resolved; Public

Description

With T379397, we are building an Edit Check that uses machine learning/AI to detect peacock words (promotional, non-neutral language) in the new text people are attempting to add to Wikipedia.

This task covers evaluating the efficacy of the model we end up building in collaboration with the ML Team.

Story

As a member of the ML/Editing Team, I want to review the evaluations the initial Peacock Check model has made on real edits and offer feedback on them, so that we can collectively: A) decide what adjustments, if any, ought to be made to the model, and B) become confident enough in its accuracy to share it with volunteers (T388471).

Open questions

  • To what extent, if any, will we depend on volunteers to participate in this evaluation?
    • This ticket covers only an internal review. Volunteers will participate in T388471.

Requirements/Process

Step | Description | Status | Notes
Step 1 | ML Team to populate the "Peacock edit check model evaluation - V1" spreadsheet with ~300 edits for members of the ML and Editing Teams to review | ✅ done |
Step 2 | @SSalgaonkar-WMF and @ppelberg will assign members of their respective teams to review edits | ✅ done |
Step 3 | Members of the ML and Editing Teams label edits | ✅ done |
Step 4 | ML Team to review feedback and identify/propose what adjustments, if any, they think could be made to improve the model | |
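As a rough illustration of the kind of tally Step 4 could start from, the sketch below compares reviewer labels with model predictions to estimate precision and recall. It assumes, hypothetically, that each reviewed edit reduces to two booleans (did the model flag peacock language, and did the reviewer judge it present); the `ReviewedEdit` record and `precision_recall` helper are illustrative names, not the spreadsheet's actual columns or any ML Team tooling, and the sample rows are made up.

```python
from dataclasses import dataclass


@dataclass
class ReviewedEdit:
    # Hypothetical record: one reviewed edit from the evaluation spreadsheet.
    model_flagged: bool     # model predicted peacock language
    reviewer_flagged: bool  # human reviewer judged peacock language present


def precision_recall(edits):
    """Return (precision, recall), treating reviewer labels as ground truth."""
    tp = sum(e.model_flagged and e.reviewer_flagged for e in edits)
    fp = sum(e.model_flagged and not e.reviewer_flagged for e in edits)
    fn = sum(not e.model_flagged and e.reviewer_flagged for e in edits)
    # Guard against division by zero when there are no positives.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = [
        ReviewedEdit(True, True),
        ReviewedEdit(True, False),
        ReviewedEdit(False, True),
        ReviewedEdit(True, True),
        ReviewedEdit(False, False),
    ]
    p, r = precision_recall(sample)
    print(f"precision={p:.2f} recall={r:.2f}")  # prints "precision=0.67 recall=0.67"
```

Treating a single reviewer's label as ground truth is a simplification; where several reviewers label the same edit, their disagreement is itself useful feedback for the model.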

Event Timeline

ppelberg renamed this task from "Evaluate efficacy of Peacock Check model output" to "Evaluate efficacy of Peacock Check model output (internal review)". (Mar 10 2025, 11:10 PM)
ppelberg edited projects, added Editing-team (Kanban Board); removed Editing-team.
ppelberg updated the task description.
ppelberg moved this task from Inbox to Doing on the Editing-team (Kanban Board) board.
Aklapper renamed this task from "Evaluate efficacy of Peacock Check model output (internal review)" to "Evaluate efficacy of Tone Check model output (internal review)". (May 28 2025, 11:43 AM)