With T379397, we are venturing to build an Edit Check that uses machine learning/AI to detect the presence of peacock words within the new text people are attempting to add to Wikipedia.
This task covers evaluating the efficacy of the model we build in collaboration with the ML Team.
Story
As a member of the ML/Editing Team, I want to be able to review the evaluations the initial Peacock Check model has made on real edits, and offer feedback about them, so that we can collectively A) decide what – if any – adjustments ought to be made to it and B) become confident enough in its accuracy to share it with volunteers (T388471).
Open questions
- To what extent – if any – will we depend on volunteers to participate in this evaluation?
  - This ticket covers an internal review only; volunteers will participate in T388471.
Requirements/Process
| Step | Description | Status | Notes |
|---|---|---|---|
| Step 1 | ML team to populate the "Peacock edit check model evaluation – V1" spreadsheet with ~300 edits for members of the ML and Editing Teams to review | ✅ done | |
| Step 2 | @SSalgaonkar-WMF + @ppelberg will assign members of their respective teams to review edits | ✅ done | |
| Step 3 | Members of ML and Editing teams label edits | ✅ done | |
| Step 4 | ML Team to review feedback and identify/propose what – if any – adjustments they think could be made to improve the model | | |