At least two weeks after the start of the Tone Check A/B Test (T387918), we will review a set of leading indicators (outlined below).
We will use this ticket to scope and conduct this analysis.
Analysis timing
Target completion date: Wednesday, 24 Sep 2025
Decision(s) to be made
What adjustments or investigations, if any, will we prioritize so that we can be confident moving forward with evaluating the Peacock Check's impact in T387918?
Leading indicators
Metrics
| ID | Name | Owner | Metric(s) for Evaluation | Conclusion |
|---|---|---|---|---|
| 1. | Newcomers and Junior Contributors are not encountering Peacock Check | Editing | Proportion of new content edits in which Peacock Check is shown | |
| 2. | Newcomers and Junior Contributors are not understanding the feature | Editing | Proportion of contributors who are shown Peacock Check and abandon their edits | |
| 3. | People deem Peacock Check irrelevant | Editing | Proportion of edits wherein people elect to dismiss the check or not change the text they've added | |
| 4. | Peacock Check is causing disruption | Editing | 1) Proportion of people blocked after publishing an edit where Peacock Check was shown and 2) Proportion of published edits that add new content and are reverted within 48 hours | |
| 5. | Model is not able to evaluate the tone of a published edit quickly enough | Editing | Proportion of edits that are published before the model is able to return an evaluation. See T388716. | |
| 6. | Model service availability | ML | Service Availability SLO: 95% of all requests return a 200/300/400 response code. See T390706 | 99.99% (T405338) |
| 7. | Model is not delivering responses quickly enough | ML | Proportion of all requests that return a response within 1000 milliseconds | 97.04% (T405338) |
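
For reference, the two ML indicators (6 and 7) are simple proportions over request records. The sketch below shows one way they could be computed. It is an illustration only: the `RequestRecord` shape, the function names, and the sample data are hypothetical and do not reflect the actual service logs or the analysis in T405338, and it assumes "200/300/400" means any status code in the 2xx, 3xx, or 4xx classes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RequestRecord:
    """Hypothetical record of one model request (not the real log schema)."""
    status_code: int
    latency_ms: float


def availability_ratio(records: Sequence[RequestRecord]) -> float:
    """Share of requests answered with a 2xx, 3xx, or 4xx status (indicator 6).

    Assumption: "200/300/400 response code" refers to these status classes.
    """
    if not records:
        raise ValueError("no requests to evaluate")
    served = sum(1 for r in records if 200 <= r.status_code < 500)
    return served / len(records)


def within_latency_ratio(
    records: Sequence[RequestRecord], threshold_ms: float = 1000.0
) -> float:
    """Share of all requests answered within threshold_ms (indicator 7)."""
    if not records:
        raise ValueError("no requests to evaluate")
    fast = sum(1 for r in records if r.latency_ms <= threshold_ms)
    return fast / len(records)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        RequestRecord(200, 120.0),
        RequestRecord(404, 80.0),
        RequestRecord(503, 1500.0),
        RequestRecord(301, 1000.0),
    ]
    slo_target = 0.95  # from the Service Availability SLO in row 6
    ratio = availability_ratio(sample)
    print(f"availability: {ratio:.2%} (meets SLO: {ratio >= slo_target})")
    print(f"within 1000 ms: {within_latency_ratio(sample):.2%}")
```

The Editing indicators (1 through 5) follow the same numerator-over-denominator pattern, with the denominator defined per row (for example, all new content edits for indicator 1).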