[MILESTONE] Conduct usability tests for Tone Check proof of concept
Closed, Resolved · Public

Description

This task covers conducting a usability test of the Peacock Check proof of concept, which integrates the first iteration of the peacock language detection model the ML Team is developing.

More broadly, this task supports WE 1.2.13:

If we conduct usability tests of an initial engineered version of Peacock Check with ≥10 newcomers and Junior Contributors and ≥80% of them describe the experience using terms like "helpful," "makes sense," and "clear," then we can be confident the proposed UX has the potential to lower the rate at which new content edits are reverted on the grounds of WP:WTW (and related policies).

Decision(s) to be made

Research Questions

  1. To what extent, if any, did people find the feedback Edit Check offered confusingly or unhelpfully generic?
    • Context: this question responds to our decision that we can be okay with flagging non-neutral language even if the type of non-neutral language varies across cases. See Slack and comments from @jhsoby, @Strainu, and @matej_suchanek in T388215.

Related Objects

Status      Assigned
Open        None
Open        None
Open        None
Open        None
Open        None
Open        ppelberg
Open        MNeisler
Resolved    nayoub
Resolved    Esanders
Resolved    ppelberg
Resolved    DLynch
Resolved    gkyziridis
Resolved    achou
Resolved    dchan
Resolved    dchan
Resolved    bmartinezcalvo
Resolved    achou
Resolved    ppelberg
Resolved    Sucheta-Salgaonkar-WMF
Resolved    Quiddity
Resolved    nayoub
Resolved    Esanders
Resolved    nayoub
Resolved    nayoub
Resolved    MRaishWMF
Resolved    zoe

Event Timeline

ppelberg renamed this task from "Conduct usability tests for how we might surface feedback to people while they are typing" to "Conduct usability tests for Peacock Check proof of concept". (Mar 4 2025, 8:07 PM)
ppelberg moved this task from Untriaged to Upcoming on the Editing-team board.
ppelberg moved this task from Backlog to Triaged on the EditCheck board.
ppelberg moved this task from To Triage to Triaged on the VisualEditor board.
ppelberg renamed this task from "Conduct usability tests for Peacock Check proof of concept" to "[MILESTONE] Conduct usability tests for Peacock Check proof of concept". (Mar 14 2025, 9:16 PM)
Aklapper renamed this task from "[MILESTONE] Conduct usability tests for Peacock Check proof of concept" to "[MILESTONE] Conduct usability tests for Tone Check proof of concept". (May 28 2025, 11:43 AM)

In all testing the Editing Team completed (1, 2, 3, 4), volunteers (new, experienced, and across geographies) described the Peacock Check experience as familiar, encouraging, and supportive.

These reactions have led the Editing Team to think the Peacock Check UX is ready for deployment: we are confident newcomers will intuitively understand its purpose and be receptive to the feedback it offers them.

Some highlights...

This would encourage me to [edit]. I said one of the reasons I didn't contribute was because I read some of the contributions and I think they're really well written and I think I can't write as well as that. My English isn't strong enough, but [with PC] I feel now that I could write something useful.

I'm for it because if somebody doesn't want it to be neutral, they know that. So they're just going to ignore it and post it. And if it's a situation where I do want to do the right thing essentially, then it's good to catch that maybe my language could be better.

I think it’s really interesting. I’m averse to the style suggestions of other programs. They’re usually kind of in bad faith—they’re bad suggestions. This is different. There’s no suggestion of what you should actually be saying...

Major lessons

  • People find Edit Check familiar and reminiscent of suggestions/feedback they receive on other platforms.
    • Knowing this gives the Editing Team added confidence to shift our attention to populating the core UX with new Suggestions and Checks.
  • Readers and editors vary in how they understand neutrality.
    • Experienced readers think of neutrality as being associated with a lack of emotion.
    • Experienced editors think of neutrality as the degree of alignment between a source and the in-article claims it supports.
  • Readers remain uncertain about what caused Peacock Check to become activated.
    • This finding points to the potential need for us to invest further in a UX that makes it easier for people to identify the "offending" language and gives them the know-how needed to "neutralize" it.