
Introduce an edit tag when peacock language is detected within new content being added
Closed, ResolvedPublic

Assigned To
Authored By
ppelberg
Mar 12 2025, 8:33 PM

Description

To evaluate the impact of Reference Check, we measured the proportion of new content edits that also included a new reference.

To calculate this proportion (T342930), we leveraged the editcheck-newcontent and editcheck-newreference change tags.

Similarly, to evaluate the impact of Tone Check (T387918), we need to analyze the proportion of new content edits that include problematic language. We will use this ticket to organize the work of:

  • Leveraging the Tone Check model to evaluate the presence/absence of problem language within new content edits
  • Appending edit tags to edits when the Tone Check model detects non-neutral language within new content edits

Requirements

  1. Do not block the saving of an edit on the model returning a result
  2. Each time an edit is saved without the model returning a result beforehand, log this so that Megan can filter these edits out from edits where the model returned a "no problematic language present" evaluation
    1. Please share proposed implementation with @MNeisler for review before finalizing.
  3. Name new tag editcheck-tone
  4. Implement editcheck-tone as a hidden tag
  5. Only evaluate and tag edits in languages we've evaluated the model with (English, Spanish, Japanese, Portuguese, French), made by editors who meet the requirements for Tone Check (e.g. editors with <100 edits, or whatever is set in config)
  6. Make sure this tag only runs on wikis that have Edit Check enabled
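The requirements above can be sketched as a small decision flow. This is an illustrative sketch only, not the actual VisualEditor implementation; all function and field names are hypothetical.

```python
# Hypothetical sketch of the tagging decision described in the requirements.
VALIDATED_LANGUAGES = {"en", "es", "ja", "pt", "fr"}  # languages the model was evaluated with
EDIT_COUNT_THRESHOLD = 100  # eligibility cutoff (configurable in the real system)

def should_run_tone_tagging(wiki_language, editor_edit_count, edit_check_enabled):
    """Decide whether an edit is eligible for editcheck-tone tagging."""
    if not edit_check_enabled:  # requirement 6: only wikis with Edit Check enabled
        return False
    if wiki_language not in VALIDATED_LANGUAGES:  # requirement 5: validated languages only
        return False
    return editor_edit_count < EDIT_COUNT_THRESHOLD  # requirement 5: eligible editors only

def tags_for_edit(model_result):
    """Map a model evaluation (or a missing one) to tags and log events.

    model_result is None when the edit was saved before the model returned
    (requirement 1: never block the save); that case is logged instead
    (requirement 2) so those edits can be filtered out of the analysis.
    """
    if model_result is None:
        return [], ["save-before-check-finalized"]
    if model_result.get("tone_detected"):
        return ["editcheck-tone"], []  # requirements 3 and 4: the hidden tag
    return [], []
```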

Open questions

  • 1. To what extent – if any – is it feasible for the model to evaluate presence/absence of problematic language at save-time so that a positive evaluation can be reflected as an edit tag?
  • 2. To what extent – if any – is evaluation happening at save-time required? Asked another way: can edit tags be appended after-the-fact?
    • Per discussion with Editing Engineering, tags need to be applied at save-time
  • 3. Would we consider the intervention successful if the model detected problematic language within published edits that were NOT reverted?
  • 4. To what extent would we expect a person's connection speed to impact how quickly the model is able to return a result?
    • Context: were a person's connection speed to be highly correlated with the model returning a result, we would risk biasing the sample.

Done

  • Editing QA confirms tag is being applied as expected
  • @MNeisler to verify model timeouts are being logged in VEFU

Related Objects

Event Timeline

ppelberg updated the task description. (Show Details)
ppelberg moved this task from Untriaged to Upcoming on the Editing-team board.
ppelberg edited subscribers, added: Sucheta-Salgaonkar-WMF; removed: Trizek-WMF.
ppelberg edited projects, added Editing-team (Kanban Board); removed Editing-team.

Per what @MNeisler and I converged on offline on Wednesday (April 16th), we'd like for the KPI for the Peacock Check controlled experiment (T387918) to be the following:
Proportion of all new content edits published without biased language and that are not reverted.

Next steps
Before moving forward with the above, we'll first need to learn from the Machine Learning Team how feasible T388716 is.

Update

Per offline discussion with @SSalgaonkar-WMF who talked with @isarantopoulos, the model should have no issue handling the additional traffic this ticket could produce.

As such, next step: discuss within Editing Engineering what – if anything – will need to happen on VE's side to send newly published text to the model, receive an evaluation back, and decide whether to append the tag this ticket is asking for to said edit.

We talked about this. Main technical caveats are:

  1. Increased model load. We'd need to send some potentially-arbitrary number of calls to the model pre-save for all VE edits. (But generally just one, because most edits are within a single paragraph.)
  2. We might miss some, because we wouldn't want to block the save on this. So if the call to the model hasn't finished by the time the person publishes, that edit just wouldn't get tagged.

Technically this all seems feasible, assuming that the model holds up per the earlier comment.
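The "don't block the save" behavior described above can be illustrated with a timeout around the model call. This is a minimal sketch using asyncio; the real client-side VE code differs, and all names here are hypothetical.

```python
# Hypothetical sketch: save proceeds regardless; the tag is only applied if
# the model answers within the time budget, otherwise the miss is logged.
import asyncio

async def query_tone_model(paragraph):
    """Stand-in for the network call to the tone model."""
    await asyncio.sleep(0.05)  # simulated round-trip latency
    return {"tone_detected": "amazing" in paragraph}

async def save_edit(paragraph, model_budget_s):
    """Save immediately; tag only if the model answered within budget."""
    try:
        result = await asyncio.wait_for(query_tone_model(paragraph), model_budget_s)
    except asyncio.TimeoutError:
        # The save still succeeds untagged; this case would be logged so the
        # analysis can filter these edits out.
        return {"saved": True, "tags": [], "logged": "save-before-check-finalized"}
    tags = ["editcheck-tone"] if result["tone_detected"] else []
    return {"saved": True, "tags": tags, "logged": None}
```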

To what extent – if any – is evaluation happening at save-time required?

Forgot to answer this bit, so: it's not required, but it's simpler. Done at save time, we can just reuse the same logic and calls we're using for the edit check and feed it into the revision tags. Done post-save, we need to build a separate pipeline for checking it server-side, reconstruct the diff and pass it into the model via an alternate means (which might together result in slight variance between the check and the tag in terms of what's flagged).

The primary benefit of doing it post-save would be that we could tag all edits rather than just VE edits. (You may remember this trade-off being talked about back when we were doing the add-reference content tags.)

ppelberg updated the task description. (Show Details)

We talked about this. Main technical caveats are:

  1. Increased model load. We'd need to send some potentially-arbitrary number of calls to the model pre-save for all VE edits. (But generally just one, because most edits are within a single paragraph.)

Great spot and understood.

Per offline discussion with @isarantopoulos, we can assume the model will be able to handle the increased traffic implementing the tag this ticket is asking for will send to the model.

And if/when the above proves NOT to be true, we can rely on the dashboard the ML Team will be building in T390706 to make us aware. If/when we encounter this scenario, we can consider other approaches (e.g. the post-save approach @DLynch described in T388716#10822391).

  1. We might miss some, because we wouldn't want to block the save on this. So if the call to the model hasn't finished by the time the person publishes, that edit just wouldn't get tagged.

Per discussion with @MNeisler offline last week, missing some edits is fine, provided that each time an edit is saved without the model returning a result beforehand, we log this so that Megan can filter those edits out from edits where the model returned a "no peacock language present" evaluation.

Update

Barring any blocking issues with renaming Peacock Check to Tone Check, I propose we name the tag that gets appended when peacock language is detected within a published edit that adds new content editcheck-tone.

The above follows the conventions we redefined in T373949.

Next step(s)

  • @ppelberg: ensure no blocking concerns with renaming Peacock Check to Tone Check

Per offline discussion, we're proceeding with renaming Peacock Check to Tone Check.

Change #1152383 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/extensions/VisualEditor@master] Edit check: create editcheck-tone tag for when tone is detected

https://gerrit.wikimedia.org/r/1152383

If there's a pending attempt to query the model at the time of this running, it'll log feature: editCheck-tone, action: save-before-check-finalized to VisualEditorFeatureUse.

(It's in flux in the implementation of the tone check whether this is actually a possible state currently.)

ppelberg added a subscriber: Esanders.

Each time an edit is saved without the model returning a result beforehand, log this so that Megan can filter these edits out from edits where the model returned an evaluation "no problematic language present"

  • Please share proposed implementation with @MNeisler for review before finalizing.

Per task requirement (above), assigning this over to Megan to review the implementation David described here:

If there's a pending attempt to query the model at the time of this running, it'll log feature: editCheck-tone, action: save-before-check-finalized to VisualEditorFeatureUse.

(It's in flux in the implementation of the tone check whether this is actually a possible state currently.)

(It's in flux in the implementation of the tone check whether this is actually a possible state currently.)

I realize I was unclear here: I'm not sure whether this could currently happen for someone in the test group (because the handling of timeouts has been in flux), but it can definitely happen for someone in the control group.

The proposed implementation feature: editCheck-tone, action: save-before-check-finalized looks good to me.

I realize I was unclear here: I'm not sure whether this could currently happen for someone in the test group (because the handling of timeouts has been in flux), but it can definitely happen for someone in the control group.

@DLynch - Do you mean that you're unsure if someone in test group would be able to save an edit before the model returns a result?

@MNeisler Yes -- I think that in the current state of the patch it'll force people to wait until the API call is complete before allowing them to progress.

Thanks for clarifying @DLynch. I'll keep that in mind when looking at the results.

Reassigning this back to you to implement. The proposed new event name looks good to me.

Question: @DLynch, @MNeisler wisely wondered: might it be possible for us to deploy the tag this ticket introduces at all wikis so that we can isolate all edits where Tone Check would've been shown?

This information would be immediately useful for T397372.

You mean completely ignoring the a/b test setting, but keeping the check-eligibility requirements? If so, that'd be doable by my understanding. It'd represent a sudden spike in load going to the model (all logged-out VE edits, and all logged-in VE edits by users with <100 edits would trigger at least one call to the model), but my understanding is that we've been told it can handle that.

You mean completely ignoring the a/b test setting, but keeping the check-eligibility requirements?

Exactly.

If so, that'd be doable by my understanding. It'd represent a sudden spike in load going to the model (all logged-out VE edits, and all logged-in VE edits by users with <100 edits would trigger at least one call to the model), but my understanding is that we've been told it can handle that.

Understood, ok. I'll clear with ML folks before we move forward.

Assuming we're cleared to do the above, @DLynch: would it be accurate that the only thing we'd change about the scope/requirements of this task would be to make the following edit?

  1. Only edits on all Wikipedias are tagged (replacing: only edits on wikis where the A/B test is running)

Per today's offline discussion, the ML Team is fine with the above.

Next steps

  • @DLynch: refresh patch so that it respects the new requirement for requests to be sent out regardless of whether the edit was published on a wiki participating in the a/b test.

Per what @MNeisler, @DLynch, and I discussed separately offline, we're going to adjust the requirements in the two ways described below so that we can:

  1. Log edits the tone model "thinks" introduce tone issues
  2. Without A) said edits needing to be published on wikis that are participating in the A/B test (T387918) or B) the Tone Check UI becoming activated within edit sessions

Requirement adjustments

  • REMOVE "Only edits on wikis where the A/B test is running are tagged."
  • ADD "Implement editcheck-tone as a hidden tag."

Change #1170415 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/extensions/VisualEditor@master] Edit check: restrict tone check to only validated languages

https://gerrit.wikimedia.org/r/1170415

Make sure this tag only runs on wikis that have Edit Check enabled

To confirm how this will work: any wiki with wgVisualEditorEditCheckTagging set will have this loaded, even if edit check itself is disabled there. It works this way because back when we created the reference check, we wanted to be able to compare against wikis that didn't have it enabled. This way of thinking about it makes less sense in a multi-check world, admittedly.

This is currently set to:

'wgVisualEditorEditCheckTagging' => [
	'default' => false,
	'wikipedia' => true,
],

i.e. it will consider adding the tag only on wikipedias.

(And once the patch I just added is merged, it will then immediately stop considering on most of the wikipedias because of a language mismatch.)
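The `default`/`wikipedia` structure above follows the usual per-wiki-family override pattern in Wikimedia configuration. As a hedged illustration of how that setting resolves (the helper name is hypothetical, not real MediaWiki code):

```python
# Hypothetical sketch of resolving a family-keyed setting like the one above:
# a wiki family's value wins if present, otherwise the default applies.
config = {
    "wgVisualEditorEditCheckTagging": {
        "default": False,
        "wikipedia": True,
    },
}

def resolve_setting(setting, wiki_family):
    """Return the effective value of a setting for a given wiki family."""
    families = config[setting]
    return families.get(wiki_family, families["default"])
```

So tagging is considered on Wikipedias and nowhere else, matching the comment above.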

Change #1170415 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] Edit check: restrict tone check to only validated languages

https://gerrit.wikimedia.org/r/1170415

Change #1152383 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] Edit check: create editcheck-tone tag for when tone is detected

https://gerrit.wikimedia.org/r/1152383

(QA screenshots attached: image.png ×3)

⚠️ Layered popups:

{F65686600} {F65686598}

ppelberg closed this task as Resolved. Edited Jul 28 2025, 7:07 PM

Per offline discussion with @MNeisler today, we have confirmed that 1 save-before-check-finalized event has been logged.

We're going to consider this sufficient for verifying that this event is being logged and stored as expected.

Also: having verified that #editcheck-tone tags are being appended to edits at all wikis in languages where the model is available, we can consider this resolved.

Tags

es.wiki: es:recent changes
ja.wiki: ja:recent changes
pt.wiki: pt:recent changes
en.wiki: en:recent changes