Estimate number of edit sessions Paste Check has potential to activate within
Closed, Resolved (Public)

Description

This task involves the work of estimating the maximum number of newcomer edit sessions in which we could expect a Paste Check to be shown.

We are seeking this number in an effort to estimate how many wikis are needed to participate in the paste check a/b test for us to be able to draw statistically significant conclusions in ≤6 weeks.

Requirements

For the wikis listed in T403680, calculate the number of edit sessions initiated by people who have published ≤100 cumulative edits (logged out included) in which ≥1 feature: editCheck-paste, action: relevant-paste event (implemented via T402460) in VisualEditorFeatureUse was emitted/logged.

Open questions

  • During what period of time will we evaluate these edits?

Event Timeline

ppelberg renamed this task from Estimate upd to Estimate number of edit sessions Paste Check has potential to activate within.
ppelberg updated the task description.

calculate the number of edit sessions initiated by people who have published ≤100 cumulative edits (logged out included) in which ≥1 feature: editCheck-paste, action: relevant-paste event

The ≤100 edits check should be superfluous: that event only fires when a Paste Check would actually have been shown, so it already includes that check.

Summarizing some initial results from an analysis of Paste Check eligible events logged to date:

Frequency of Paste Check Sessions

For the wikis listed in T403680, calculate the number of edit sessions initiated by people who have published ≤100 cumulative edits (logged out included) in which ≥1 feature: editCheck-paste, action: relevant-paste event (implemented via T402460) in VisualEditorFeatureUse was emitted/logged

Data below reflects all Paste Check eligible sessions logged over a two-week timeframe (18 September 2025 - 2 October 2025) across all partner wikis listed in T403680.

  • Total number of paste check eligible sessions: 6715 editing sessions
    • This equates to about 470 paste check eligible sessions per day
  • By platform
    platform | Number of paste check eligible sessions
    desktop  | 5577
    phone    | 1138
  • By User Experience
    experience_level_group | Number of paste check eligible sessions
    Unregistered           | 2227
    Newcomer               | 1094
    Junior Contributor     | 3394
  • By Wiki
    wiki       | Number of paste check eligible sessions
    arwiki     | 801
    bnwiki     | 129
    cawiki     | 133
    cswiki     | 194
    dewiki     | 1003
    euwiki     | <50
    fawiki     | 280
    glwiki     | <50
    hiwiki     | 79
    idwiki     | 529
    itwiki     | 947
    nlwiki     | 332
    plwiki     | 369
    ruwiki     | 921
    simplewiki | 166
    swwiki     | 61
    twwiki     | <50
    ukwiki     | 260
    viwiki     | 166
    zhwiki     | 309
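
As a quick consistency check on the breakdowns above, here is a minimal Python sketch. It is not part of the original analysis and the variable and function names are illustrative. It only re-adds the counts reported in the tables and derives how many sessions are left for the three wikis reported as "<50".

```python
# Counts copied from the tables above (18 September - 2 October 2025).
TOTAL_SESSIONS = 6715

by_platform = {"desktop": 5577, "phone": 1138}
by_experience = {"Unregistered": 2227, "Newcomer": 1094, "Junior Contributor": 3394}

# Wikis with an exact count. euwiki, glwiki and twwiki are only reported as "<50".
by_wiki_exact = {
    "arwiki": 801, "bnwiki": 129, "cawiki": 133, "cswiki": 194, "dewiki": 1003,
    "fawiki": 280, "hiwiki": 79, "idwiki": 529, "itwiki": 947, "nlwiki": 332,
    "plwiki": 369, "ruwiki": 921, "simplewiki": 166, "swwiki": 61,
    "ukwiki": 260, "viwiki": 166, "zhwiki": 309,
}
suppressed_wikis = ["euwiki", "glwiki", "twwiki"]


def remainder_for_suppressed(total, exact_counts):
    """Sessions not accounted for by wikis that have an exact count."""
    return total - sum(exact_counts.values())


platform_ok = sum(by_platform.values()) == TOTAL_SESSIONS
experience_ok = sum(by_experience.values()) == TOTAL_SESSIONS
suppressed_total = remainder_for_suppressed(TOTAL_SESSIONS, by_wiki_exact)
platform_share = {name: count / TOTAL_SESSIONS for name, count in by_platform.items()}

print("platform breakdown sums to total:", platform_ok)
print("experience breakdown sums to total:", experience_ok)
print("sessions left for", suppressed_wikis, ":", suppressed_total)
print("platform share:", {k: round(v, 3) for k, v in platform_share.items()})
```

Both breakdowns add up to the reported total, and the remainder shared by euwiki, glwiki and twwiki is 36 sessions, which is consistent with each of them being reported as "<50".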

Proportion of all published new content edits by newcomers

  • 32.9% of all published desktop new content edits and 16% of all published mobile new content edits by newcomers were identified as being eligible to be shown Paste Check.
  • By User experience: new content edits by the Newcomer group are the most frequently identified as eligible for Paste Check.
    experience_level_group | prop_edits (share of new content edits eligible for Paste Check)
    Unregistered           | 23.2%
    Newcomer               | 33.5%
    Junior Contributor     | 29.2%

Revert rates
How often are paste check eligible edits reverted?

If we review all published new content edits by newcomers (denominator = all published new content edits):

  • 2.8% of all desktop published edits and 2.6% of all mobile published edits by newcomers were identified as both eligible for paste check and reverted.

In other words, only about 3% of all published new content edits by newcomers are both eligible for Paste Check and reverted.

If we limit to only published new content edits identified as eligible for Paste Check (denominator = all published new content edits identified as eligible for Paste Check):

  • 8.6% of all desktop and 16% of all mobile published new content edits identified as eligible for Paste Check are reverted.
  • By Experience Level
    experience_level_group | prop_edits (share of Paste Check eligible edits that are reverted)
    Unregistered           | 11.4%
    Newcomer               | 10.7%
    Junior Contributor     | 8.7%
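
The two denominators are linked: the share of all new content edits that are both eligible and reverted should be roughly the eligible share multiplied by the revert rate among eligible edits. The Python sketch below (illustrative names, not the query used for the analysis) checks the reported desktop and mobile figures against that relationship.

```python
# Percentages copied from this comment, expressed as fractions.
reported = {
    "desktop": {"eligible_share": 0.329, "revert_rate_if_eligible": 0.086,
                "eligible_and_reverted": 0.028},
    "mobile": {"eligible_share": 0.16, "revert_rate_if_eligible": 0.16,
               "eligible_and_reverted": 0.026},
}


def eligible_and_reverted_share(eligible_share, revert_rate_if_eligible):
    """P(eligible and reverted) = P(eligible) * P(reverted | eligible)."""
    return eligible_share * revert_rate_if_eligible


derived = {
    platform: eligible_and_reverted_share(
        values["eligible_share"], values["revert_rate_if_eligible"]
    )
    for platform, values in reported.items()
}

for platform, value in derived.items():
    print(platform, "derived:", round(value, 4),
          "reported:", reported[platform]["eligible_and_reverted"])
```

The derived values agree with the reported 2.8% and 2.6% to within rounding, so the two views of the revert rate are consistent with each other.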

This is lower than the revert rates we have observed for previous checks such as Tone Check (29%, ref: T371158#11220470). However, this may also include edits where the final published text did not include pasted text. The event instrumented in T402460 identifies edits that would be eligible to be shown Paste Check during an edit session if enabled, but does not assess whether the final published text still includes pasted text.

Next Steps: Will review the results with @ppelberg and assess the recommended AB test duration needed to evaluate identified metrics.

Per the above and what Megan and I discussed offline last week, Paste Check is likely to activate in a relatively high proportion of the new content edits newcomers publish (23.2% - 33.5%).

(For a point of comparison, Tone Check has the potential to activate in 9% of the new content edits newcomers publish.)

With the above said, the revert rate of edits in which Paste Check has the potential to activate is quite low:

  • 2-3% if we consider the denominator to be all new content edits newcomers publish
  • 8% if we scope the denominator more narrowly to edits that involve newcomers pasting content in.

So we're left with a choice: do we adjust the KPI to be scoped to a shift in all new content newcomer edits or do we scope the KPI down to only those which would have the potential for paste check to show?

We've decided to do the latter so that we can report statistically significant findings in 5-6 weeks, as opposed to the 12-16 weeks the former would require.
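
For readers who want to explore how many edits per test group a KPI scoped this way might need, here is a rough Python sketch of a standard two-proportion sample size calculation (normal approximation). It is not the calculation behind the durations above. The 8% baseline is taken from this thread, while the relative reductions, the 5% significance level and the 80% power are assumptions chosen purely for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.8):
    """Edits needed per group to detect p_control vs p_treatment.

    Two-sided two-proportion z-test, normal approximation.
    alpha and power defaults are conventional choices, not values from this task.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2)


# Baseline revert rate among Paste Check eligible edits, as summarized above.
BASELINE_REVERT_RATE = 0.08

# Hypothetical relative reductions in revert rate. These are assumptions.
HYPOTHETICAL_RELATIVE_REDUCTIONS = [0.10, 0.20, 0.30]

required = {
    reduction: sample_size_per_group(
        BASELINE_REVERT_RATE, BASELINE_REVERT_RATE * (1 - reduction)
    )
    for reduction in HYPOTHETICAL_RELATIVE_REDUCTIONS
}

for reduction, n in required.items():
    print(f"relative reduction {reduction:.0%}: {n} eligible edits per group")
```

Turning these counts into a test duration would additionally require the weekly volume of published Paste Check eligible edits on the participating wikis, which this sketch does not assume.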