[WE 1.1.1] Run an A/B test to evaluate impact of Paste Check
Closed, Resolved (Public)

Description

This task holds the work involved with evaluating the impacts of Paste Check through a controlled experiment with a specified start and end date.

The Editing Team is pursuing this experiment with the goal of evaluating the extent to which the feature, in its current form, warrants being deployed to all wikis.

Overarching hypothesis

If we prompt new(er) volunteers pasting text from an external site to confirm whether they wrote the content they are attempting to add, then we will see a decrease in the percentage of new content edits new(er) volunteers publish that are reverted on the grounds of WP:COPYVIO (and related policies).

Experiment timeline

WARNING: the dates and steps that follow are NOT yet final.
| Milestone | Target Completion Date | Responsible | Status | Notes |
| --- | --- | --- | --- | --- |
| Publish announcement (T403637) | 19 Sep 2025 | @Trizek-WMF | | |
| Complete pre-deployment QA (T404923) | 23 Sep 2025 | @Ryasmeen | | |
| Start test (T405422) | Friday, 10 October | Editing Engineering | | while the test technically started on 8 Oct, there were a few fixes that didn't land until 9 October |
| Publish leading indicators report | 31 October 2025 | @MNeisler | | ~2 weeks after test start |
| End test | 21 November 2025 | Editing Engineering | | ~6 weeks after test start |
| Publish final report | 5 December 2025 | @MNeisler | | ~2 weeks after test end |

Decision to be made

The experiment(s) we run as part of Paste Check are meant to help us make the following decision:
What – if any – adjustments to the Paste Check UX need to be made (e.g. T407543) before we can be confident all of the following are true?

  1. Newcomers that encounter Paste Check are more likely to publish new content edits in the main namespace that are not reverted due to copyright violations (and related policies).
  2. Patrollers generally agree that Paste Check encourages newcomers to publish edits they consider to be constructive.

KPIs

The main outcomes we are trying to impact through this feature. These are what we are primarily using for evaluating the hypothesis and deciding whether to deploy an intervention more widely.

| Hypothesis | Decision(s) to be made | Metric description |
| --- | --- | --- |
| If we prompt new(er) volunteers pasting text from an external site to confirm whether they wrote the content they are attempting to add, then we will see a ≥4% decrease in the percentage of new content edits new(er) volunteers publish that are reverted on the grounds of WP:COPYVIO (and related policies). | Decision A: Does showing people a prompt when pasting text from an external site lower the likelihood that new content edits include copyright violations? Decision B: Do people intuitively interact with the Paste Check experience in ways that are NOT disruptive to them or the wikis? | 1) Proportion of new content edits shown or eligible to be shown Paste Check that are reverted on the grounds of WP:COPYVIO (and related policies). Note: We decided to scope the KPI down to only new content edits that have the potential for Paste Check to show. This will allow us to focus the analysis on the edits we are seeking to impact and reliably detect any statistically significant changes within the planned AB test duration. This is based on a recent analysis of the frequency of Paste Check eligible edits and their revert rate. See T403861#11255677. 2) Proportion of edits started (defined as reaching point that Paste Check was or would be shown) that are successfully published (not reverted). |

Secondary metrics

Used to learn about additional impact of Paste Check, but are not primary targets of the intervention. They reveal side effects (both positive and negative) of trying to improve the Primary Metric with the intervention.

| ID | Hypothesis | Metric description |
| --- | --- | --- |
| Curiosity #1 | A larger proportion of new content edits by Newcomers and Junior Contributors will be constructive because they will be shown a prompt to confirm whether they wrote the content they are attempting to add when pasting text from an external site. | ⭐ Proportion of published edits [i] by users with ≤100 cumulative edits that are constructive [ii] |
| Curiosity #2 | Newcomers and Junior Contributors will be more aware of the need to consider whether the text they're pasting from an external site into a main article namespace is at risk of copyright violations. | The proportion of newcomers and Junior Contributors that publish at least one new content edit that was reverted due to copyright violations. Note: We’ll want to observe a decrease in this metric. |
| Curiosity #3 | Newcomers and Junior Contributors will be more likely to return to publish a new content edit in the future that does not include copyright violations because Paste Check will have caused them to realize when they are at risk of this not being true. | 1) Proportion of newcomers and Junior Contributors that publish an edit Paste Check was activated within and successfully return to make an unreverted edit to a main namespace during the identified retention period. 2) Proportion of newcomers and Junior Contributors that publish an edit Paste Check was activated within and return to make a new content edit where Paste Check was not shown during the identified retention period. |

i: We'll need to break edits out by platform as WE 1.1 is scoped to mobile-only.
ii: "Constructive edits" = edits to pages in any Wikipedia main namespace that are not reverted within 48 hours of being published
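
Footnote ii can be read as a simple predicate over an edit's publish and revert timestamps. The sketch below is illustrative only: `is_constructive` and its parameters are hypothetical names, main-namespace filtering is assumed to happen upstream, and treating a revert at exactly 48 hours as falling outside the window is an assumption the task does not settle.

```python
from datetime import datetime, timedelta
from typing import Optional

REVERT_WINDOW = timedelta(hours=48)  # window taken from footnote ii


def is_constructive(published_at: datetime, reverted_at: Optional[datetime]) -> bool:
    """Return True if the edit was not reverted within 48 hours of publication."""
    if reverted_at is None:
        return True  # the edit was never reverted
    # Assumption: a revert at exactly 48 hours no longer counts against the edit.
    return reverted_at - published_at >= REVERT_WINDOW
```

An edit reverted five hours after publication would count as not constructive, while one reverted three days later would still count as constructive under this reading.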

Leading indicators

T400098: [Paste Check] Report on leading indicators

Guardrails

Used to make sure that the new checks presented are not negatively impacting an editor’s experience completing an edit or causing disruption on the wikis. The scenarios named in the chart below emerged through T325851.

| Guardrail Name | Metric description | Notes |
| --- | --- | --- |
| Edit quality decrease | Proportion of published edits that add new content and are reverted within 48 hours. | Will include a breakdown of the revert rate of published new content edits shown and not shown Paste Check. |
| Edit completion rate drastically decreases | Proportion of edits started (defined as reaching point that Paste Check was or would be shown) that are published. | Will include breakdown by the number of checks shown to identify if a lower completion rate corresponds with a higher number of checks shown. |
| People shown Paste Check are blocked at higher rates | Proportion of contributors blocked after publishing an edit where Paste Check was shown, compared to contributors not shown Paste Check. | |
| High false positive rate | Proportion of published edits where a user declined a Paste Check prompt by indicating that it was irrelevant. | Consider decision we made in T406164#11247475 to show Paste Check card on mobile immediately after pasting. |

Notable events

A/B Test: Decision Matrix

| ID | Scenario | Indicator(s) | Plan of Action |
| --- | --- | --- | --- |
| 1 | Paste Check is disrupting, discouraging, or otherwise getting in the way of volunteers. Read: people are less likely to publish the edits they start. | ≥20% drop in edit completion rate in edit sessions where Paste Check is activated relative to edits that would have been shown Paste Check but were not. | Pause scaling plans. If results indicate that significant decreases are only associated with a high number of Paste Checks shown, set a threshold for the maximum number of checks that can be shown within a single session. If we observe significant decreases for both single and multiple checks presented in a single session, investigate changes to the UX. |
| 2 | Paste Check is increasing the likelihood that people will publish destructive edits. | Increase in the proportion of published edits where Paste Check was activated that are reverted within 48 hours relative to edits that would have been shown Paste Check but were not. Increase in the proportion of contributors blocked after publishing an edit where Paste Check was shown, compared to contributors who were not shown Paste Check. | Pause scaling plans. Review edits to try to identify any patterns in abuse and propose changes to UX to mitigate them. |
| 3 | Paste Check is causing people to publish edits that align with project policies and that are not reverted. | Decrease in the proportion of edits Paste Check was activated within that are reverted within 48 hours on the grounds of WP:COPYVIO relative to edits that would have been shown Paste Check but were not. | Move forward with scaling plans. |
| 4 | Paste Check is effective at causing people to publish new content edits without pasted text from external sites, but those edits are still reverted. | Increase in the proportion of edits where Paste Check was activated that were published without unmodified pasted text AND increase or no change in the proportion of these edits that are reverted within 48 hours on the grounds of WP:COPYVIO relative to edits that would have been shown Paste Check but were not. | Pause scaling plans. Further investigation into methodology used to identify pasted text from an external site (e.g. might the false negative rate be too high). Analysis and manual review of reverted edits to understand why those edits were still reverted. |
| 5 | Paste Check is not effective at causing people to publish new content edits without pasted text from external sites, but the check is not disruptive to volunteers. | No change or decrease in the proportion of new content edits Paste Check was activated within that were published without unmodified pasted text from a non-Wikipedia HTML source AND A) no significant drop in edit completion rate or B) no significant spike in block or revert rates. | Move forward with scaling plans. |
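
As an illustration of how the scenario 1 indicator might be evaluated, the sketch below checks whether the completion rate in the test group dropped by at least 20% relative to control. It assumes the "≥20% drop" is a relative change (the matrix does not say whether it is relative or in percentage points), and `completion_guardrail_tripped` is a hypothetical helper, not part of any existing analysis code.

```python
def completion_guardrail_tripped(control_rate: float,
                                 test_rate: float,
                                 threshold: float = 0.20) -> bool:
    """Return True if the test completion rate fell by at least `threshold`
    (as a fraction of the control rate) compared with control.

    Assumption: the drop is measured relative to the control rate.
    """
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    relative_change = (test_rate - control_rate) / control_rate
    return relative_change <= -threshold
```

With made-up rates, a fall from 0.60 to 0.45 (a 25% relative drop) would trip the guardrail, while a fall from 0.60 to 0.59 would not.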

Event Timeline

ppelberg updated the task description.

For Tech News:

[[mw:Special:MyLanguage/Help:Edit_check#paste | Paste Check ]] is a new Edit Check feature to help avoid and fight copyright violations. When editors paste text into an article, Paste Check can prompt them to confirm the origin and licensing of the content. Starting Wednesday, 8 October, [[ phab:T403680 |22 wikis will test Paste Check ]].

Thanks!

The start date has been shared with all communities impacted.

ppelberg updated the task description.

I've completed an analysis of the impact of Paste Check based on A/B test data collected from 9 October 2025 through 28 November 2025. See the summary of results below and the full report for additional details on methodology and on metrics broken down by various dimensions (including number of checks shown, platform, user experience, and partner wiki).

Summary of Results

New Content Edit Revert Rate

The revert rate trended lower for edits shown Paste Check; however, we are unable to confirm statistical significance at the 95% confidence level.

  • We observed an 18% relative decrease [10.5% → 8.6%] in the revert rate for edits shown Paste Check compared to edits eligible but not shown Paste Check. However, the data is insufficient to statistically confirm that this effect was not due to random noise.
  • Our analysis indicates there is an 85.6% probability that edits shown Paste Check are reverted less frequently than eligible edits in the control group. While this falls short of the 95% certainty threshold, it's a strong indicator that the feature is decreasing reverts.
    • Revert rates decreased across all user types, with Paste Check’s magnitude of impact increasing with user experience. We see the highest impact for Junior Contributors (people who have made between 1 and 100 edits). There was a 24% decrease in the revert rate of edits completed by Junior Contributors shown Paste Check, compared to a 9% decrease in the revert rate of edits by Newcomers.
    • Trends differ by platform. On desktop, we observed a 27% decrease [2.6 percentage points; 9.6% → 7%] in revert rate for edits shown Paste Check. On mobile web, there was a 20% increase [3.1 percentage points; 15.5% → 18.6%]. We are unable to confirm if either of these effects is statistically significant.
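
The bullets above mix two ways of expressing a change: percentage points (the absolute difference between two rates) and relative change (that difference divided by the control rate). A minimal sketch of the arithmetic, using the rates quoted in this summary; `describe_change` is a hypothetical helper name:

```python
from typing import Tuple


def describe_change(control_pct: float, test_pct: float) -> Tuple[float, float]:
    """Return (percentage-point change, relative change in percent),
    each rounded to one decimal place."""
    pp_change = test_pct - control_pct
    relative_change = pp_change / control_pct * 100
    return round(pp_change, 1), round(relative_change, 1)


print(describe_change(10.5, 8.6))   # overall revert rate
print(describe_change(9.6, 7.0))    # desktop revert rate
print(describe_change(15.5, 18.6))  # mobile web revert rate
```

Rounded to whole percents, the relative changes match the 18%, 27%, and 20% figures reported above.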

Edit Completion Rate

Paste Check did not cause any significant changes in edit completion rate on either desktop or mobile web. Overall, we observed a 1.9% decrease [1.2 percentage points] in the completion rate of edits shown Paste Check.

  • We also did not see any significant declines in edit completion rate by user type, but the data indicates Paste Check's impact on a user's likelihood of completing an edit may vary based on the user's experience. There was a 2.7% increase in edit completion rate for unregistered users shown Paste Check and a 3.4% decrease for Junior Contributors.

Constructive Edit Rate

Aligned with the revert rate results, there are signs that Paste Check increased constructive edits, especially on desktop, but we are unable to confirm statistical significance at the 95% confidence level with the data available at the time of this analysis.

  • Constructive edits increased by 2% for users in the test group shown Paste Check. While trends appear positive, we do not have sufficient data to confirm this effect was not due to random noise.
  • Our analysis indicates there is an 85.6% probability that edits shown Paste Check are more constructive than eligible edits in the control group. While this falls short of the 95% certainty threshold, it's a strong indicator that the feature is increasing constructive edits overall.
  • Trends differ by platform. On desktop, we observed a 2.9% increase [90.4% → 93%] in constructive edit rate for edits shown Paste Check. On mobile web, there was a 3.7% decrease [84.5% → 81.4%]. Both of these results are statistically inconclusive.
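
The summary does not state how the 85.6% probability was computed. One common way to obtain a figure of this kind is to compare Beta posteriors for the two groups' rates by Monte Carlo sampling. The sketch below is an assumption-laden illustration, not the report's method: it uses a uniform Beta(1, 1) prior, a fixed random seed, and the hypothetical function name `prob_test_higher`; any counts passed to it would have to come from the full report.

```python
import random


def prob_test_higher(ctrl_success: int, ctrl_total: int,
                     test_success: int, test_total: int,
                     draws: int = 20000, seed: int = 0) -> float:
    """Estimate P(test rate > control rate) under independent
    Beta(1, 1) priors by sampling from each group's Beta posterior."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for a binomial rate with a Beta(1, 1) prior.
        ctrl = rng.betavariate(1 + ctrl_success, 1 + ctrl_total - ctrl_success)
        test = rng.betavariate(1 + test_success, 1 + test_total - test_success)
        if test > ctrl:
            wins += 1
    return wins / draws
```

Here a "success" would be a constructive (unreverted) edit. A result near 0.5 means the data cannot tell the groups apart, and values approaching 1 indicate growing confidence that the test group's rate is higher.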

Constructive Retention Rate

There were no significant changes in the constructive retention rate for users shown Paste Check. In both the test and the control group, 6% of contributors returned to make a constructive edit 7 to 14 days after making an edit where Paste Check was shown or eligible to be shown.
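
The 7 to 14 day return window can be sketched as a check over a contributor's later constructive edits. This is a hypothetical illustration: `returned_in_window` is not an existing function, and treating both ends of the window as inclusive is an assumption.

```python
from datetime import datetime, timedelta
from typing import Iterable


def returned_in_window(first_edit_at: datetime,
                       later_constructive_edits: Iterable[datetime],
                       start_days: int = 7,
                       end_days: int = 14) -> bool:
    """Return True if any later constructive edit falls 7 to 14 days
    (inclusive, by assumption) after the first qualifying edit."""
    window_start = first_edit_at + timedelta(days=start_days)
    window_end = first_edit_at + timedelta(days=end_days)
    return any(window_start <= ts <= window_end for ts in later_constructive_edits)
```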

Guardrail summary

Paste Check does not appear to be negatively impacting an editor’s experience or causing disruption on Wikipedia projects based on a review of the identified guardrail metrics, as described in the task description.

  • We did not observe any significant decrease in edit completion rate or significant increase in edit revert rate, either for edits where Paste Check was shown or overall across all edits.
  • People elected to keep their pasted text in 56% of edits shown Paste Check. This aligns with the Edit Check dismissal rate observed for Tone Check and is slightly lower than the rates observed for Reference Check.
  • We also confirmed that people are not blocked at a higher rate after being shown Paste Check.

cc @ppelberg

Thanks so much for this @MNeisler!

Trends differ by platform. For desktop, we observed a -27% decrease [2.6 percentage points; 9.6% → 7%] in revert rate for edits shown Paste Check. While on mobile web, there was a +20% increase [3.1 percentage points; 15.5% → 18.6%]. We are unable to confirm if either of these effects is statistically significant.

For the purposes of WE1.1 KR scoring, a 20% increase in revert rate on mobile web means an x% decrease in constructive edit rate on mobile web, right? So what would x be in this case? I know this would be an asterisked figure because of the statistical significance concerns you raised, but I think this will be a key piece for us to understand as it pertains to WE1.1 success measurements.

@ldelench_wmf

On mobile web, we observed a 3.7% decrease [3.1 percentage points; 84.5% → 81.4%] in constructive edits; however, there was a much smaller sample of mobile web edits compared to desktop edits at the time of analysis. As a result, there is a large range of uncertainty associated with this impact.
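
The conversion asked about here follows from treating the constructive edit rate as the complement of the 48-hour revert rate, which is consistent with the figures quoted in this thread (15.5% + 84.5% and 18.6% + 81.4% both sum to 100%). A minimal sketch of that arithmetic, with `constructive_relative_change` as a hypothetical helper name:

```python
def constructive_relative_change(control_revert_pct: float,
                                 test_revert_pct: float) -> float:
    """Convert a pair of revert rates (in percent) into the relative
    change, in percent, of the complementary constructive edit rate."""
    control_constructive = 100 - control_revert_pct
    test_constructive = 100 - test_revert_pct
    return (test_constructive - control_constructive) / control_constructive * 100


# Mobile web revert rates quoted above: 15.5% (control) and 18.6% (test).
mobile_change = constructive_relative_change(15.5, 18.6)
print(round(mobile_change, 1))
```

The same percentage-point shift yields a much smaller relative change on the constructive side because the constructive rate has a far larger base than the revert rate.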

I've updated the summary to clarify this point as well. Let me know if any additional info would be helpful.

mikez-WMF changed the task status from Open to In Progress. (Fri, Feb 13, 12:26 PM)
mikez-WMF added a project: Community-Wishlist.
mikez-WMF moved this task from Backlog to In Progress on the Community-Wishlist board.