
[Paste Check] Report on leading indicators
Open, Needs Triage, Public

Description

At least 2 weeks after the start of the Paste Check A/B Test (T399669), we will check a set of leading indicators (outlined below).

We will use this ticket to scope and conduct this analysis.

Analysis timing

Per 8 October 1:1 between Megan and Peter, we plan for the leading indicator analysis to be ready for publication by 31 October 2025.

Decision(s) to be made

  • What – if any – adjustments/investigations will we prioritize for us to be confident moving forward with evaluating the Paste Check's impact in T399669?
    • E.g. might we prioritize the work necessary to unblock T407543 and enable Paste Check to be shown during the Pre-Save moment if people do not interact with it during Mid-Edit?
    • No adjustments at this time. See T400098#11381869 for context.

Open questions

  • 1. How might we evaluate the extent to which the options in the decline survey are misleading people? More context in T400098#11191795.
  • 2. How might we reliably estimate the frequency of false negatives (instances when a user pastes text that violates copyright policies but was not shown Paste Check due to its current configuration)? We added instrumentation in T407302 to track pastes that would have caused Paste Check had the text not been copied from a known source so that we can, among other things, understand how many edits Paste Check is being suppressed from showing within.
    • One planned analysis is to compare the revert rate of the different sources of pastes to provide insight into how many of these edit types are problematic.

Leading indicators

WARNING: Need to discuss with @MNeisler

Metrics

| ID | Name | Metric(s) for Evaluation | Conclusion | Notes |
| --- | --- | --- | --- | --- |
| 1. | Newcomers and Junior Contributors are not encountering Paste Check | Proportion of new content edits Paste Check is shown within and Proportion of new content edits that were not eligible to be shown Paste Check because the pasted text was copied from a known source (e.g. googleDocs, plainText, etc.) | 🟩 No action needed at this time: Paste Check is being shown in a sufficient number of the new content edits newcomers in the test group made (36%). For reference, this rate is significantly higher than rates observed for Tone Check (9%). | If this is low, we might consider revising the logic that prevents Check from showing on certain pastes. E.g. T405297. Note: We added instrumentation in T407302 to track pastes that would have caused Paste Check had the text not been copied from a known source so that we can understand how many edits Paste Check is being suppressed from showing within. |
| 2. | Newcomers and Junior Contributors are not understanding the feature | Proportion of contributors that are presented Paste Check and abandon their edits | 🟩 No action needed at this time: newcomers presented Paste Check are doing the opposite of abandoning edits; they are completing them at a higher rate (52%) than edits in the control group that are eligible but not shown Paste Check (49%). | As @Trizek-WMF noted, several de.wiki volunteers expressed concern that newcomers could be discouraged by the interface copy that suggests their account could be blocked for introducing a copyright violation. This context could be helpful if we come to see a high abandonment rate. |
| 3. | People deem Paste Check irrelevant | Proportion of edits wherein people elect to dismiss/not change the text they've added | 🟩 No action needed at this time: people are dismissing Paste Check at a rate (55%) that is similar to rates we observed for Tone Check and Reference Check. | Consider the decision we made in T406164#11247475 to show the Paste Check card on mobile immediately after pasting. |
| 4. | Paste Check is causing disruption | 1) Proportion of people blocked after publishing an edit where Paste Check was shown and 2) Proportion of published edits that add new content and are reverted within 48 hours | 🟩 No action needed at this time: overall, new content edits shown Paste Check are reverted less frequently. We've observed a 21.3% relative decrease in the revert rate of published edits where Paste Check was shown compared to edits eligible but not shown. | In addition to edits shown Paste Check, we will also review the revert rate of edits that would have been shown Paste Check had the text not been copied from a known source. This will be logged as ignored-paste-[source] in VisualEditorFeatureUse. |
| 5. | Newcomers and Junior Contributors are not interacting with Paste Check | Proportion of edit sessions in which ≥1 Paste Check is shown and people do not interact [i] with one or more of the Paste Checks that were shown. | 🟩 No action needed at this time: in more than half of all editing sessions where Paste Check was shown (55%), people interact with one or more of the Paste Checks presented. This rate is similar to those we observed with Reference Check and Tone Check. We'll revisit the priority of interventions to potentially increase this rate (e.g. T407543) following the completion of the final analysis (T399669). | The need for this metric emerged through T407543, wherein we identified that Paste Check is not being shown in the Pre-Save moment if people do not interact with it during Mid-Edit. |

i. Where "interact" in this context refers to people tapping either of the buttons that appear within the Paste Check "card": Yes, keep it or No, remove it. More in T407543#11330136.

Event Timeline

ppelberg removed ppelberg as the assignee of this task.
ppelberg moved this task from Backlog to Analytics on the Editing-team (Tracking) board.

During the team's offline discussion today (17 September), we came to wonder whether the decline survey, as currently written, could mislead newcomers into thinking they can use pasted text when, in fact, Wikipedia's copyright policy would not allow it.

The above came up specifically in response to the option within the decline survey that currently reads I have permission to reuse this content.

Reason being, as @Sdkb described (paraphrasing): "A case of pasted content I often encounter is someone writing drafts about themselves. For example, someone taking the contents from the “About” page from their employer, but this content is not freely licensed. This pathway may give them a false sense of confidence that they’re abiding by rules, but if there’s no creative commons statement on the website it could be removed as copyvio."

The Team is now wondering what qual./quant. data we might review to evaluate how frequently this case is occurring.

See Open question #1 in the task description above.

MNeisler updated the task description.

@ppelberg I've completed an analysis of Paste Check Leading indicators as described in the task description.

See the summary of results below and report for additional details and metric breakdowns.

Note: Data reflects events logged in the first two weeks of the Paste Check AB Test (9 October 2025 through 22 October 2025) at the 22 partner Wikipedias. Additional event data will be needed to confirm the statistical significance of these findings. We will review the complete AB test data as part of the analysis in T399669.

Paste Check Frequency

  • Paste Check was shown at least once at 36% of all published new content edits by newer editors in the test group. For reference, this is significantly higher than rates observed for Tone Check (only 9% of all published new content edits were shown Tone Check).
  • A higher proportion of published edits on desktop are shown Paste Check (39%) compared to mobile (24%).
  • Paste Check appears slightly more frequently for newcomers. We observed a 15% increase in the proportion of published new content edits shown Paste Check when limited to users making their first edit on a Wikipedia.

Paste Check Edit Completion Rate

  • Edits shown Paste Check are completed at a higher rate (52%) than edits in the control group that are eligible but not shown Paste Check (49%). This represents a 6% relative increase.
  • We currently don't see any increase in edit abandonment rate even if a large number (>3) of Paste Checks are shown in a single session.
  • We observed increases on both platform types as well. There was a 15% increase (6 percentage points) for mobile web edits and a 4% increase (2 percentage points) on desktop.
  • Edit completion rate increased across all user experience types to differing degrees. There was an 11% increase in edit completion rate for unregistered users, while we only observed a 2.6% increase in edit completion rate for Newcomers (registered users making their first edit).
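The bullets above mix two ways of expressing a change: percentage points (absolute difference) and percent (relative to the control rate). A minimal sketch of the distinction, using only the overall completion rates reported above; the function names are illustrative and not taken from the analysis notebook:

```python
def percentage_point_change(treatment_rate: float, control_rate: float) -> float:
    """Absolute difference between two rates, in percentage points."""
    return (treatment_rate - control_rate) * 100


def relative_change(treatment_rate: float, control_rate: float) -> float:
    """Change in the treatment rate relative to the control rate, in percent."""
    return (treatment_rate - control_rate) / control_rate * 100


# Overall edit completion rates reported in this comment.
shown = 0.52    # edits shown Paste Check
control = 0.49  # eligible control-group edits not shown Paste Check

print(f"{percentage_point_change(shown, control):.0f} percentage points")  # 3 percentage points
print(f"{relative_change(shown, control):.1f}% relative increase")         # 6.1% relative increase
```

The reported "6% relative increase" is the second figure, rounded; the absolute gap between the two groups is 3 percentage points.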

Paste Check Dismissal Rate (Users select to keep pasted text)

  • Users selected to keep the pasted text when prompted at 55% of edits shown Paste Check. This edit check dismissal rate is similar to rates observed for Tone Check and Reference Check.
  • Users are more likely to keep their pasted text on desktop. Users selected to keep the pasted text at 48% of all published mobile web edits where Paste Check was shown compared to 56% of desktop published edits.
  • Users select "I wrote this content and its not published elsewhere" in over half (54%) of all published edits where the user selected to keep their pasted text. This is the most frequently selected reason for keeping pasted text on both mobile web and desktop.
  • Registered newcomers (users making their first edit on the Wikipedia) are dismissing Paste Check at higher rates compared to unregistered users or Junior Contributors. These users selected the "I wrote this content..." option at 69.5% of all published edits where Paste Check was dismissed.

Paste Check Revert Rate

  • Overall, new content edits shown Paste Check are reverted less frequently. We've observed a 21.3% relative decrease in the revert rate of published edits where Paste Check was shown compared to edits eligible but not shown Paste Check. Decreases were observed across all reviewed user types (unregistered, newcomers, and Junior Contributors).
  • Revert rates for both edits shown Paste Check (9.6%) and edits eligible to be shown Paste Check (12.2%) are lower than the revert rates we've observed for other types of edits. For example, there's a 25% revert rate for edits detected as having non-neutral tone (see T371158#11220470).
  • When split by platform, the trends differ. On desktop, we've observed a 28% decrease in revert rate, while on mobile there's been a slight 8.8% increase. However, at this point in this AB test, there's been a low absolute number of mobile edits shown or eligible to be shown Paste Check that have been reverted (<50 edits). We will need more data to confirm any trends.
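To illustrate why a small number of reverted edits makes a trend hard to confirm, here is a sketch of a pooled two-proportion z-test. This is an assumption about one reasonable way to test the difference, not necessarily the method the final analysis will use, and the counts in the usage lines are hypothetical placeholders, not data from the experiment:

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test.

    x1/n1 and x2/n2 are event counts over totals (e.g. reverted edits over
    published edits) in two groups. Returns (z, two-sided p-value).
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("group sizes must be positive")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if std_err == 0:
        # Both groups have a rate of exactly 0 or 1: no detectable difference.
        return 0.0, 1.0
    z = (p1 - p2) / std_err
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# HYPOTHETICAL counts, chosen only to show the effect of sample size.
z_small, p_small = two_proportion_z_test(10, 100, 12, 100)      # few edits
z_large, p_large = two_proportion_z_test(96, 1000, 122, 1000)   # many edits
print(f"small sample: z={z_small:.2f}, p={p_small:.3f}")
print(f"large sample: z={z_large:.2f}, p={p_large:.3f}")
```

With similar underlying rates, the smaller sample yields a much larger p-value, which is the situation the mobile figures are in while fewer than 50 reverted edits have been logged.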

Paste Check Interaction Rate

  • In 45% of all editing sessions where Paste Check was shown, people did not interact with one or more of the Paste Checks presented.
  • This 45% breaks down as follows: in 35% of editing sessions, people did not interact with any of the Paste Checks presented. In the other 10%, multiple Paste Checks were presented and people did not interact with one or more of them.
  • There's no variation in interaction rate by platform type.
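The interaction metric groups sessions by how many of the Paste Checks shown were interacted with. A minimal sketch of that grouping, assuming each session can be summarized as a (checks shown, checks interacted with) pair; the category labels and the toy sessions are illustrative, with the toy data constructed to mirror the reported 35% / 10% / 55% split rather than taken from real logs:

```python
from collections import Counter

LABELS = ("no_interaction", "partial_interaction", "full_interaction")


def classify_session(checks_shown: int, checks_interacted: int) -> str:
    """Classify one editing session in which at least one Paste Check was shown."""
    if checks_shown < 1:
        raise ValueError("session must include at least one Paste Check")
    if not 0 <= checks_interacted <= checks_shown:
        raise ValueError("interactions must be between 0 and checks shown")
    if checks_interacted == 0:
        return "no_interaction"
    if checks_interacted < checks_shown:
        return "partial_interaction"
    return "full_interaction"


def interaction_shares(sessions: list[tuple[int, int]]) -> dict[str, float]:
    """Share of sessions falling into each interaction category."""
    counts = Counter(classify_session(shown, used) for shown, used in sessions)
    total = sum(counts.values())
    return {label: counts[label] / total for label in LABELS}


# Toy data (NOT real sessions): 7 with no interaction, 2 partial, 11 full.
toy_sessions = [(1, 0)] * 7 + [(3, 1)] * 2 + [(1, 1)] * 11
shares = interaction_shares(toy_sessions)
missed_one_or_more = shares["no_interaction"] + shares["partial_interaction"]
print(f"did not interact with one or more checks: {missed_one_or_more:.0%}")
```

Under this reading, the leading-indicator metric ("did not interact with one or more") is the sum of the first two categories.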

Next Step: We will also review metrics related to suppressed Paste Checks (pastes that would have caused Paste Check had the text not been copied from a known source) once we obtain sufficient events from the instrumentation added in T407302. This will be used to understand how many edits Paste Check is being suppressed from showing within and how often these types of edits are reverted and may be problematic.

This is interesting, @MNeisler! People being more likely to complete an edit when the check is shown feels unexpected, since normally throwing warning notices at people causes some portion to give up and abandon the task. I'm curious, do we have any hypotheses that might explain this?

ppelberg added a subscriber: Quiddity.

Per what @MNeisler and I discussed offline, we do not think any of the leading indicators warrant revisions to the Paste Check user experience before the A/B experiment concludes.

Accordingly, I've updated the task description via T400098#11381836 to reflect this.

With the above in mind, this analysis did bring to mind new questions the Editing Team is curious to explore by way of T399669:

  • How does the revert rate of edits in which Paste Check was shown vary by the reason people gave for declining to remove pasted text?
  • How does the revert rate vary between edit sessions in which ≥1 Paste Check is shown and people do interact with ≥1 Paste Check, and sessions in which they do not?
  • In edits when ≥1 Paste Check is shown, how do the rates at which people interact with a Paste Check vary based on when it was shown? E.g. are people more likely to interact with the first Paste Check that is shown? Are people more likely to interact with the last Paste Check that is shown? Something else?

Next steps

> This is interesting, @MNeisler! People being more likely to complete an edit when the check is shown feels unexpected, since normally throwing warning notices at people causes some portion to give up and abandon the task. I'm curious, do we have any hypotheses that might explain this?

Great spot and question, @Sdkb. A few ideas that immediately come to mind...

  1. Maybe some proportion of newcomers actively wonder what (if anything) they need to do after pasting text and Paste Check addresses this uncertainty by offering clear guidance and a next step.
  2. Maybe the choice Paste Check presents causes some proportion of newcomers to slow down/feel a stronger drive to save the edits they've made after having "invested" this much into it.
  3. Maybe the presence of Paste Check offers people some subtle sense that what they're doing is expected and therefore they feel encouraged to keep going.

> Next Step: We will also review metrics related to suppressed Paste Checks (pastes that would have caused Paste Check had the text not been copied from a known source) once we obtain sufficient events from the instrumentation added in T407302. This will be used to understand how many edits Paste Check is being suppressed from showing within and how often these types of edits are reverted and may be problematic.

I completed an analysis of editing sessions where at least one Paste Check was suppressed. These are sessions where pasted text was used but did not cause Paste Check to activate, given how it is currently configured. Specifically, I reviewed (1) the proportion of published edits where at least one Paste Check was suppressed and (2) how frequently these edits were reverted.

See results summarized below:

Methodology: Reviewed ignored-paste-[source] events logged at partner wikis from 27 October (when this event was instrumented) through 16 November. Data was limited to published new content edits by unregistered users and users with 100 or fewer edits.

Frequency of Paste Check Suppressed Edits

  • Paste Check was suppressed at 23% of all published new content edits. This is less than the frequency of edits eligible for Paste Check to be shown (36%). If we included these types of currently ineligible pastes, we would show Paste Check at a little over half of all published new content edits by newer editors.
  • Similar to pastes eligible for Paste Check, these ineligible pastes also occur more frequently on desktop compared to mobile web but at a lower frequency on both platforms (6% on mobile web and 30% on desktop)

Proportion of edits with pastes eligible and not eligible for Paste Check

| Platform | Pastes Not Eligible for Paste Check | Pastes Eligible for Paste Check |
| --- | --- | --- |
| mobile web | 6.1% | 24% |
| desktop | 30.7% | 39% |

  • Pastes from plainText and visualEditor are the most frequent sources of suppressed Paste Checks, together representing 86% of edits where a Paste Check was suppressed (40% plainText and 41.5% visualEditor). Note: Only plainText and visualEditor sources of pasted text have occurred in mobile web edits during the reviewed timeframe, while all source types occurred on desktop.

Suppressed Paste Checks by Source Type

| Source type | Proportion of edits where Paste Check was suppressed |
| --- | --- |
| googleDocs | 11.3% |
| libreOffice | 1% |
| microsoftOffice | 17.3% |
| plainText | 40.1% |
| visualEditor | 41.5% |
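A per-source breakdown like the one above can be sketched as follows. This assumes events are available as (edit id, feature name) pairs with feature names of the form ignored-paste-[source], as described in the task; the record shape, the function name, and the toy events are hypothetical and do not reflect the actual VisualEditorFeatureUse query used in the analysis:

```python
from collections import defaultdict

PREFIX = "ignored-paste-"


def suppressed_share_by_source(events: list[tuple[int, str]]) -> dict[str, float]:
    """Share of suppressed-Paste-Check edits that include each paste source.

    An edit is counted once per source, however many pastes it contains.
    """
    edits_by_source: dict[str, set[int]] = defaultdict(set)
    suppressed_edits: set[int] = set()
    for edit_id, feature in events:
        if not feature.startswith(PREFIX):
            continue  # unrelated feature-use event
        source = feature[len(PREFIX):]
        edits_by_source[source].add(edit_id)
        suppressed_edits.add(edit_id)
    if not suppressed_edits:
        return {}
    return {
        source: len(edit_ids) / len(suppressed_edits)
        for source, edit_ids in sorted(edits_by_source.items())
    }


# Toy events (NOT real data). Edit 1 contains pastes from two sources.
toy_events = [
    (1, "ignored-paste-plainText"),
    (1, "ignored-paste-visualEditor"),
    (2, "ignored-paste-plainText"),
    (3, "ignored-paste-googleDocs"),
    (4, "ignored-paste-visualEditor"),
    (4, "ignored-paste-visualEditor"),
    (5, "some-other-feature"),
]
for source, share in suppressed_share_by_source(toy_events).items():
    print(f"{source}: {share:.0%}")
```

Counting this way, the shares can sum to more than 100% when a single edit includes pastes from several sources, which would be one possible explanation for the reported percentages adding up to more than 100%.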

Revert rate of Paste Check Suppressed Edits
We also reviewed the revert rate of edits where a Paste Check was suppressed to understand how many of these types of edits may still be problematic.

  • 8% of edits with at least one suppressed Paste Check were reverted. This is lower than the revert rate (12%) observed for edits eligible for Paste Check, indicating that these types of edits are typically higher quality than eligible Paste Check edits.
  • Pastes from plainText sources are the most commonly used source in suppressed Paste Check edits that were reverted; however, the revert rate is still lower than the revert rates observed for pastes eligible for Paste Check. See table below:

Revert rate of edits with suppressed Paste Check by Source Type

| Source type | Proportion of edits with suppressed Paste Check that were reverted |
| --- | --- |
| googleDocs | 0.7% |
| libreOffice | 0% |
| microsoftOffice | 1.6% |
| plainText | 3.7% |
| visualEditor | 2.6% |

cc @ppelberg

> Pastes from plainText and visualEditor are the most frequent sources

Just to note for clarity: a paste sourced from visualEditor could be either an internal paste from within the same document, or an external paste from a different document opened in VE.