
Report on Edit Check A/B test leading indicators
Closed, Resolved (Public)

Description

In T342930, we will complete the broader analysis of the Edit Check A/B test to evaluate the impact of this intervention.

This ticket involves an analysis of a set of leading indicators we will use to decide what – if any – adjustments/investigations the Editing Team will prioritize making before we move forward with the analysis T342930 describes.

Decision(s) to be made

  • What – if any – adjustments/investigations will we prioritize for us to be confident moving forward with evaluating the reference Edit Check's impact?

Leading indicators

Metrics

| ID | Name | Metric(s) for Evaluation | Conclusion |
| --- | --- | --- | --- |
| 1. | Citation reuse | Percentage of people who elect to add a reference and use Citoid's Re-use feature to do so [ii] | |
| 2. | Newcomers and Junior Contributors are not encountering Edit Check | The proportion of edits/editors the reference Edit Check is being activated within/for | |
| 3. | Newcomers and Junior Contributors are not understanding the feature | High edit abandonment rate after reference Edit Check is shown | |
| 4. | High volume of false positive reports | 1) Reference Edit Check is being shown when people don't think it should be and 2) High proportion of reverts among edits where reference Edit Check is shown | |
| 5. | Reference Edit Check is causing lots of disruption | See "3." and "4." above | |

Event Timeline

MNeisler triaged this task as Medium priority.Dec 1 2023, 5:21 PM
MNeisler moved this task from Triage to Current Quarter on the Product-Analytics board.

The Edit Check (References) AB test was deployed on 19 February. I plan to review leading indicators prior to completing the broader analysis of the Edit Check A/B test in T342930.

I reviewed AB test data logged from 19 February through 25 February 2024. See key results and insights from the leading indicator analysis summarized below:

Note: Results below are based on initial AB test data to check if any adjustments to the feature need to be prioritized. I will review the complete AB test data (based on two week duration) as part of the analysis in T342930.

1. Percentage of people who elect to add a reference and use Citoid's Re-use feature to do so
  • Across all test wikis, 29 of 847 contributors (3.4%) used Citoid's re-use feature after electing to add a reference in the Edit Check prompt. 70% (21) of the edits where Citoid's re-use feature was used were saved successfully (not reverted within 48 hours).
  • No users at viwiki or zhwiki have used the Citoid re-use feature. At all of the other AB test partner wikis, under 6% of users used the re-use feature after being shown edit check. The highest proportion is currently at eswiki (5.6%).
  • Similar proportions of contributors used the re-use feature on desktop and mobile (as shown in the table below); however, the revert rate for these edits was much higher on mobile (only 42% of these edits were successfully saved).

Proportion of editors/edits by platform that used Citoid's re-use feature after electing to add a reference

| Platform | Prop of Users | Prop of Edits |
| --- | --- | --- |
| desktop | 3.6% | 3.8% |
| mobile | 3.2% | 4.1% |

2. The proportion of edits/editors the reference Edit Check is being activated within/for

Defined as the proportion of newcomers and junior contributors that started an edit (defined as reaching 'ready' [see T352122#9569512 for details]) and were shown the edit check (reference) prompt.

  • Overall, edit check has been shown for 1,293 newcomers and junior contributors (0.2% of users) that started an edit and were bucketed in the AB test group. It has been shown in 0.17% of edit attempts.
  • Proportion of edits/editors shown the edit check prompt by platform:
| Platform | Number/Prop of Users | Number/Prop of Edit Attempts |
| --- | --- | --- |
| desktop | 717 users (0.23%) | 882 edit attempts (0.19%) |
| mobile | 576 users (0.18%) | 779 edit attempts (0.2%) |
  • It's shown to a higher proportion of registered newcomers and Junior Contributors (3.7%) than unregistered users (0.13%)
  • Edit check has been shown at all identified partner wikis except afwiki and yowiki during the reviewed time frame.
    • At afwiki, there were 1,524 edit attempts by newcomers and Junior Contributors
    • At yowiki, there were 28 edit attempts by newcomers and Junior Contributors
    • For the other AB test wikis, the percentage of users shown edit check ranges from a low of 0.06% at zhwiki to a high of 0.44% at ptwiki.
  • Note these numbers include editors that may have opened the editor and left before making any change. If we limit our review to only editors/edits that reached saveSuccess, then edit check has been shown to 977 (7.2%) newcomers or junior contributors that completed an edit. Edit check has been shown in 1,217 published edits (4.5% of completed editing sessions by newcomers and Junior Contributors).

The proportion of eligible edits Edit check was shown within is similar to the proportion of eligible edits identified in the control group (but not shown edit check) as expected based on a 50/50 bucketing split. This indicates that Edit Check is being shown at all eligible edits in the test group at expected rates and that there are no significant activation issues.

Note: Eligible edits are identified by the inclusion of the editcheck-references tag.

Overall Eligible Published Edits at Control and Test Groups.

| Experiment group | Prop of published edits | Prop of users |
| --- | --- | --- |
| control (eligible edits; edit check not shown) | 4.6% | 7.6% |
| test (eligible edits; edit check was shown) | 4.5% | 7.2% |
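
One way to put a number on "similar" here is a two-proportion z-test. This is only a sketch: the ticket does not say such a test was run, and the raw counts behind the percentages are not given, so the counts below are hypothetical values chosen to land near 4.6% and 4.5%.

```python
import math

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one proportion."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF, computed via erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# HYPOTHETICAL counts (not from the ticket): eligible edits / published edits.
control_eligible, control_total = 1242, 27000   # about 4.6%
test_eligible, test_total = 1217, 27045         # about 4.5%

z, p = two_proportion_z(control_eligible, control_total, test_eligible, test_total)
print(f"z = {z:.2f}, p = {p:.2f}")
```

With counts of this size, a 0.1 percentage point gap stays well inside what a 50/50 split would produce by chance, which is consistent with the "no significant activation issues" reading.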
3. The proportion of edits abandoned after edit check is shown

An edit is defined as abandoned when the user exits the editor and discards their changes (`event.action = abort` and `event.abort_type = abandon`).

  • Overall, 15% of edits (252 edits) where edit check was shown were abandoned. 17.6% of distinct users have abandoned their edit after being shown edit check. As a comparison, about 23% of edits where edit check was not shown were also abandoned.
  • Proportions are similar when split by desktop and mobile (around 15% of edits abandoned for each platform).
  • No significant spikes in edits abandoned at any one of the AB test wikis. The highest proportions of edits abandoned occurred at ptwiki (20.4% of edits) and arwiki (22.3% of edits). These rates are similar to the per-wiki abandonment rate trends seen in the control group.
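
For readers who want to reproduce the abandonment definition, here is a minimal sketch of how it could be applied to a list of event records. The `action` and `abort_type` fields come from the definition above; the `session_id` key, the record layout, and the sample rows are assumptions made for illustration, not the actual query used.

```python
def abandonment_rate(events):
    """Share of sessions with an event where action='abort' and abort_type='abandon'."""
    sessions = set()
    abandoned = set()
    for event in events:
        sessions.add(event["session_id"])  # hypothetical session key
        if event.get("action") == "abort" and event.get("abort_type") == "abandon":
            abandoned.add(event["session_id"])
    return len(abandoned) / len(sessions) if sessions else 0.0

# Illustrative rows only; not real AB test data.
sample_events = [
    {"session_id": "s1", "action": "saveSuccess"},
    {"session_id": "s2", "action": "abort", "abort_type": "abandon"},
    {"session_id": "s3", "action": "abort", "abort_type": "nochange"},
    {"session_id": "s4", "action": "ready"},
]

print(abandonment_rate(sample_events))
```

Counting distinct sessions rather than raw events keeps a session that logs several events from being counted more than once.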
4a. False Positive Reports:

Defined as the proportion of users who were shown edit check and selected dialog-choose-irrelevant as a reason for not adding a citation. Per T329593, this is used to indicate that they think the Edit Check prompt appeared in error.

  • dialog-choose-irrelevant is the least frequently selected reason for declining to add a reference. Only 111 users selected this option out of 769 users who selected an explicit reason for not adding a citation.
  • Overall, 7.5% of edits where edit check was shown included an explicit acknowledgment that edit check was declined because it was irrelevant.

Proportion of editors and edits that indicate false positive by platform

| Platform | Number/Prop of Users | Number/Prop of Edit Attempts |
| --- | --- | --- |
| desktop | 67 users (9.2%) | 78 edit attempts (8.8%) |
| mobile | 44 users (7.3%) | 48 edit attempts (6.1%) |
  • No high volume of false positive reports for any of the individual AB test wikis. Ptwiki and viwiki had the highest proportion of user-reported false positives (15% of edits where edit check was shown at each wiki).
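
As a quick arithmetic check on the counts quoted in 4a, the share of "irrelevant" selections among users who gave an explicit reason can be computed directly. The helper name `pct` is just for illustration; the two counts are the ones reported above.

```python
def pct(part, whole, digits=1):
    """Percentage of `part` in `whole`, rounded to `digits` decimal places."""
    return round(100 * part / whole, digits)

irrelevant_users = 111    # selected dialog-choose-irrelevant
users_with_reason = 769   # selected any explicit reason for declining

share_irrelevant = pct(irrelevant_users, users_with_reason)
print(share_irrelevant)  # -> 14.4
```

Note that this denominator (users who gave a reason) is narrower than the "edits where edit check was shown" denominator used for the 7.5% figure, so the two percentages are not directly comparable.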
4b. Proportion of published edits that were shown edit check and are reverted within 48 hours (indicator of overall accuracy)

Defined as the proportion of published edits shown edit check that were reverted within 48 hours.

  • Overall, 22% of all edits (264 published edits) where edit check was shown were reverted in under 48 hours. This is slightly lower than the revert rate for published edits by newcomers where edit check was not shown (24.8%).
  • A higher proportion of edits on mobile were reverted compared to desktop (see the table below).

Proportion of edits reverted by platform and if edit check was shown

| Platform | Prop of published edits where edit check was shown | Prop of published edits where edit check was not shown |
| --- | --- | --- |
| desktop | 15.9% | 16.4% |
| mobile | 30.3% | 31.1% |
  • Edit revert rates were under 30% at each AB test wiki. The highest revert rates were at eswiki and itwiki (about 29% of published edits reverted at each wiki). Revert rates observed for each wiki are similar or less than rates for published edits where edit check was not shown.
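
The 48-hour revert definition used in 4b can be sketched as follows. This is an illustration under stated assumptions: the timestamp pairs and function names are hypothetical, and the strict "under 48 hours" boundary is a reading of the wording above rather than a confirmed detail of the analysis.

```python
from datetime import datetime, timedelta

REVERT_WINDOW = timedelta(hours=48)

def reverted_within_window(saved_at, reverted_at, window=REVERT_WINDOW):
    """True if the edit was reverted less than `window` after it was saved."""
    if reverted_at is None:  # never reverted
        return False
    return timedelta(0) <= reverted_at - saved_at < window

def revert_rate(edits):
    """Share of (saved_at, reverted_at) pairs reverted inside the window."""
    flagged = sum(reverted_within_window(saved, reverted) for saved, reverted in edits)
    return flagged / len(edits) if edits else 0.0

# Illustrative timestamps only; not real AB test data.
t0 = datetime(2024, 2, 19, 12, 0)
sample_edits = [
    (t0, t0 + timedelta(hours=1)),    # reverted quickly
    (t0, t0 + timedelta(hours=72)),   # reverted, but outside the window
    (t0, None),                       # not reverted
    (t0, t0 + timedelta(hours=47)),   # reverted just inside the window
]

print(revert_rate(sample_edits))
```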

Thank you for bringing this all together, @MNeisler! A couple of questions for you in response...

1. What – if any – adjustments do you think we ought to consider making to the "Conclusions" I've proposed below?

| ID | Name | Metric(s) for Evaluation | Conclusions |
| --- | --- | --- | --- |
| 1. | Citation reuse | Percentage of people who elect to add a reference and use Citoid's Re-use feature to do so [ii] | ✅ People shown Edit Check are NOT "re-using" references at rates and in ways that require intervention at this time. |
| 2. | Newcomers and Junior Contributors are not encountering Edit Check | The proportion of edits/editors the reference Edit Check is being activated within/for | TBD. See question "2." below. |
| 3. | Newcomers and Junior Contributors are not understanding the feature | High edit abandonment rate after reference Edit Check is shown | ✅ Newcomers and Junior Contributors do NOT seem to be particularly confused by Edit Check as evidenced by a lack of significant regression in edit abandonment rates. |
| 4. | High volume of false positive reports | 1) Reference Edit Check is being shown when people don't think it should be and 2) High proportion of reverts among edits where reference Edit Check is shown | See conclusion in the row below ("5."). |
| 5. | Reference Edit Check is causing lots of disruption | See "3." and "4." above | ✅ Edit Check does NOT appear to be disruptive as evidenced by, as @MNeisler put it in T352130#9588386, "...revert rates observed for each wiki are similar or less than rates for published edits where edit [check] was not shown." |

2. What question might we ask to become more confident about whether we are satisfied with the proportion of "eligible edits" Edit Check is being shown within?
I ask this seeking a reference we can use to contextualize the metrics shared in T352130#9588386.

Thanks for the quick review @ppelberg!

  1. What – if any – adjustments do you think we ought to consider making to the "Conclusions" I've proposed below?

The conclusions you've proposed look good to me.

Only one suggested change: would it be worthwhile to revise #5 as follows, since we indicated that disruption would be evaluated by both the incidence of false positive reports and revert rates?

"Edit Check does NOT appear to be disruptive as evidenced by, as @MNeisler put it in T352130#9588386, '...revert rates observed for each wiki are similar or less than rates for published edits where edit [check] was not shown', and we have not received a high volume of false positive reports."

  2. What question might we ask to become more confident about whether we are satisfied with the proportion of "eligible edits" Edit Check is being shown within?

There are a couple of points that might help contextualize the findings on newcomers and Junior Contributors encountering edit check:

  1. The rate of edit check events we're seeing gives us confidence that we will have a sufficient sample of data to draw meaningful conclusions about the impact of edit check on newcomers and Junior Contributors. At the current daily rate, we will have data on over 2,400 sessions where edit check was shown after a two-week period. This is similar to the sample size we've reviewed in previous AB tests.
  2. The proportion of eligible edits Edit check was shown within is similar to the proportion of eligible edits identified in the control group (but not shown edit check) as expected based on a 50/50 bucketing split. This indicates that Edit Check is being shown at all eligible edits in the test group at expected rates and that there are no significant activation issues.

    Context: I reviewed the available edit check tag data to determine the proportion of published edits by newcomers and Junior Contributors in the AB test that were tagged as meeting conditions that could cause the reference edit check to be shown (editcheck-references).

Overall Eligible Published Edits at Control and Test Groups.

| Experiment group | Prop of published edits | Prop of users |
| --- | --- | --- |
| control (eligible edits; edit check not shown) | 4.6% | 7.6% |
| test (eligible edits; edit check was shown) | 4.5% | 7.2% |

Note: I added this context to the results summarized in T352130#9588386 as well.
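
The two-week projection in point 1 above can be sketched as a simple linear extrapolation. The ticket does not state which count the "over 2,400" estimate is based on; this sketch assumes it is the 1,217 published edits where edit check was shown during 19 to 25 February 2024, and that the daily rate stays constant. The function name is hypothetical.

```python
from datetime import date

def project_count(observed, start, end, target_days):
    """Linearly scale a count observed over [start, end] (inclusive) to target_days."""
    observed_days = (end - start).days + 1
    return observed * target_days / observed_days

# ASSUMPTION: 1,217 published edits with edit check shown is the basis of the estimate.
projected = project_count(1217, date(2024, 2, 19), date(2024, 2, 25), target_days=14)
print(projected)  # -> 2434.0
```

Under that assumption the result lines up with the "over 2,400 sessions" figure; a different base count (for example, edit attempts rather than published edits) would give a different projection.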

Change 1009607 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/extensions/VisualEditor@master] Move saveWorkflowBegin to before the saveProcess

https://gerrit.wikimedia.org/r/1009607

Change #1009607 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] Move saveWorkflowBegin to before the saveProcess

https://gerrit.wikimedia.org/r/1009607