
[A/B Test] Report on Multi-Check (References) leading indicators
Status: Closed, Resolved (Public)

Description

At least two weeks after the start of the Multi-Check (References) A/B test (T379131), we will check a set of leading indicators (outlined below).

We will use this ticket to scope and conduct this analysis.

Analysis timing

Assuming the A/B test begins on schedule (March 25, 2025), work on this analysis can begin as early as April 8, 2025.

Decision(s) to be made

What adjustments or investigations, if any, will we need to prioritize so that we can be confident moving forward with evaluating Multi-Check's impact in T379131?

Leading indicators

Metrics

1. Newcomers and Junior Contributors are not encountering Multi-Check (References)
   Metric(s) for evaluation: Proportion of new content edits presented with multiple reference checks within a single editing session
2. Newcomers and Junior Contributors are not understanding the feature
   Metric(s) for evaluation: Proportion of contributors who are presented with Multi-Check (References) and abandon their edits
3. People deem Multi-Check irrelevant
   Metric(s) for evaluation: Proportion of edits wherein people elect to dismiss/not change the text they've added
4. Multi-Check is causing disruption
   Metric(s) for evaluation: (1) Proportion of people blocked after publishing an edit where Multi-Check was shown; (2) Proportion of published edits that add new content and are reverted within 48 hours
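To make the metric definitions concrete, here is a minimal Python sketch of how indicator 1, indicator 2, and part 2 of indicator 4 could be computed from per-session records. The `Session` schema, its field names, and the `share` and `leading_indicators` helpers are hypothetical illustrations, not the event schema or queries actually used for this analysis. The sketch also assumes the input has already been filtered to new-content VisualEditor sessions by the target audience, and it expresses indicator 2 as a completion rate (1 minus the abandonment rate).

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One new-content VisualEditor editing session (hypothetical schema)."""
    checks_shown: int    # reference checks shown during the session
    published: bool      # True if the edit was saved
    reverted_48h: bool   # True if the published edit was reverted within 48 hours


def share(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 for an empty group."""
    return numerator / denominator if denominator else 0.0


def leading_indicators(sessions: list[Session]) -> dict[str, float]:
    single = [s for s in sessions if s.checks_shown == 1]
    multi = [s for s in sessions if s.checks_shown >= 2]
    published = [s for s in sessions if s.published]
    published_multi = [s for s in published if s.checks_shown >= 2]
    return {
        # Indicator 1: share of published edits shown 2+ checks in one session
        "multi_check_share": share(len(published_multi), len(published)),
        # Indicator 2: completion rate (1 - abandonment rate), multi vs. single check
        "completion_multi": share(sum(s.published for s in multi), len(multi)),
        "completion_single": share(sum(s.published for s in single), len(single)),
        # Indicator 4, part 2: 48-hour revert rate of published multi-check edits
        "revert_multi": share(sum(s.reverted_48h for s in published_multi), len(published_multi)),
    }


# Toy usage with made-up sessions
sessions = [
    Session(checks_shown=0, published=True, reverted_48h=False),
    Session(checks_shown=1, published=True, reverted_48h=True),
    Session(checks_shown=1, published=False, reverted_48h=False),
    Session(checks_shown=3, published=True, reverted_48h=False),
]
print(leading_indicators(sessions))
```

Block-rate metrics (indicator 4, part 1) would follow the same pattern, given a per-user block flag.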

Event Timeline

MNeisler triaged this task as Medium priority (Mar 31 2025, 3:17 PM).
MNeisler edited projects, added Product-Analytics (Kanban); removed Product-Analytics.

I completed an analysis of the leading indicators based on initial A/B test data logged from 25 March through 8 April 2025. Please see the high-level summary of results below, and the full report for additional details and breakdowns.

Note: These results are based on initial A/B test data and are intended to check whether any adjustments to the feature need to be prioritized. More event data will be needed to confirm the statistical significance of many of these findings. I will review the complete A/B test data (based on a two-week duration) as part of the analysis in T379131.
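Whether a gap between two rates (for example, edit completion or revert rates for sessions shown one check vs. multiple checks) is statistically significant depends on how many sessions sit behind each rate. One common way to check this is a pooled two-proportion z-test. The Python sketch below assumes that test, which is not necessarily the method used in the full report, and the sample sizes in the usage example are made up.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test of H0: p1 == p2.

    x1/n1 and x2/n2 are successes/trials in each group (e.g. completed
    sessions out of sessions shown multiple vs. single checks).
    Returns (z statistic, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the z-test is undefined")
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Same 76% vs. 75% gap at two made-up sample sizes per group
for n in (100, 100_000):
    z, p = two_proportion_z_test(round(0.76 * n), n, round(0.75 * n), n)
    print(f"n={n}: z={z:.2f}, p={p:.3g}")
```

With 100 sessions per group the one-point gap is far from significant, while with 100,000 per group it is, which is why additional event data matters before drawing firm conclusions. For very small counts, an exact test such as Fisher's would be a safer choice than this normal approximation.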

1. Newcomers and Junior Contributors are not encountering Multi-Check (References)
   Metric(s) for evaluation: Proportion of new content edits presented with multiple reference checks within a single editing session
   Conclusion: In the test group, multiple reference checks were shown within a single editing session in 19% of all published new content VisualEditor (VE) edits (549 edits) by unregistered users and users with 100 or fewer edits. Of the edits shown multiple checks, the majority (73%) were shown between 2 and 5 checks. Based on this rate, we should have sufficient multi-check events once the test has run for 4 weeks to confirm the overall statistical significance of any changes introduced by this feature.
2. Newcomers and Junior Contributors are not understanding the feature
   Metric(s) for evaluation: Proportion of contributors who are presented with Multi-Check (References) and complete their edits
   Conclusion: The edit completion rate was 76.1% for sessions shown multiple checks, compared with 75% for sessions shown only one check, indicating that multiple checks are not causing significant disruption or confusion for editors.
3. People deem Multi-Check irrelevant
   Metric(s) for evaluation: Proportion of edits wherein people elect not to add a new reference
   Conclusion: While we observed a slight increase in the proportion of individual checks dismissed for edits shown multiple checks in the test group, sessions shown multiple checks were more likely to include at least one new reference in the final published edit than sessions shown just a single check. In the test group, 47.5% of all published edits shown multiple checks did not include at least one new reference, compared with 60.3% of edits shown a single check.
4. Multi-Check is causing disruption
   Metric(s) for evaluation: (1) Proportion of people blocked after publishing an edit where Multi-Check was shown; (2) Proportion of published edits that add new content and are reverted within 48 hours
   Conclusion: We observed no significant differences in the revert rate of new content edits between the control and test groups for editing sessions where a reference check was shown. In the test group, the revert rate of new content edits shown multiple checks (17%) is currently lower than that of sessions shown a single check (26%). There were also no significant changes in the proportion of users blocked after being shown multiple checks compared with a single check.
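Indicator 3 can be measured at two granularities, which is how a higher check-level dismissal rate can coexist with a lower share of edits published without a new reference: a session shown three checks may dismiss two of them and still add one reference. The Python sketch below contrasts the two measures on made-up toy data; `PublishedSession` and both helper functions are hypothetical and do not reflect the report's actual data model.

```python
from dataclasses import dataclass


@dataclass
class PublishedSession:
    """A published edit that was shown at least one reference check (hypothetical schema)."""
    check_dismissed: list[bool]  # one flag per check shown; True if the editor declined to add a reference
    new_refs: int                # new references in the final published edit


def check_level_dismissal_rate(sessions: list[PublishedSession]) -> float:
    """Dismissed checks / all checks shown, pooled across sessions."""
    flags = [d for s in sessions for d in s.check_dismissed]
    return sum(flags) / len(flags) if flags else 0.0


def session_level_no_new_ref_rate(sessions: list[PublishedSession]) -> float:
    """Share of published sessions whose final edit adds no new reference."""
    return sum(s.new_refs == 0 for s in sessions) / len(sessions) if sessions else 0.0


# Made-up toy data
multi = [PublishedSession([True, True, False], new_refs=1), PublishedSession([True, False], new_refs=1)]
single = [PublishedSession([True], new_refs=0), PublishedSession([False], new_refs=1)]

for label, group in (("multi", multi), ("single", single)):
    print(label, check_level_dismissal_rate(group), session_level_no_new_ref_rate(group))
```

In this toy data the multi-check group dismisses a larger share of individual checks (3 of 5 vs. 1 of 2) yet has a smaller share of sessions without any new reference (0 of 2 vs. 1 of 2), mirroring the pattern described for indicator 3.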

cc @ppelberg

@MNeisler and I talked these results through synchronously. We also:

  • Shared the results with the Editing Team during this week's planning meeting
  • Published the results on mediawiki.org