
Run A/B test to evaluate impact of sticky header editing affordance
Closed, Resolved · Public

Assigned To
Authored By
ppelberg
Oct 27 2021, 11:06 PM
Referenced Files
F35847058: edit_engagement_bywiki.png
Dec 8 2022, 7:47 PM
F35847061: Edit_engagement_bytestgroup.png
Dec 8 2022, 7:47 PM
F35824892: edit_reverts_group2_bytestgroup_table.png
Nov 30 2022, 4:28 AM

Description

🧪 Results

To evaluate the impact of introducing an "edit" button within the Vector 2022's new sticky header, we ran two A/B tests.
These two A/B tests were designed to help us learn how the new edit button within the sticky header impacted the likelihood that:

  1. People would publish the edits they started
  2. The edits people published would get reverted

What follows are the conclusions we're drawing from these A/B tests and details about the Wikipedias that participated in them.

Conclusions
The results of both A/B tests have led us to conclude:

  1. People were more likely to complete the edits they started using the sticky header
    • Of all the edits people initiated throughout both A/B tests, the percent of people who successfully completed at least one edit using the edit button within the sticky header was 2.8% higher in AB Test Experiment #1 and 6.8% higher in AB Test Experiment #2, compared to edits people started using other edit buttons present on the page.
  2. Edits people initiated and published using the sticky header were less likely to be reverted
    • Of all the edits people published throughout both A/B tests, the edits people started using the new edit button within the sticky header were less likely to be reverted than edits started using other edit buttons present on the page.

Note: While we are able to confirm that edits published using the sticky header were less likely to be reverted than edits published using other edit buttons present on the page, we are unable to confirm and share a specific percentage decrease in revert rates because of a relatively high margin of error. Learn more in the test report.
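A toy illustration of why a wide margin of error blocks a specific claim: the sketch below (Python, with made-up counts, not the experiment's actual data) builds a normal-approximation 95% confidence interval for the difference between two revert rates. When the interval straddles zero, no specific percentage decrease can responsibly be reported.

```python
from math import sqrt

def revert_rate_diff_ci(reverted_a, total_a, reverted_b, total_b, z=1.96):
    """95% CI for the difference in revert rates between two groups of
    published edits (normal approximation to the binomial)."""
    p_a = reverted_a / total_a
    p_b = reverted_b / total_b
    diff = p_a - p_b
    se = sqrt(p_a * (1 - p_a) / total_a + p_b * (1 - p_b) / total_b)
    return diff - z * se, diff + z * se

# Made-up counts: sticky-header edits vs. other-affordance edits.
lo, hi = revert_rate_diff_ci(12, 200, 18, 210)
# The point estimate is a decrease, but the interval includes zero,
# so the size of the decrease cannot be stated with confidence.
print(lo < 0 < hi)
```

With larger samples the same interval narrows; scaling every count in this example by 100 produces an interval that sits entirely below zero, which is the situation that would have allowed a specific percentage to be reported.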

Test Details
To arrive at the conclusions above, we ran two A/B tests.

The first test ran between 6 July and 8 August 2022 on 15 Wikipedias. At these projects, 50% of people included in the A/B test were shown the Vector 2022 skin's new sticky header without an edit button within it and 50% of people were shown the Vector 2022 skin's new sticky header with an edit button within it.

The second test ran between 16 August and 1 September 2022 on two Wikipedias: Vietnamese and Indonesian. At these two projects, there were three equally sized test groups:

  1. A control group that did not see or have access to the Vector 2022 skin's new sticky header
  2. A treatment group that saw the Vector 2022 skin's new sticky header without an edit button within it
  3. A treatment group that saw the Vector 2022 skin's new sticky header with an edit button within it

T283505 will introduce a fixed "sticky" site header to the desktop reading experience.

To start, the sticky header will NOT contain editing functionality, per T294383.

This task represents the work of running an A/B test to evaluate the impact that introducing an edit affordance within the desktop reading experience's fixed "sticky" site header has on the following:

  1. The speed and ease with which contributors, across experience levels, can begin making a change to the content they want to affect
  2. How likely people are to publish the edits they start making
  3. Peoples' awareness of their ability to edit the content they are consuming
  4. The rate at which people make destructive changes to wikis

Decision to be made

This A/B test will help us make the following decision: Should edit affordance(s) within the sticky header be made available to more people?

Note: we have yet to define what "more people" means in this context and defining it depends on knowing all of the people who have access to the sticky header at the time the test is run.

Curiosities

We do not think this series of changes is likely to have a notable impact on a single metric that has a clear directional interpretation. Said another way: we do not think there is a single metric that is likely to A) move in response to these changes *and* B) for the direction of that movement to indicate a clear improvement or degradation in peoples' user experience. As such, we will evaluate a collection of metrics and evaluate them holistically to decide whether the impact of this intervention was positive or negative.

| Priority | Impact | Metric |
| --- | --- | --- |
| 1 | Peoples' awareness of their ability to edit | Proportion of people who initiated at least one edit session during the course of the A/B test |
| 2 | Reduction in effort required to start an edit | Average time between when the editing interface is ready and when people make a change to the document |
| 3 | Reduction in overall effort required to publish an edit | i. Average number of edits each person makes throughout the course of the A/B test; ii. Average rate at which people publish the edits they start |
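As a sketch of how metrics like these might be computed from raw edit-session events, here is a minimal example; the tuple layout and action names ("init", "saveSuccess") are hypothetical stand-ins, not the project's actual instrumentation schema:

```python
# Hypothetical (session_id, action) events: "init" marks a started edit,
# "saveSuccess" marks a published one. Real schemas differ.
events = [
    ("s1", "init"), ("s1", "saveSuccess"),
    ("s2", "init"),
    ("s3", "init"), ("s3", "saveSuccess"),
]

started = {sid for sid, action in events if action == "init"}
published = {sid for sid, action in events if action == "saveSuccess"}

# Metric analogue: proportion of sessions that initiated an edit and
# went on to publish it (edit completion rate).
completion_rate = len(started & published) / len(started)
print(round(completion_rate, 2))  # 2 of 3 started sessions published
```

The same session-keyed grouping extends to the other metrics, e.g. timing the gap between an "interface ready" event and the first change event within each session.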

Decision Matrix

| ID | Scenario | Metric(s) | Plan of action |
| --- | --- | --- | --- |
| 1 | People are more likely to make destructive edits | Proportion of published edits that are reverted within 48 hours of being made increases by >10% over a sustained period of time | 1. Pause plans for wider deployment of the sticky header with editing functionality enabled; 2. To contextualize the change in revert rate, investigate changes in the number of published edits (maybe a higher revert rate is a "price" we're willing to "pay" for the increase in good edits); 3. Investigate the type of edits being reverted to understand how the sticky header editing affordance could be contributing to the uptick |
| 2 | People are less likely to publish the edits they initiate | Edit completion rate decreases by >10% over a sustained period of time | 1. Pause plans for wider deployment of the sticky header with editing functionality enabled; 2. Investigate what patterns exist among the people whose edit completion rate has gone down; if there is a pattern among Junior Contributors, prioritize work on T296907 |
| 3 | People do NOT encounter more difficulty publishing edits and there are no regressions in edit revert or edit completion rates | Time to first change does NOT increase by >10% over a sustained period of time; edit revert rate does NOT increase by >10%; edit completion rate does NOT decrease by >10% | Move forward with opt-out deployment |
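Read as a rule set, the matrix above might be encoded like this; the function name, return strings, and input convention (relative changes, e.g. 0.12 for +12%) are invented for illustration, while the >10% thresholds come from the table:

```python
def deployment_decision(revert_change, completion_change, time_to_change):
    """Hypothetical encoding of the decision matrix. Each input is a
    sustained relative change; thresholds mirror the scenarios above."""
    if revert_change > 0.10:        # Scenario 1: more destructive edits
        return "pause: investigate reverted edits"
    if completion_change < -0.10:   # Scenario 2: fewer completed edits
        return "pause: investigate affected contributors"
    if time_to_change <= 0.10:      # Scenario 3: no regressions
        return "move forward with opt-out deployment"
    return "inconclusive: keep monitoring"

# Example: revert rate +2%, completion rate +1%, time to change +3%.
print(deployment_decision(0.02, 0.01, 0.03))
```

In practice each scenario would be judged over a sustained window, not a single snapshot, and the "pause" branches trigger the investigations listed in the table rather than an automatic rollback.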

Participating Wikis

See: T298280.

NOTE: all wikis listed in T298280 EXCEPT for th.wiki and vi.wiki will have access to the sticky header without editing functionality included prior to the A/B test beginning. More context in T298280#7641884.

Done

  • A report is published that evaluates the hypotheses above


Event Timeline

ppelberg renamed this task from [SPIKE] Evaluate the impact the sticky header is having on editing to Evaluate the impact the sticky header is having on editing. Nov 23 2021, 12:03 AM
ppelberg renamed this task from Evaluate the impact the sticky header is having on editing to Run A/B test to evaluate impact of sticky header editing affordance. (Edited) Dec 2 2021, 12:23 AM
ppelberg updated the task description.

Meta
I've updated the task description to reflect the decisions the Editing Team converged on during the team's 1-December sticky header offsite. Full notes can be found here (limited access for now): Editing Team mini offsite (Dec 1).

Jdlrobson subscribed.

@ovasileva I've added T311144 to the sprint board to start this. Just needs some clarity about when to turn it on/off and who to show it to.

Per discussions with @ppelberg, I will check the AB test data logged to date to help inform when the AB test can be turned off. Assigning this task to me for now and will follow-up once I confirm.

Update: This will be done in T312296.

Per what @MNeisler and I talked about offline today, to start we're going to prioritize analyzing the following four metrics: all three metrics listed within Decision matrix and the metric "1." within Curiosities.

If there's time, we'll look at metrics "2." and "3." within Curiosities.

MNeisler removed MNeisler as the assignee of this task. (Edited) Nov 30 2022, 4:28 AM
MNeisler moved this task from Doing to Needs Review on the Product-Analytics (Kanban) board.

@ppelberg I've completed the edit completion and revert rate calculations for both AB Test #1 and AB Test #2 (Metric #1 and #2 of the Decision Matrix). See a summary of overall results below and further details/breakdowns in the notebook.

Please let me know if you have any questions. Per our discussions, I will wait to prioritize any additional analyses pending a review of these results.

Definitions:
There were two AB Test Experiments run to test the impact of the new editing affordance. These are defined as follows in the summary and analysis report:

  • AB Test Experiment 1 (Sticky header control): Run across 15 partner wikis from 6 July 2022 through 8 August 2022. This control group had access to the sticky header (without the edit button).
  • AB Test Experiment 2 (No sticky header control): Run across 2 partner wikis (viwiki and idwiki) from 16 August 2022 through 1 September 2022. The control group did not have access to the sticky header.

Data Issue:

At the time this analysis started, we were missing the majority of bucketing data for AB Test Experiment #1, as this data is scrubbed after 90 days. The bucketing data is what distinguishes sessions assigned to the control group from those assigned to the treatment group in each AB Test. As a result, for AB Test #1 we reviewed data by editing method (sticky header editing affordance vs existing editing affordances) instead of by experiment group (control and treatment).
This allows us to compare the completion and revert rates for edits completed using each method across all users in the experiment. However, we do not know whether people completing edits using existing editing affordances had access to the sticky header or not.

We have complete data for AB Test #2 (No Sticky Header Control Group, run on idwiki and viwiki from 16 August 2022 through 1 September 2022) so results are provided by both editing method and test group for this experiment.

Edit Completion Rates

Edit Completion Rates by Editing Initiation Type (AB Test Experiment #1 and #2 Comparison)
{F35825876}
There was a 2.8% increase in the percent of contributors who successfully completed an edit using the sticky header editing affordance in AB Test Experiment #1, and a 6.8% increase in AB Test Experiment #2.

For AB Test Experiment #2, we can also review edit completion rates by test group (control vs treatment).

{F35825878}

{F35827081}

Completion rates were slightly higher for both treatment groups, with the highest completion rate observed in the treatment group that saw the sticky header with the edit button.

Edit Revert Rates

Edit Revert Rates by Editing Initiation Type (AB Test Experiment #1 and #2 Comparison)
{F35827084}

We observed a decrease in reverts for edits made using the sticky header in AB Test Experiment #1 and a slight increase in AB Test Experiment #2, but these differences were not statistically significant.

Test Group Comparison for AB Test Experiment #2

edit_reverts_group2_bygroup.png

{F35827086}

Per-wiki results varied for both edit completion and revert rates. For larger wikis where a sufficient sample of events was logged, including French, Portuguese, Indonesian, Vietnamese, and Hebrew Wikipedia, we observed higher or similar completion rates between the two editing affordance types and no statistically significant difference in revert rates.

> @ppelberg I've completed the edit completion and revert rate calculations for both AB Test #1 and AB Test #2 (Metric #1 and #2 of the Decision Matrix). See a summary of overall results below and further details/breakdowns in the notebook.
>
> Please let me know if you have any questions. Per our discussions, I will wait to prioritize any additional analyses pending a review of these results.

These results look great – thank you for sharing them, @MNeisler.

A couple of clarifying questions to ensure I'm interpreting these results accurately...

Follow up questions

  1. What changes, if any, do you think we would need to make to the two statements/conclusions I've listed below for them to be accurate?
    • "People completed the edits they initiated by clicking the edit button in the sticky header at a higher rate than the edits they initiated using any other editing button/initiation method."
    • "There was no statistically significant change in the rate at which the edits people started by clicking the edit button in the sticky header, and ultimately published, were reverted when compared to edits initiated using other buttons/initiations methods."
  2. Would it be accurate for me to think that while the "Data Issue" you described in T294506#8430608 would NOT impact our ability to make the two statements I listed above, the issue DOES prevent us from making statements like the two I've listed below about AB Test Experiment 1?
    • "The presence of the edit button within the sticky header caused people to be __% more/less likely to publish the edits they initiated."
    • "The presence of the edit button within the sticky header caused people to be __% more/less likely to publish edits that ended up being reverted."

EDITS as a result of talking with @MNeisler today (30 Nov):

  • Clarified conclusion about revert rate
  • Made "2." more specific as the data issues only impact AB Test Experiment 1

Here is the updated report with edit engagement results for AB Test Experiment #2.

Overall and per wiki results summarized in tables below.

Edit_engagement_bytestgroup.png

edit_engagement_bywiki.png

ppelberg updated the task description.

I've updated the task description with the experiment summary @MNeisler and I drafted offline.

With the above done, this task can now be resolved.

We will track the work to make the editing functionality available by default within the Vector (2022) sticky header in T287545 and any other deployment tickets @ovasileva might file as sub-tasks.