⚓ T229426 Check on VE as default A/B test results

Status	Assigned	Task
Open	None	T255327 [Epic] Evaluate which editing interface should be shown by default
Open	None	T227338 Test visual editor as the default mobile editor on select wikis
Resolved	nshahquinn-wmf	T229426 Check on VE as default A/B test results

• ppelberg created this task. Jul 31 2019, 2:27 PM

Restricted Application added a subscriber: Aklapper. Jul 31 2019, 2:27 PM

• ppelberg added a parent task: T227338: Test visual editor as the default mobile editor on select wikis. Jul 31 2019, 2:27 PM

Updating the task description to include the metrics we would like to check on. Those metrics are reflected in the task description's "Done" section.

JTannerWMF edited projects, added VisualEditor (Current work); removed VisualEditor. Aug 8 2019, 4:18 PM

• ppelberg added a subscriber: MNeisler. Aug 8 2019, 4:37 PM

• ppelberg moved this task from Incoming to In progress on the VisualEditor (Current work) board. Aug 9 2019, 4:51 PM

nshahquinn-wmf added a project: Product-Analytics. Aug 26 2019, 4:09 PM

nshahquinn-wmf triaged this task as Medium priority. Aug 26 2019, 4:17 PM

nshahquinn-wmf moved this task from Triage to Doing on the Product-Analytics board.

@Neil_P._Quinn_WMF @ppelberg is the check-in component done? Are the notes from discussion potential future tasks?

Megan and I have done the initial checks and found the following.

About 51.7% of the users (both registered and anonymous) who were bucketed ended up in the wikitext-default bucket. It would be incredibly unlikely (p << 10^-15) to get an imbalance this large if our random assignment were actually 50%–50%, so there's clearly a serious issue somewhere that we need to understand.

bucket	users
source default	1,302,187
visual default	1,214,917
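The size of this imbalance can be checked with a standard sample-ratio-mismatch test. The sketch below is only an illustration under our own assumptions (a normal approximation to the binomial and an intended 50%–50% split); it is not necessarily the test used in this analysis, and `srm_check` is a hypothetical helper name.

```python
import math


def srm_check(n_a: int, n_b: int, expected_share_a: float = 0.5) -> tuple[float, float, float]:
    """Sample-ratio-mismatch check for a two-bucket split.

    Uses the normal approximation to the binomial and returns
    (observed share of bucket A, z-score, log10 of the two-sided p-value).
    """
    n = n_a + n_b
    share_a = n_a / n
    sd = math.sqrt(n * expected_share_a * (1 - expected_share_a))
    z = (n_a - n * expected_share_a) / sd
    x = abs(z) / math.sqrt(2)
    p = math.erfc(x)  # two-sided p-value, P(|Z| >= |z|)
    if p > 0.0:
        log10_p = math.log10(p)
    else:
        # erfc underflows to 0.0 for very large x; fall back to the leading
        # asymptotic term erfc(x) ~ exp(-x**2) / (x * sqrt(pi)), in log space.
        log10_p = (-x * x - math.log(x * math.sqrt(math.pi))) / math.log(10)
    return share_a, z, log10_p


share, z, log10_p = srm_check(1_302_187, 1_214_917)  # source default vs. visual default
print(f"share in source default = {share:.1%}, z = {z:.1f}, p ~ 10^{log10_p:.0f}")
```

With the bucket counts above this gives an observed share of about 51.7% and a z-score of roughly 55. The exact p-value underflows double precision, which is why the sketch reports its base-10 logarithm instead.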

With that said, here is a preliminary look at our key metrics. The table shows the value of each metric for the average user in each bucket. We aggregate per user because different users are independent of one another, but different attempts by the same user are not.

bucket	attempts per user	completed edits per user	edit completion rate
source default	1.399	0.040	0.87%
visual default	1.403	0.037	0.79%
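Averaging per user rather than per attempt can be sketched as follows. This is a minimal illustration only: the `Attempt` record and its field names are hypothetical, each user is assumed to stay in one bucket, and the completion rate is computed as the mean of per-user rates, which is one plausible reading of "the value of each metric for the average user" rather than a confirmed definition.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Attempt:
    # Hypothetical record: one row per editing attempt.
    user_id: str
    bucket: str
    completed: bool


def per_user_metrics(attempts: list[Attempt]) -> dict[str, dict[str, float]]:
    """Aggregate attempts to one row per user, then average those rows within each bucket."""
    # user_id -> [bucket, attempt count, completed-edit count];
    # assumes a user's bucket never changes, so the first one seen is kept.
    per_user: dict[str, list] = {}
    for a in attempts:
        row = per_user.setdefault(a.user_id, [a.bucket, 0, 0])
        row[1] += 1
        row[2] += int(a.completed)

    by_bucket: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for bucket, n_attempts, n_completed in per_user.values():
        by_bucket[bucket].append((n_attempts, n_completed))

    # Every user here has at least one attempt, so the per-user rate never divides by zero.
    return {
        bucket: {
            "attempts per user": mean(n for n, _ in users),
            "completed edits per user": mean(c for _, c in users),
            "edit completion rate": mean(c / n for n, c in users),
        }
        for bucket, users in by_bucket.items()
    }


attempts = [
    Attempt("u1", "source default", True),
    Attempt("u1", "source default", False),
    Attempt("u2", "source default", False),
    Attempt("u3", "visual default", True),
]
print(per_user_metrics(attempts))
```

Dividing total completions by total attempts would instead weight heavy editors more; the per-user mean treats each user as one independent observation.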

In addition, among completed edits in each bucket, the average editing time was:

bucket	editing time
source default	2 min, 12 s
visual default	2 min, 46 s

In T229426#5463187, @kzimmerman wrote:

@Neil_P._Quinn_WMF @ppelberg is the check-in component done? Are the notes from discussion potential future tasks?

@ppelberg, @MNeisler, and I discussed this yesterday and, yes, at this point we're calling the preliminary check done. Megan will dig into some of the issues above as part of T221198.

• ppelberg mentioned this in T232175: Investigate VE as default A/B test findings. Sep 6 2019, 12:03 AM

In T229426#5463187, @kzimmerman wrote:

...Are the notes from discussion potential future tasks?

@kzimmerman, yep. I've also created a task that (hopefully) makes this more explicit: T232175.

Side note: I wonder whether using Phabricator as a drafting space (as I'm doing in T232175) is appropriate. So please, if you see a better way, I'm all ears (or I guess eyes in this context).

In T229426#5468481, @Neil_P._Quinn_WMF wrote:

@ppelberg, @MNeisler, and I discussed this yesterday and, yes, at this point we're calling the preliminary check done. Megan will dig into some of the issues above as part of T221198.

@MNeisler + @Neil_P._Quinn_WMF, thanks for pulling these results together. A question [1] related to "Editing interface switching"...

In the following scenario, how would the test make sense of this contributor's editing session? Would the test consider this contributor as having abandoned their edit in default-visual?

Scenario
i. Contributor taps edit
ii. Contributor is bucketed into default-visual, and the visual editor loads (read: reaches "ready")
iii. Contributor switches to wikitext
iv. Contributor makes some changes
v. Contributor publishes their changes

Neil, we may have discussed this before, but a quick search of phab and our shared doc didn't surface anything...

In T229426#5470251, @ppelberg wrote:

...Would the test consider this contributor as having abandoned their edit in default-visual?

You're right, I don't think we've discussed this before. Since switching interface on mobile doesn't result in a page reload, it won't cause the creation of a new editing session ID. That means the scenario you described would be treated as a single completed attempt. Deciding which interface to credit for the completion is complex, but deciding which bucket is simple since switching interfaces shouldn't affect the bucket.
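The attribution rule described above can be sketched as grouping events by editing session ID. This is a hypothetical model, not the actual pipeline: the `EditEvent` and `SessionSummary` records are invented for illustration, and the action and interface values are only loosely modeled on the EditAttemptStep schema. What it shows is that a mid-session switch from the visual editor to wikitext adds an interface to the session but neither starts a new attempt nor changes the bucket.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EditEvent:
    # Hypothetical record; field names loosely follow EditAttemptStep.
    session_id: str
    bucket: str            # assigned once, when the contributor taps edit
    action: str            # e.g. "init", "ready", "saveSuccess"
    editor_interface: str  # e.g. "visualeditor", "wikitext"


@dataclass
class SessionSummary:
    bucket: str
    interfaces: list[str] = field(default_factory=list)
    completed: bool = False


def summarize_sessions(events: list[EditEvent]) -> dict[str, SessionSummary]:
    """Collapse events into one attempt per editing session ID."""
    sessions: dict[str, SessionSummary] = {}
    for e in events:
        # The bucket comes from the session's first event; later events,
        # including those after an interface switch, do not change it.
        s = sessions.setdefault(e.session_id, SessionSummary(bucket=e.bucket))
        # Switching interface on mobile keeps the same session ID, so it is
        # recorded here instead of starting a new attempt.
        if e.editor_interface not in s.interfaces:
            s.interfaces.append(e.editor_interface)
        if e.action == "saveSuccess":
            s.completed = True
    return sessions


# The scenario from T229426#5470251: VE loads, the contributor switches to wikitext and publishes.
scenario = [
    EditEvent("s1", "visual default", "init", "visualeditor"),
    EditEvent("s1", "visual default", "ready", "visualeditor"),
    EditEvent("s1", "visual default", "ready", "wikitext"),
    EditEvent("s1", "visual default", "saveSuccess", "wikitext"),
]
print(summarize_sessions(scenario))
```

The scenario yields a single completed attempt credited to the visual default bucket, with both interfaces recorded; which interface deserves credit for the completion is left open, as in the discussion.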

• ppelberg mentioned this in T232237: VE mobile default test buckets are unbalanced. Sep 7 2019, 1:59 AM

In T229426#5471170, @Neil_P._Quinn_WMF wrote:

...Since switching interface on mobile doesn't result in a page reload, it won't cause the creation of a new editing session ID. That means the scenario you described would be treated as a single completed attempt. Deciding which interface to credit for the completion is complex, but deciding which bucket is simple since switching interfaces shouldn't affect the bucket.

Understood, ok. Thanks, Neil.

As a mental note: in the instance described in T229426#5470251, we would count this contributor as having completed their edit in the test bucket they were initially assigned to.

• ppelberg mentioned this in T234277: Decide how we move forward with the A/B test. Oct 2 2019, 11:54 PM

nshahquinn-wmf mentioned this in T223339: Re-run metrics from VE on mobile report. Nov 26 2019, 11:57 PM

• ppelberg mentioned this in T221198: VE mobile default: analyze A/B test results. Jun 1 2020, 10:11 PM

Closed, Resolved, Public

Description

Overview

Done


Open questions
