
Investigate VE as default A/B test findings
Open, Needs Triage · Public

Description

⚠️

This task is not yet ready to be worked on.

For now, this task is intended to be the place where we gather the questions we have in response to the VE as default A/B test results.

Our analysis of the VE as default A/B test results can be found in these two tickets:

"Done"

This section will eventually be populated with the questions that make up our investigation into the results of the VE as default A/B test we have been running.

Event Timeline

ppelberg created this task. · Sep 6 2019, 12:03 AM
Restricted Application added a subscriber: Aklapper. · Sep 6 2019, 12:03 AM

During our 28 August team planning meeting, we discussed the initial results from the A/B test. Below is an unranked list of the questions that surfaced in that conversation.
For now, consider these questions as notes.

Open questions

  • 1) Why is the number of contributors in each test bucket not more balanced? Potential explanation: when VE fails to load, we attempt to load the source editor instead. This might partially explain the increased source editor load attempts.
  • 2) What event should mark the "start" of an edit? Asked in the context of calculating edit completion rate.
  • 3) Are the differences in edit completion rate between default-source and default-visual statistically significant?
  • 4) How do the results vary across wikis? Some of the wikis we included in the test are significantly larger than others...perhaps they had an outsized impact on the outcome.
  • 5) Is there a qualitative difference in the kinds of edits being attempted in each editing interface?
  • 6) How does the quality of edits vary between the two test groups?
  • 7) Does source produce more quick-followup edits to fix issues from syntax mistakes?
  • 8) How many contributors switched away from their assigned editor?
  • 9) 99+% of edit sessions don't result in a save...what is happening in those sessions?
  • 10) What did we record if a user immediately switched to wikitext and completed their edit there? This could happen quite a lot if an existing user edited on their phone without bothering to sign in.
  • 11) Are we doing any analysis on the effects of mobile VE on knowledge equity?
  • 12) Can we group the results based on the user’s network connection quality or, if that’s unavailable, by geographical region, abort timing, or something similar? Thinking: it looks like VE users are more successful if the editor loads correctly, but less successful overall. This might mean that the editor often fails to load (which would not be unexpected, because it is larger and takes longer to load, and the user might get impatient and cancel).
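Questions 1 and 3 above map onto two standard checks, sketched here in Python as notes rather than as the method used in our analysis. The function names and the 50/50 intended split are assumptions for illustration; the first function tests whether bucket sizes deviate from the intended split, and the second tests whether two completion rates differ. Both treat each counted unit as independent, which may not hold if one contributor produces many edit sessions.

```python
import math


def sample_ratio_mismatch(n_a, n_b, expected_share_a=0.5):
    """Chi-square goodness-of-fit test (1 degree of freedom) on bucket sizes.

    expected_share_a is the intended share of bucket A (assumed 50/50 here).
    Returns (chi2 statistic, p-value).
    """
    total = n_a + n_b
    exp_a = total * expected_share_a
    exp_b = total - exp_a
    chi2 = (n_a - exp_a) ** 2 / exp_a + (n_b - exp_b) ** 2 / exp_b
    # For 1 degree of freedom, the upper tail is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def two_proportion_z_test(saves_a, attempts_a, saves_b, attempts_b):
    """Pooled two-proportion z-test on edit completion rates.

    Returns (z statistic, two-sided p-value).
    """
    rate_a = saves_a / attempts_a
    rate_b = saves_b / attempts_b
    pooled = (saves_a + saves_b) / (attempts_a + attempts_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / attempts_a + 1 / attempts_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A small p-value from `sample_ratio_mismatch` would suggest the imbalance in question 1 is not chance alone (for example, fallback loads of the source editor being counted in the other bucket). What counts as an "attempt" in `two_proportion_z_test` depends on the answer to question 2.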

How do the results vary by platform (iOS vs Android)? If they differ a lot we could investigate further.

More questions that surfaced during today's CR <> Product meeting:

  • How does edit completion rate vary by experience level within our two test groups?
  • How do we log abuse-filter save attempts?
  • What are edit completion rates on apps?
  • Are some pages more difficult to edit on mobile than others?
  • Are the people who are failing to save successfully having success in other contexts?
JTannerWMF moved this task from Incoming to In progress on the VisualEditor (Current work) board.
JTannerWMF added a subscriber: JTannerWMF.

@ppelberg is working on this and will create child tasks for this task.

This comment pulls together the questions that emerged on this ticket, in various meetings, and most recently in our discussion of next steps for the VE as default A/B test, which took place in T234277.

How does edit completion rate vary by ____ ?

  • Platform (iOS vs. Android)
  • Network connection
  • Editor switching
  • Wiki
  • Anonymous vs. registered
  • Country/region
  • Experience level
  • Abandon timing
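Each breakdown in the list above has the same shape: group edit sessions by test bucket and by one dimension, then divide saved sessions by total sessions. A minimal sketch in Python, assuming a hypothetical session record with `bucket`, `saved`, and one key per dimension; these field names are illustrative and are not the actual logging schema.

```python
from collections import defaultdict


def completion_rate_by(sessions, dimension):
    """Edit completion rate per (bucket, dimension value) pair.

    sessions: iterable of dicts with hypothetical keys "bucket", "saved",
    and the chosen dimension (for example "platform" or "wiki").
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [saved sessions, all sessions]
    for session in sessions:
        key = (session["bucket"], session[dimension])
        if session["saved"]:
            counts[key][0] += 1
        counts[key][1] += 1
    return {key: saved / total for key, (saved, total) in counts.items()}


# Made-up records, only to show the expected input shape.
example_sessions = [
    {"bucket": "default-visual", "platform": "iOS", "saved": True},
    {"bucket": "default-visual", "platform": "iOS", "saved": False},
    {"bucket": "default-source", "platform": "Android", "saved": False},
]
rates = completion_rate_by(example_sessions, "platform")
```

Small groups will produce noisy rates, so it is worth reporting the session count next to each rate before comparing buckets.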

RE "Network connection":

I can answer all of the questions listed above using existing instrumentation except for network connection; however, other available data such as geographical region, abort timing, and save failures should help provide insights into that question.

Alsee added a subscriber: Alsee. · Nov 23 2019, 8:37 AM