
Check impacts of mobile VE load screen improvements
Closed, Resolved, Public


There are two main questions we want to answer using quantitative metrics. The information necessary to answer both questions is logged in the EditAttemptStep data stream, although we should first start oversampling all mobile visual editor events so we have plenty of data available.

We could look at these questions in an A/B test, but for a feature of this size, it doesn't seem worth it. Instead, we will simply roll out the feature and compare the data from before and after.

Do the load screen improvements...

  1. ...change how many users stick with their edit attempt long enough for the interface to fully load?
    • This is our main metric: we hope to increase the proportion of users who make it through the loading process (technically, the ready rate), although the current rate is already 95%, so there's not much room for improvement. Even if we don't see an improvement, that doesn't invalidate the case for the project, since another reason for doing this is to make users feel more confident using the editor, which we can't easily test quantitatively.
  2. ...change the overall load times?
    • This is a guardrail metric: we want to make sure that adding this complexity to the loading process doesn't end up increasing the overall load time (technically, the ready time).
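
As a rough illustration of the two metrics above, here is a minimal Python sketch. It assumes each logged event has been flattened to a dict with the hypothetical keys `session_id`, `action` (`"init"` or `"ready"`) and `ts_ms`; these are simplifications for illustration, not the actual EditAttemptStep field names. The ready rate is the share of initiated sessions that also logged a ready event, and the ready time is the gap between the two.

```python
def percentile(values, p):
    """Linearly interpolated p-th percentile (0 <= p <= 100) of a non-empty list."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def ready_metrics(events):
    """Return (ready_rate, ready_times_ms) for an iterable of event dicts.

    Field names (session_id, action, ts_ms) are hypothetical.
    """
    init_ts, ready_ts = {}, {}
    for event in events:
        target = {"init": init_ts, "ready": ready_ts}.get(event["action"])
        if target is None:
            continue  # ignore actions other than init/ready
        sid = event["session_id"]
        # Keep the earliest timestamp seen for each session and action.
        target[sid] = min(target.get(sid, event["ts_ms"]), event["ts_ms"])

    # Only sessions that logged an init event count towards the rate.
    ready_times = [ready_ts[s] - init_ts[s] for s in init_ts if s in ready_ts]
    ready_rate = len(ready_times) / len(init_ts) if init_ts else float("nan")
    return ready_rate, ready_times
```

Calling `percentile(ready_times, 10)` and `percentile(ready_times, 90)` on the second return value would then give the kind of ready time percentiles a before/after comparison could use.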

The deploy is tentatively planned for the week of 11 March.

Event Timeline

I'm not sure if this is the place to add this requirement, but we should design the UX for opening VE to make an edit and then use the following scenarios to test that this solution is extensible:

  • opening a Talk page
  • switching between watchlist tabs

This will give us the opportunity to test the transition pattern library (that we create) and to ensure that we don't over-engineer for transitions specific to modal views (which just so happens to be the kind of transition that we're starting with here).

phuedx added a subscriber: phuedx. Dec 18 2018, 11:27 AM

How long does it take on average for pages to load?

IIRC this was something like "How long does it take to load the talk overlay (or the categories overlay)?"

nshahquinn-wmf renamed this task from Overlay Specifications to Analyze impact of mobile visual editor load screen improvements. Dec 18 2018, 6:20 PM
nshahquinn-wmf claimed this task.
nshahquinn-wmf updated the task description.
nshahquinn-wmf updated the task description.
nshahquinn-wmf added a subscriber: kzimmerman. Dec 18 2018, 8:53 PM

Note that although I've written this plan, I haven't yet accepted or prioritized this work.

I still need to understand a bit more about the timeline. Once we have that from @JTannerWMF, @kzimmerman and I will be able to assess this and make a tentative decision.

JTannerWMF moved this task from To Triage to Up next on the VisualEditor board. Dec 19 2018, 6:34 PM
nshahquinn-wmf moved this task from Up next to Analysis on the VisualEditor board. Feb 11 2019, 6:17 PM

@Esanders asked why the 20,000 sessions per month that we're currently logging from the mobile visual editor aren't enough for this analysis. Unfortunately, I don't have a rigorous reason, just my intuition that (1) the greater power to detect small changes could be useful here and (2) we're going to be doing a lot of analysis on the mobile visual editor from here on out, so we shouldn't constrain ourselves to a relatively small dataset when a bigger one is easily available.
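
For a rough sense of what "power to detect small changes" means for a rate that already sits near 95%, here is a back-of-the-envelope Python sketch using the standard normal-approximation sample size formula for comparing two proportions. It is not a calculation anyone in this thread ran. The function name, the 5% significance level and the 80% power are assumptions chosen for illustration, and it treats the before and after periods as independent samples of equal size.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sessions_per_period(p_before, p_after, alpha=0.05, power=0.8):
    """Approximate sessions needed in EACH period to detect a change in a
    proportion from p_before to p_after (two-sided test, normal approximation).

    alpha and power are illustrative defaults, not values from the task.
    """
    if p_before == p_after:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_before + p_after) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_before * (1 - p_before) + p_after * (1 - p_after))
    ) ** 2
    return ceil(numerator / (p_before - p_after) ** 2)
```

For example, `sessions_per_period(0.95, 0.96)` estimates how many sessions each comparison window would need to detect a one-point change in the ready rate; smaller changes require sharply more data, which is the intuition behind preferring the larger dataset.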

In chat today (11 March), we discussed the Loading Overlay patch and decided to merge it for release later this week.

Absent from the above* was a discussion of whether our instrumentation is set up to properly:

  1. Measure the impact of the Loading Overlay on the proportion of contributors who make it "through" the loading process | see T212137
  2. Detect possible regressions (as measured by changes to overall load times) | see T217826

So with this in mind, @DLynch + @Esanders + @Neil_P._Quinn_WMF, as of now, is our instrumentation set up to properly measure "1." and "2."?

@Neil_P._Quinn_WMF's comment here leads me to think we are, but I wanted to be sure...

In T214450#5016095, @Neil_P._Quinn_WMF wrote:

Here's the 10th/50th percentiles for 2019 (90th looks similar). The data is very noisy with no obvious improvements. :/ Daily sample sizes are <1000

Nice work! I think the mobile VE oversampling should help deal with the noisiness.

*Fwiw: I see it as my responsibility to ask these questions before a decision is made about whether a patch is ready to be merged or not. Posting here for posterity.

Over-sampling status
Over-sampling of the EditAttemptStep data stream seems to be working, per @Esanders's investigation of the below.

See the jump in sample size (right-most column) between 2019-03-07 and 2019-03-08:

phuedx removed a subscriber: phuedx. Mar 13 2019, 9:39 AM
ppelberg renamed this task from Analyze impact of mobile visual editor load screen improvements to Check impacts of mobile VE load screen improvements. May 17 2019, 5:15 PM
nshahquinn-wmf triaged this task as Medium priority. Jul 12 2019, 11:32 AM
nshahquinn-wmf updated the task description.
nshahquinn-wmf moved this task from Next Up to Doing on the Product-Analytics board.
nshahquinn-wmf closed this task as Resolved. Jul 12 2019, 12:34 PM

I took a look at this since I was already calculating both ready rates and ready timings as part of T221197.

Summary: The loading screen improvements for the mobile visual editor:

  • had no impact on the proportion of users sticking with their edit attempt through the loading process (the ready rate)
  • worsened the time taken for the interface to become ready for user input (the ready time), although not enough to have an impact on the ready rate

Note that both metrics were affected by a serious issue where the mobile visual editor sent ready events too early (T217825), but were probably still directionally accurate.

Ready time

Comparing the 8 days before the train deployment (Mar 4-11) with the 8 days after (Mar 15-22), we had the following increase in ready times:

| Ready time | Change after deploy |
| --- | --- |
| 10th percentile | 37.7% |
| 90th percentile | 11.3% |
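
A before/after comparison of this kind can be sketched in Python as follows. This is only an illustration of the arithmetic, not the code used for the analysis: the function names are hypothetical, and each input is assumed to be a plain list of ready times in milliseconds for one of the two 8-day windows.

```python
def percentile(values, p):
    """Linearly interpolated p-th percentile (0 <= p <= 100) of a non-empty list."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def percent_change_at_percentile(before_ms, after_ms, p):
    """Relative change (in %) of the p-th percentile from the 'before'
    window to the 'after' window. Positive means ready times got slower."""
    before = percentile(before_ms, p)
    after = percentile(after_ms, p)
    if before == 0:
        raise ValueError("baseline percentile is zero; relative change undefined")
    return (after - before) / before * 100
```

Under these assumptions, `percent_change_at_percentile(before_ms, after_ms, 10)` and the same call with `90` would correspond to the two rows of the table.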

Ready rate

Restricted Application added a project: User-Ryasmeen. Jul 12 2019, 12:34 PM