
New Impact module's empty state on mobile: research spike
Closed, Resolved · Public

Assigned To
Authored By
KStoller-WMF
Feb 27 2023, 3:38 AM

Description

User story:

As a Product Manager, I want to explore the difference in activation rates between the new and old Impact module's empty state on mobile, because I want to be sure all Growth features improve activation and retention.

Background & research:

Impact module leading indicators seem healthy, and we've received positive feedback about the new design from newcomers and experienced editors. However, when we look at activation data, it appears there is a slightly lower activation rate for the new Impact module on mobile:

Screen Shot 2023-02-26 at 6.38.11 PM.png (806×1 px, 390 KB)

Before a newcomer has activated (edited for a first time) they will only see the impact module's empty state. So this change in activation only relates to the empty state, not the actual new impact module. The old empty state design is very similar to the new empty state design, so we would expect to see no change in the Activation rate.

Old Impact module | New Impact Module
Screen Shot 2023-02-26 at 6.51.37 PM.png (374×1 px, 201 KB)
Screen Shot 2023-02-26 at 6.52.11 PM.png (472×1 px, 196 KB)
image.png (1×720 px, 104 KB)
Screen Shot 2023-02-26 at 6.57.56 PM.png (1×1 px, 426 KB)
Questions:
  • We've released some improvements to the Impact module empty state logic and loading (examples: T324285 & T322832). Is it possible that some of those edge cases were enough to cause this discrepancy? Are there other edge cases we aren't considering?
  • Is there any chance we are showing the "new design" alert on mobile?
  • Is there a difference in performance / loading speed?
  • Is there a difference in error rates?
  • Could older mobile devices be having trouble loading Vue or any frontend code?
  • Could minor design or copy differences explain the activation discrepancy?
    • The new module shows the blank scorecards: 0 thanks and a blank streak.
    • The new design has lower contrast (background isn't white) and the 0 edits so far text is smaller.
    • The new design doesn't have the suggested edits explanation close to the "See suggested edits" button.
  • Could this be a statistical anomaly? What is our level of confidence?
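The "statistical anomaly" question can be made concrete with a two-proportion z-test on activation counts. A minimal sketch (the counts below are hypothetical placeholders, not the experiment's real numbers):

```javascript
// Two-proportion z-test: is the gap between two activation rates
// larger than chance would explain?
function twoProportionZ(successA, totalA, successB, totalB) {
  const pA = successA / totalA;
  const pB = successB / totalB;
  // Pooled proportion under the null hypothesis of equal rates.
  const pPool = (successA + successB) / (totalA + totalB);
  const se = Math.sqrt(pPool * (1 - pPool) * (1 / totalA + 1 / totalB));
  return (pA - pB) / se;
}

// Hypothetical example: 2100/10000 activations (old) vs 2000/10000 (new).
const z = twoProportionZ(2100, 10000, 2000, 10000);
// |z| below 1.96 means the gap is not significant at the 95% level.
console.log(z.toFixed(2));
```

With these placeholder numbers the observed gap would not clear the conventional 95% significance bar, which is exactly the kind of check the question is asking for.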
Acceptance Criteria:
  • Investigate load times and error rates for the new Impact module compared to the old Impact module.
  • Investigate Lighthouse performance & best practices scores for the new impact module to see if there are any easy improvements to consider.
  • Write subtasks for any changes we should consider.

Related Objects

Event Timeline

We've released some improvements to the Impact module empty state logic and loading (examples: T324285 & T322832). Is it possible that some of those edge cases were enough to cause this discrepancy? Are there other edge cases we aren't considering?

I doubt that those two tasks are related.

Is there any chance we are showing the "new design" alert on mobile?

Yes, we are, including for brand new users, which seems wrong per T323619: NewImpact: Introduce new design to existing newcomers via GuidedTour and Drawer. You can check the behavior with this snippet in your browser console: new mw.Api().saveOptions( { 'growthexperiments-tour-newimpact-discovery': 0 } ); On desktop, no "updated design" prompt appears, but on mobile, we show the "updated design" dialog after the user taps on the module.

Is there a difference in performance / loading speed?

We no longer render the summary on the server side. So yes, the rendering is slower, but there is also a skeleton to compensate. That said, I believe the suggested edits module is what drives activation, not the Impact module, so I am not sure that the Impact module's rendering performance is relevant here.

In Grafana, you can see the mount time for the overlay summary: it's steady at ~200ms.

image.png (612×810 px, 68 KB)

From meeting just now: I'd suggest comparing specifically the funnel of users who tap on empty impact module, then press "See suggested edits". If there is a marked decrease in users who click on "See suggested edits" in the new impact module, that could help explain the lower activation rate.

On desktop, no "updated design" prompt appears, but on mobile, we show the "updated design" dialog after the user taps on the module.

OK, I added: T330692: Impact Module: don't show "Updated design" info drawer to new users

From meeting just now: I'd suggest comparing specifically the funnel of users who tap on empty impact module, then press "See suggested edits". If there is a marked decrease in users who click on "See suggested edits" in the new impact module, that could help explain the lower activation rate.

Agreed, I added this and a few other questions to T327581: New Impact Module: experiment analysis

I think it's possible that this difference in activation relates to a few differences, and can't be explained by just one easy to identify problem, so I've also added a design related task: T330695: New Impact module's empty state on mobile: Design iteration

Wrt performance, we load the user impact for the mobile preview so that could slow down display of the suggested edits module as well. OTOH we are talking about users with 0 edits here, the impact lookup should not take a perceptible amount of time.

The new module uses Vue and it's needed to show the mobile overview, right? So maybe loading Vue makes a performance difference?

Wrt errors, T324930: NewImpact: Cannot read properties of undefined (reading 'days') is somewhat common (40–50/day), but the user-facing impact should be barely noticeable. There are about 10/day of HTTP errors with a similarly unhelpful stack trace; not sure about those. (Maybe related to userimpact API calls? I can't see how that would happen.) There's a similar number of mustbeloggedin-generic errors (session timeouts?), which I can't imagine being related. Other JS errors are either clearly not Impact module related, or very low frequency.

I agree funnel or user pathway style representation of the data would be useful. E.g. it would be interesting to see if the first click users make on the overview page is affected (maybe the impact module looks more appealing now, so more users click it instead of suggested edits, but then it doesn't contain anything that can be interacted with so they bounce?).
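The funnel comparison described above boils down to counting distinct users at each step, per variant. A minimal sketch over hypothetical event rows (the event names and shape here are illustrative; the real data lives in the Growth team's event tables):

```javascript
// Hypothetical event rows: one per (user, action), tagged with the
// Impact module variant the user saw.
const events = [
  { user: 1, variant: 'new', action: 'impression' },
  { user: 1, variant: 'new', action: 'impact-click' },
  { user: 1, variant: 'new', action: 'se-click' },
  { user: 2, variant: 'old', action: 'impression' },
  { user: 2, variant: 'old', action: 'impact-click' },
];

// Count distinct users reaching each funnel step for one variant:
// homepage impression -> tap empty Impact module -> "See suggested edits".
function funnel(rows, variant) {
  const steps = ['impression', 'impact-click', 'se-click'];
  return steps.map(step =>
    new Set(
      rows
        .filter(r => r.variant === variant && r.action === step)
        .map(r => r.user)
    ).size
  );
}

console.log('new:', funnel(events, 'new')); // [1, 1, 1]
console.log('old:', funnel(events, 'old')); // [1, 1, 0]
```

Comparing the step-to-step drop-off between the two variants would show whether the "See suggested edits" click is where the new design loses users.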

Wrt performance, we load the user impact for the mobile preview so that could slow down display of the suggested edits module as well. OTOH we are talking about users with 0 edits here, the impact lookup should not take a perceptible amount of time.

Good to know, I didn't realize the data loaded would be different for brand new accounts (with zero edits). Far more newcomers on mobile end up on the newcomer homepage than navigate to the actual empty impact module view, so perhaps it makes sense to look at that closely.

When I analyze the page load with Lighthouse (for an account with zero edits) it seems like the homepage with the new impact module regularly has a higher "Time to Interactive" and "Largest Contentful Paint".

Newcomer homepage page load comparison:

with "Old" impact module | with new-impact module enabled
Screenshot 2023-02-28 at 11.04.50 AM.png (1×1 px, 226 KB)
Screenshot 2023-02-28 at 11.05.06 AM.png (1×1 px, 271 KB)

These values change slightly each time I run the report, but it does seem like the "Time to Interactive" and "Largest Contentful Paint" metrics are different enough that we might want to look into this further?

Huh, those are pretty high values. Is that on a production website? We switched one of the panels to Vue which cannot be rendered on server side so it's not surprising some of the paint happens later, but I would have expected way smaller values both with and without Vue.

Huh, those are pretty high values. Is that on a production website?

Yes, those values are from Chrome Lighthouse DevTools on French Wikipedia using the same account, and using new-impact=1 to opt into seeing the new impact module. The screenshots I added are for "Mobile" stats, not the "Desktop" stats. My understanding is that for mobile stats, Lighthouse tries to simulate imperfect mobile conditions: a slower connection on a slightly underpowered device. So it's not too surprising to me that the values are higher, but it is surprising that the numbers are so different between new-impact and the "old impact" module when the empty states are nearly the same. But it sounds like there might not be an easy performance solution here if the issue relates to Vue and lack of server side rendering?

@KStoller-WMF can you please share the full URL you used with Lighthouse?

old impact | new impact
image.png (2×1 px, 455 KB)
image.png (2×1 px, 433 KB)

This is what I see. (@KStoller-WMF the files you posted in T330614#8651465 are restricted, can you make them public please?)

The definition of Time to Interactive for Lighthouse is:

TTI measures how long it takes a page to become fully interactive. A page is considered fully interactive when:

  • The page displays useful content, which is measured by the First Contentful Paint,
  • Event handlers are registered for most visible page elements, and
  • The page responds to user interactions within 50 milliseconds.

Whereas for us, I think the more useful metric is how long it takes the suggested edits module to be interactive, since that is the main call to action for the user:

image.png (1×900 px, 194 KB)

Once the user has tapped on the suggested edits module, we see that the TTI is 381 ms according to our instrumentation:

image.png (546×1 px, 69 KB)

But we don't currently instrument, on the client side, the loading of all the modules shown on Special:Homepage on mobile. We should probably do that: time to interactive for each mobile summary block, and time for all modules to render.

But we don't currently instrument, on the client side, the loading of all the modules shown on Special:Homepage on mobile. We should probably do that: time to interactive for each mobile summary block, and time for all modules to render.

We do instrument the mobile summary rendering time for the new impact module: https://grafana.wikimedia.org/d/vGq7hbnMz/special-homepage-and-suggested-edits?orgId=1&from=now-7d&to=now&viewPanel=170&var-platform=All&var-UserImpactHandlerCache=All&var-UserImpactHandlerPingLimiter=NoData&var-impactrendermode=All

That shows that in most cases, the impact module's summary state (whether it has data or not) renders within 50–350 ms.
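Per-module client-side timing like this could build on the browser's User Timing API (performance.mark / performance.measure). A minimal sketch; the mark names here are illustrative, not the extension's actual instrumentation:

```javascript
// Record when a homepage module starts loading.
function markModuleStart(name) {
  performance.mark(`homepage-module-${name}-start`);
}

// Record when it has rendered, and measure the elapsed time between
// the two marks. performance.measure returns the measure entry.
function markModuleRendered(name) {
  performance.mark(`homepage-module-${name}-rendered`);
  return performance.measure(
    `homepage-module-${name}`,
    `homepage-module-${name}-start`,
    `homepage-module-${name}-rendered`
  );
}

markModuleStart('impact');
// ... module mounts and renders ...
const m = markModuleRendered('impact');
console.log(`impact module rendered in ${m.duration.toFixed(1)} ms`);
```

Measures recorded this way show up in the browser's performance timeline and can be forwarded to Grafana alongside the existing mount-time metric.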

I've dug into this phenomenon and found as follows:

  1. Using a dataset from a time period starting when the experiment ended, there is not the same drop in activation on mobile.
  2. Estimating the statistical power based on the observed effect on mobile and the number of registrations has us coming in lower (83%) than what we prefer (90%). Although statistical power concerns itself with false negatives, whereas this appears to be a false positive, we also want to make sure we gather enough data to draw strong conclusions.
  3. Event data looking at whether users click on the "See suggested edits" button when shown the empty Impact module finds a lower likelihood of clicking for those who get the New Impact module, but the difference is not consistent in direction nor is it large enough to explain the difference in activation.
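The statistical-power point in (2) can be turned into a concrete sample-size target. A minimal sketch using the standard normal approximation for a two-proportion test; the activation rates below are placeholders, not the experiment's real numbers:

```javascript
// Registrations needed per experiment arm to detect a difference
// between activation rates p1 and p2 with ~90% power at alpha = 0.05.
// zAlpha = 1.96 (two-sided 5% level), zBeta = 1.282 (90% power).
function sampleSizePerArm(p1, p2, zAlpha = 1.96, zBeta = 1.282) {
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / (p1 - p2) ** 2);
}

// Hypothetical: detecting a drop from 21% to 20% activation.
console.log(sampleSizePerArm(0.21, 0.20));
```

Small absolute differences in an already-low activation rate require tens of thousands of registrations per arm, which is why running experiments for 6–8 weeks meaningfully increases power.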

We discussed this in the team and decided to investigate T330692. We don't see any other changes needed to the New Impact module at this time. I also recommended that we run our experiments for longer (6–8 weeks) in order to increase our statistical power, and that we add article activation to the list of leading indicators for T328055.