[Spike] Investigate A/B test results of Growth Experiments impact module
Closed, Resolved | Public | 3 Estimated Story Points | Spike

Description

What teams or group is this for?

Growth Team

Who is your main point of contact and contact preference?

Product: @KStoller-WMF
Eng: @Tgr / @kostajh

What are the details of your request? Include relevant timelines or deadlines

The Growth team has been running an A/B test in which they are testing a new and improved Impact module on the newcomer homepage.

They are seeing an odd result, though: fewer newcomers are “activating” (i.e., editing for the first time). This result is unexpected because the empty states of the new and old Impact modules are nearly identical. There shouldn’t be any change in activation because, before a user has edited for the first time, they see essentially the same design. The only fundamental difference is that the new empty state is built using Vue. Growth engineers and Growth QA have been testing and trying to determine what’s happening, but have so far been stumped.

The Growth team would like to borrow some of DST's time to help investigate and brainstorm potential causes and solutions.

How does the request fit within departmental or foundation priorities?

Is this request urgent or time-sensitive?

Not super urgent, but would block further scaling.

Related:
T330614: New Impact module's empty state on mobile: research spike
T334411: Positive Reinforcement: investigate difference in mobile activation

Event Timeline

CCiufo-WMF created this task.
kostajh edited subscribers, added: Tgr; removed: Gergo29.

Change 924487 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] NewImpact: Return empty user impact for no edits

https://gerrit.wikimedia.org/r/924487

Change 924491 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] NewImpact: Cache empty user impact on account creation

https://gerrit.wikimedia.org/r/924491

Change 924491 abandoned by Kosta Harlan:

[mediawiki/extensions/GrowthExperiments@master] NewImpact: Cache empty user impact on account creation

Reason:

Duplicate of Ib201e280d21b411affa623c5f5c5a7d361b75dd8

https://gerrit.wikimedia.org/r/924491

Hey @kostajh and @Tgr – I'm happy to help investigate this. My initial thoughts are that we may be looking at:

  • a performance related issue (we're shipping additional code for the Vue/Codex version of the feature and it's causing folks to bounce)
  • a compatibility issue causing things to error out on a significant number of clients (which ones?)
  • some combination of the above

If it makes sense I'd be happy to find a time to pair up with one of you this week to talk about the problem and look at any data you have already, and then I can make sure I'm looking at the right dashboard in Logstash or wherever.

Hey @kostajh and @Tgr – I'm happy to help investigate this. My initial thoughts are that we may be looking at:

  • a performance related issue (we're shipping additional code for the Vue/Codex version of the feature and it's causing folks to bounce)

There's a bit of data about amount of code we're shipping to the client in https://phabricator.wikimedia.org/T334411#8833154

  • a compatibility issue causing things to error out on a significant number of clients (which ones?)

We would see some client error logging in this case, right?

  • some combination of the above

If it makes sense I'd be happy to find a time to pair up with one of you this week to talk about the problem and look at any data you have already, and then I can make sure I'm looking at the right dashboard in Logstash or wherever.

There are some charts in https://grafana.wikimedia.org/d/vGq7hbnMz/special-homepage-and-suggested-edits?orgId=1&from=now-7d&to=now (look for "Impact module") and see also the window.performance charts, which unfortunately are not split out by new-impact (Vue) vs old impact modules.

Change 924487 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] NewImpact: Cache empty user impact on account creation

https://gerrit.wikimedia.org/r/924487

Change 924571 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@wmf/1.41.0-wmf.11] NewImpact: Cache empty user impact on account creation

https://gerrit.wikimedia.org/r/924571

Change 924571 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.41.0-wmf.11] NewImpact: Cache empty user impact on account creation

https://gerrit.wikimedia.org/r/924571

Mentioned in SAL (#wikimedia-operations) [2023-05-31T08:24:01Z] <kharlan@deploy1002> Started scap: Backport for [[gerrit:924571|NewImpact: Cache empty user impact on account creation (T337320)]]

Mentioned in SAL (#wikimedia-operations) [2023-05-31T08:25:39Z] <kharlan@deploy1002> kharlan: Backport for [[gerrit:924571|NewImpact: Cache empty user impact on account creation (T337320)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-05-31T08:37:49Z] <kharlan@deploy1002> Finished scap: Backport for [[gerrit:924571|NewImpact: Cache empty user impact on account creation (T337320)]] (duration: 13m 48s)

Change 924941 had a related patch set uploaded (by Urbanecm; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@wmf/1.41.0-wmf.10] NewImpact: Cache empty user impact on account creation

https://gerrit.wikimedia.org/r/924941

Change 924941 merged by Urbanecm:

[mediawiki/extensions/GrowthExperiments@wmf/1.41.0-wmf.10] NewImpact: Cache empty user impact on account creation

https://gerrit.wikimedia.org/r/924941

Mentioned in SAL (#wikimedia-operations) [2023-05-31T14:07:12Z] <urbanecm@deploy1002> Started scap: Backport for [[gerrit:924941|NewImpact: Cache empty user impact on account creation (T337320)]], [[gerrit:924940|Personalized praise: Fix first-ever notifications (T322452)]]

Mentioned in SAL (#wikimedia-operations) [2023-05-31T14:08:49Z] <urbanecm@deploy1002> urbanecm: Backport for [[gerrit:924941|NewImpact: Cache empty user impact on account creation (T337320)]], [[gerrit:924940|Personalized praise: Fix first-ever notifications (T322452)]] synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2023-05-31T14:14:38Z] <urbanecm@deploy1002> Finished scap: Backport for [[gerrit:924941|NewImpact: Cache empty user impact on account creation (T337320)]], [[gerrit:924940|Personalized praise: Fix first-ever notifications (T322452)]] (duration: 07m 26s)

@kostajh I'm going to pull in your comment from T334411 here because this seems like very relevant info.

I created a user account "KHarlan Test New Impact‬", with no contributions, and used ge.utils.setUserVariant('oldimpact') to switch to the non-Vue interface, and ge.utils.setUserVariant('control') to switch to the Vue interface. I used Firefox 114, and used the Network inspector to disable HTTP caching. Then I reloaded the page and recorded results.

Note, in the results, the terms load and DOMContentLoaded have the following meanings:

DOMContentLoaded: The DOMContentLoaded event fires when the initial HTML document has been completely loaded and parsed, without waiting for stylesheets, images, and subframes to finish loading. (docs)
load: The load event is fired when the whole page has loaded, including all dependent resources such as stylesheets, scripts, iframes, and images. (docs)

load is the more important event from the end-user perspective.
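
For anyone wanting to reproduce these two numbers without reading them off the Network inspector, here is a minimal sketch using the browser's standard Navigation Timing API. The function name and return shape are illustrative only and are not part of GrowthExperiments. Note that `transferSize` on the navigation entry covers the HTML document alone, so it is not comparable to a whole-page data transfer total.

```javascript
// Summarise a PerformanceNavigationTiming-like entry, in milliseconds.
// Property names come from the Navigation Timing API; the function
// name and the returned object shape are hypothetical.
function summarizeNavigationTiming(entry) {
  return {
    domContentLoadedMs: Math.round(entry.domContentLoadedEventEnd - entry.startTime),
    loadMs: Math.round(entry.loadEventEnd - entry.startTime),
    // HTML document only, not the total for all page resources.
    documentTransferKB: Math.round(entry.transferSize / 1024),
  };
}

// In a browser console, once the page has finished loading:
if (typeof window !== 'undefined' && typeof performance !== 'undefined') {
  const [nav] = performance.getEntriesByType('navigation');
  if (nav) {
    console.log(summarizeNavigationTiming(nav));
  }
}
```

Running this once per variant, with HTTP caching disabled as described above, should give figures of the same kind as those recorded in the results.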

Scenario             HTTP request count   Data transfer (kB)   DOMContentLoaded   load
Desktop, Vue         43                   503                  2.21s              3.53s
Desktop, old impact  56                   398                  966ms              1.31s
Mobile, Vue          27                   415                  496ms              628ms
Mobile, old impact   27                   662                  1.97s              2.05s

Some observations

  • Old impact module on desktop makes more HTTP requests
    • Follow-up: identify what these additional requests are about
  • Desktop, old impact: Loads faster and transfers less data than Vue, as expected.
    • Follow-up: It is surprising that the Vue implementation is more than double the load time for the non-Vue implementation. Figure out where/what/why
  • Mobile: Vue implementation transfers less data and has significantly faster load time than non-Vue implementation
    • Follow-up: Understand why this is the case
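
The observations above can be checked against the results table with a little arithmetic. The sketch below assumes the table reads as 43/503 kB/3.53s (desktop, Vue), 56/398 kB/1.31s (desktop, old), 27/415 kB/628ms (mobile, Vue) and 27/662 kB/2.05s (mobile, old); the object layout and function name are made up for illustration.

```javascript
// Figures as read from the results table (load times in milliseconds).
const runs = {
  desktop: { vue: { kB: 503, loadMs: 3530 }, old: { kB: 398, loadMs: 1310 } },
  mobile: { vue: { kB: 415, loadMs: 628 }, old: { kB: 662, loadMs: 2050 } },
};

// Ratio of the Vue figures to the old-impact figures for one platform.
// A value above 1 means the Vue version is slower or heavier.
function vueToOldRatios(platform) {
  const { vue, old } = runs[platform];
  return {
    load: Number((vue.loadMs / old.loadMs).toFixed(2)),
    transfer: Number((vue.kB / old.kB).toFixed(2)),
  };
}

console.log(vueToOldRatios('desktop')); // { load: 2.69, transfer: 1.26 }
console.log(vueToOldRatios('mobile'));  // { load: 0.31, transfer: 0.63 }
```

On these single-run numbers the desktop Vue page takes about 2.7 times as long to reach load, while on mobile it takes roughly a third of the time, which is why the two platforms point in opposite directions.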

The discrepancy in load time seems like the first thing worth investigating in more detail. Three and a half seconds is a long time indeed (though it's interesting that the mobile UI, which also uses Vue, loads so much more quickly).

If I wanted to do some digging here, what is the best way to see the various UIs in my browser to try and profile things? Is there a production or beta Wiki that I can go to in order to activate this feature? Since this UI is targeted to new editors, will I have to do certain things if I want to keep seeing the page in the same state multiple times (clearing localstorage, setting user prefs, etc)?

I think that if I had a script to follow, similar to what you'd provide for QA testing, I could try to help diagnose.

egardner changed the point value for this task from 1 to 3. Jun 1 2023, 4:25 PM
egardner changed the task status from Open to In Progress. Jun 1 2023, 4:33 PM

Is there a production or beta Wiki that I can go to in order to activate this feature?

@egardner
The new impact module is available on the newcomer homepage (see the "Your impact" section).
If you aren't using a new account, it will need to be enabled via Preferences: "Display newcomer homepage".

I believe the new impact module is the default on Beta and Test Wikis. You can disable the new impact module using &new-impact=0.

But you can also opt into it by adding &new-impact=1 to the end of the homepage URL on any Wikipedia. For example:
https://en.wikipedia.org/w/index.php?title=Special:Homepage&new-impact=1
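
As a small convenience for switching back and forth while profiling, the URL above can be built programmatically. This is only a sketch: the helper name is made up, and the only things taken from this thread are the `title=Special:Homepage` and `new-impact` query parameters.

```javascript
// Build a Special:Homepage URL that opts into (1) or out of (0) the
// new impact module. homepageUrl is a hypothetical helper name.
function homepageUrl(host, useNewImpact) {
  const url = new URL(`https://${host}/w/index.php`);
  url.searchParams.set('title', 'Special:Homepage');
  url.searchParams.set('new-impact', useNewImpact ? '1' : '0');
  return url.toString();
}

console.log(homepageUrl('en.wikipedia.org', true));
console.log(homepageUrl('en.wikipedia.org', false));
```

The colon in the title comes out percent-encoded in the resulting string; it decodes back to the same `Special:Homepage` value.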

If I wanted to do some digging here, what is the best way to see the various UIs in my browser to try and profile things? Is there a production or beta Wiki that I can go to in order to activate this feature? Since this UI is targeted to new editors, will I have to do certain things if I want to keep seeing the page in the same state multiple times (clearing localstorage, setting user prefs, etc)?

It's live on a handful of wikis for 50% of new users; you can enable/disable there by running ge.utils.setUserVariant('control') / ge.utils.setUserVariant('oldimpact') in the browser's JS console. It's on everywhere on beta but the performance characteristics of beta tend to be very different. It's on for all users on testwiki so that's probably easiest.

You should probably check with a new user with zero edits - that results in a different UI, and that UI must be somehow the cause of problems (we are seeing an activation drop, ie. users being less likely to make their first edit).
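
For repeated profiling runs, the console calls mentioned above can be wrapped in a tiny helper. This is a sketch under assumptions: the mapping of 'control' to the Vue UI and 'oldimpact' to the non-Vue UI follows the description earlier in this thread, while the helper and constant names are made up, and the page still needs to be reloaded by hand afterwards.

```javascript
// Variant names as used in this thread; the keys are illustrative.
const IMPACT_VARIANTS = { vue: 'control', legacy: 'oldimpact' };

// geUtils is expected to be the ge.utils object available in the
// browser console on Special:Homepage.
function switchImpactVariant(geUtils, which) {
  const variant = IMPACT_VARIANTS[which];
  if (!variant) {
    throw new Error(`Unknown impact UI: ${which}`);
  }
  geUtils.setUserVariant(variant);
  return variant;
}

// Browser console usage, followed by a manual reload:
//   switchImpactVariant(ge.utils, 'legacy');
//   switchImpactVariant(ge.utils, 'vue');
```

Passing `ge.utils` in as an argument keeps the helper from depending on a global, so it fails loudly on a page where GrowthExperiments is not loaded.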

My initial thoughts are that we may be looking at:

  • a performance related issue (we're shipping additional code for the Vue/Codex version of the feature and it's causing folks to bounce)
  • a compatibility issue causing things to error out on a significant number of clients (which ones?)

I think it could be

  1. something that causes a JS error in such a way that it breaks normal error logging (so we don't see it in logstash)
  2. more browsers being served the inferior no-JS version since Vue requires the es6 ResourceLoader flag which sets higher compatibility requirements (although this is implausible because the effect size is too large)
  3. performance issue caused by larger payloads
  4. performance issue caused by Vue requiring more CPU etc
  5. some mistake in the GrowthExperiments application logic
  6. design differences between the Vue version and the old version (although they are tiny and the Vue version has the more call-to-action-y look)

@egardner to add to the above: the main way that users "activate" isn't directly related to the impact module, it's through interacting with the suggested edits module and then going on to edit. I still don't have a mental model of how anything related to the impact module loading slowly (or even not loading at all, though we have no evidence for that) would affect the user's interaction with the most prominent module on the page, suggested edits.

Desktop
image.png (1×3 px, 502 KB)
Mobile
image.png (1×870 px, 208 KB)

Thanks @KStoller-WMF, @kostajh, and @Tgr for the helpful context here. I went ahead and created a new dummy account on testwiki and started poking around on Special:Homepage, using the utility methods Kosta mentioned to switch between the old and new UI.

I'm still pretty new when it comes to the interpretation of flame graphs and the like, but I'll report back with anything interesting I discover here.

Thanks @egardner!

Just FYI we scaled the new Impact module to more wikis, so we have slightly more data here:
https://phabricator.wikimedia.org/T334411#8904606
And here:
Superset dashboard

Please let us know if you want any additional information, or even if you have ideas for what you suggest we test next. Thanks!

I'm wondering if this might be something worth looking into, because in my mind it appears to be a bug:

On my WMF account I have a single edit on en wiki, which is to my common.js file.

I noticed that when I view the old dashboard, I get the empty state "Your Impact" module, meaning my 1 edit to my common.js file is not counted:

image.png (851×1 px, 196 KB)

But when I enable the new dashboard (setting &new-impact=1), it actually includes my 1 edit:

image.png (770×1 px, 130 KB)

I'm not sure what the intended behavior here is, but maybe this is the cause for discrepancy in the tests? If we aren't actually consistently showing the empty state between the two versions, then maybe users on the new UI are more likely to be drawn to the impact module area because it's not actually empty. My default reaction was to see what my one edit was, which takes me to a new page.

Yeah, at some point we stopped filtering by namespace. I'd assume that there are so few new users with a non-mainspace first edit that they can't cause the difference in activation... @nettrom_WMF is that something you can confirm or refute?

I found this to be a possibly interesting difference in the UX that we hadn't caught earlier, and something I knew I could answer with existing code I used for the research spike we did earlier. From what I can tell, there are two questions we're looking to answer here:

  1. What proportion of newcomers activate by editing in non-article namespaces?
  2. Of the newcomers whose first edit is a non-article edit, what proportion go on to also make an article edit? We're particularly curious if this proportion differs between the experiment groups.

I used data from the start of the Levelling Up experiment (late March) until today from our four pilot wikis, as that's been the focus of earlier spikes based on our concerns. Since our interface doesn't care about reverts, I modified this to count all edits (we've previously focused on non-reverted edits, but have seen the same kind of pattern regardless).

Regarding the first question, the proportion is quite large and appears to differ significantly between desktop and mobile registrations: about 10% of new registrations activate outside of the main namespace. These account for 23–25% of activations on desktop, and 29–30% of activations on mobile. Most of those are to User & User talk, followed by Wikipedia & Wikipedia talk. There's not a significant difference between the experiment groups.

When it comes to the second question, this proportion again is different on desktop and mobile, but is not significantly different between experiment groups. Out of those who start out with a non-article edit, 14.3–15.8% on desktop and 9.8–10.3% on mobile go on to later make an article edit. As mentioned, there's not a significant difference between the experiment groups, and which group has the higher proportion swaps between platforms (higher on the new interface on desktop, the opposite on mobile).

Conclusion: I don't see evidence that those who start out with a non-article edit are enticed by the UX to go on and make a subsequent article edit.
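
To make the three proportions discussed above concrete, here is a minimal sketch of how they could be computed per experiment group. It is not the analysis code used for this spike: the record shape (`editNamespaces`, an ordered list of namespace numbers for each newcomer's edits) and all names are hypothetical, and the sample data is invented purely to show the calculation. Namespace 0 is the main (article) namespace.

```javascript
// Each record is one registered newcomer; editNamespaces lists the
// namespace number of each of their edits in chronological order.
function activationStats(newcomers) {
  const activated = newcomers.filter((u) => u.editNamespaces.length > 0);
  const nonArticleFirst = activated.filter((u) => u.editNamespaces[0] !== 0);
  const laterArticle = nonArticleFirst.filter((u) =>
    u.editNamespaces.slice(1).includes(0)
  );
  const ratio = (a, b) => (b === 0 ? 0 : a / b);
  return {
    // Share of all registrations whose first edit is outside main.
    shareOfRegistrations: ratio(nonArticleFirst.length, newcomers.length),
    // Share of activations that happen outside main.
    shareOfActivations: ratio(nonArticleFirst.length, activated.length),
    // Of those, the share who later make an article edit.
    laterArticleEditShare: ratio(laterArticle.length, nonArticleFirst.length),
  };
}

// Invented sample: two non-editors, one article editor, and two users
// whose first edit is in User (2) or User talk (3).
const sample = [
  { editNamespaces: [] },
  { editNamespaces: [] },
  { editNamespaces: [0] },
  { editNamespaces: [2, 0] },
  { editNamespaces: [3] },
];
console.log(activationStats(sample));
```

Running the same function once per experiment group and platform gives the pairs of percentages that are compared above.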

Thanks, @nettrom_WMF!
It sounds unlikely that this difference is causing the activation issue.
However, given that 23%-30% of activations happen outside of the main namespace, we might still want to make an improvement.
@JFernandez-WMF or @RHo: any thoughts on if we should return to the prior logic (of showing the empty state to all users who have not completed a mainspace edit) or consider layout improvements: T338640: Impact Module: New editors who have edited outside of the main namespace should see the Impact module empty state

I spent some more time looking at this today. Unfortunately I don't see anything jumping out in the performance data that might be leading to the discrepancy in activation – especially since we're comparing two different versions of one module (Impact module) while the activations happen (or fail to happen) in the Suggested Edits module which is unchanged.

Special:Homepage with Old Impact module:

Screenshot 2023-06-16 at 3.44.43 PM.png (2×3 px, 1 MB)

Special:Homepage with New Impact module:

Screenshot 2023-06-16 at 3.44.35 PM.png (2×3 px, 1 MB)

I did some testing in Chrome, on a relatively new MacBook Pro, accessing the site from a high-bandwidth connection on the West Coast of the US.

In my testing performance indicators seemed pretty similar. Below is a comparison of a request for each version of the Homepage (I used URL params to switch). Results are with browser caching disabled but no network throttling. This is more anecdote than data but nothing stands out as a big problem here.

Metric                                          Old Impact   New Impact
DOMContentLoaded                                452.5ms      598.42ms
First Paint                                     460.2ms      591.2ms
First Contentful Paint                          460.2ms      591.2ms
Onload                                          584.42ms     653.26ms
Largest Contentful Paint                        1.55s        1.5s
Size of GrowthExperiments module & dependencies 122KB        230KB
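
As a quick check on the module sizes in the last row of the table, the difference can be worked out directly. The helper name is illustrative; the two inputs are the 122KB and 230KB figures from the table.

```javascript
// Absolute and relative growth between two bundle sizes, in KB.
function sizeDelta(oldKB, newKB) {
  const extraKB = newKB - oldKB;
  return {
    extraKB,
    percentIncrease: Number(((extraKB / oldKB) * 100).toFixed(1)),
  };
}

console.log(sizeDelta(122, 230)); // { extraKB: 108, percentIncrease: 88.5 }
```

On these numbers the new module ships 108KB more than the old one, an increase of roughly 88%.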

The roughly 108KB difference in module size (122KB vs. 230KB) is significant and may be part of the reason for the discrepancy in activation rates. That extra payload includes several things that I hope we can eventually remove (though it's hard to say when):

  • Remove the Vue compatibility build – once we are confident that no production Vue code needs the Vue 2 behavior any longer, we can remove the "compatibility build" and ship Vue 3 without any compatibility shims. This will save some KB but I'm not sure exactly how much. See T289017 to track progress of this work.
  • Use the production build of Vue itself – the Vue runtime will always be needed, of course, but we are also shipping the template compiler in the MW Vue build. If we ever get a build step in place, we can pre-compile templates and start shipping the "production" build of Vue that omits the compiler. This will result in roughly a 10-30% size reduction (so maybe we'd shave ~10KB off the final bundle size). In order to accomplish this, we may need to make significant changes in MW infrastructure so it's hard to say if or when this might happen (see T328699 for more details).
  • Enable some kind of "tree-shaking" in Codex – In addition to Vue, the new Impact module must also load all of Codex, even though only a subset of the Codex components are needed here. This is something we're hoping to spend more time investigating and prototyping soon (see T335317).

Beyond this, I don't really have a lot of concrete suggestions I'm afraid. It's possible that some of the design changes which are part of the new impact module (skeleton loading state, etc) may also have something to do with the change in activation – perhaps this is actually drawing users' attention away from the suggested edits module? It's hard to say based on the data I've seen so far.

As far as my list above, the first item (removal of compatibility build) will definitely happen (hopefully at some point this summer). The third item (Codex tree-shaking) will also hopefully happen eventually, but it's not clear what form this will take. The second item (front-end build step to support Vue template pre-compilation) may or may not ever happen (and if this were to happen it would probably be a ways off).

Thanks, @nettrom_WMF!
It sounds unlikely that this difference is causing the activation issue.
However, given that 23%-30% of activations happen outside of the main namespace, we might still want to make an improvement.
@JFernandez-WMF or @RHo: any thoughts on if we should return to the prior logic (of showing the empty state to all users who have not completed a mainspace edit) or consider layout improvements: T338640: Impact Module: New editors who have edited outside of the main namespace should see the Impact module empty state

Hi @KStoller-WMF - I'll cross-post on that task, but I wonder whether the large number of non-mainspace activations in User talk and Wikipedia/Wikipedia talk tracks with newcomers using the Help panel/Mentor feature?

CCiufo-WMF claimed this task.

I'm going to close this spike given the scoped effort we wanted to put into an initial investigation on the DST side. As @egardner has mentioned in his summary, we will be working on some improvements to the library bundle size that we hope will improve overall performance. If there are additional spikes or tests we want to do, we should open a new ticket.