Once the Variant C vs. D test is deployed, we will want it to run for about 5 weeks. As in T238888: Variant tests: "initiation" test (A vs. B), we will then analyze the results with the goal of choosing the "better" variant and giving all newcomers that variant. We will compare the two variants on the following metrics (a sketch of how they might be computed follows the list):
- Visits (mobile only): what percent of homepage visitors view the full suggested edits module on mobile? This only applies on mobile because mobile users must tap the module preview or go through onboarding before getting to the full module.
- Interaction: what percent of homepage visitors interact with the suggested edits module? We want to count only interactions with the fully initiated suggested edits module, i.e. things that happen post-initiation in both variants: neither the onboarding overlays for Variant C nor the topic and difficulty screens for Variant D count. Qualifying interactions are: using the topic or difficulty filter buttons, navigating cards with the arrows, hovering on the "i", or selecting a task. The same goes for mobile.
- Navigation: what percent of homepage visitors navigate to another task in the module?
- Task selection: what percent of homepage visitors click on a task to do?
- Edit success: what percent of newcomers save a suggested edit?
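As a minimal sketch of how these per-visitor flags might be derived: the code below assumes a pandas DataFrame of homepage events with hypothetical columns (user_id, wiki, platform, variant, action) and placeholder action names; the real schema and event names would come from our instrumentation, so treat all identifiers here as assumptions.

```python
import pandas as pd

# Hypothetical event log: one row per homepage event. The column and
# action names below are illustrative placeholders, not the real
# EventLogging schema.
events = pd.read_parquet("homepage_events.parquet")  # hypothetical path

# Post-initiation actions that count as "interaction" (placeholder names).
# Onboarding overlays (Variant C) and the topic/difficulty screens
# (Variant D) are deliberately absent from this set.
INTERACTION_ACTIONS = {
    "se-topicfilter-click", "se-difficultyfilter-click",
    "se-card-navigate", "se-info-hover", "se-task-click",
}

# Collapse events to one row per visitor, keeping the set of actions seen.
per_user = (
    events
    .groupby(["variant", "wiki", "platform", "user_id"])["action"]
    .agg(set)
    .reset_index()
)

# One boolean flag per metric.
per_user["visited_module"] = per_user["action"].apply(
    lambda a: "se-module-view" in a)   # meaningful on mobile only
per_user["interacted"] = per_user["action"].apply(
    lambda a: bool(a & INTERACTION_ACTIONS))
per_user["navigated"] = per_user["action"].apply(
    lambda a: "se-card-navigate" in a)
per_user["selected_task"] = per_user["action"].apply(
    lambda a: "se-task-click" in a)
per_user["saved_edit"] = per_user["action"].apply(
    lambda a: "se-edit-save" in a)
```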
We want to split these five metrics by wiki and platform. We should use all 17 of our Wikipedias for this analysis; they have all had the Growth features since Oct 19, when the experiment started.
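With per-visitor flags like those sketched above, the wiki/platform split is a straightforward aggregation (again, the column names are assumptions carried over from the previous sketch):

```python
METRICS = ["visited_module", "interacted", "navigated",
           "selected_task", "saved_edit"]

# Percent of homepage visitors reaching each funnel step, split by
# variant, wiki, and platform.
summary = (
    per_user
    .groupby(["variant", "wiki", "platform"])[METRICS]
    .mean()      # mean of booleans = proportion
    .mul(100)
    .round(1)
)
print(summary)
```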
When this analysis is finished, we'll want to present it alongside baseline numbers from Variant A. An open question: should we reuse the Variant A numbers from T238888: Variant tests: "initiation" test (A vs. B), or recalculate Variant A numbers from September 2020 data, because (a) a lot of time has passed since March 2020, and (b) the combination of wikis will be very different?
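However we settle the baseline question, the C vs. D comparison for each metric could be framed as a two-proportion test. A minimal sketch using statsmodels, with placeholder counts rather than real results:

```python
from statsmodels.stats.proportion import proportions_ztest

# Placeholder counts: visitors who interacted, out of all homepage
# visitors, for Variant C and Variant D respectively.
interacted = [1200, 1350]
visitors = [10000, 10100]

z_stat, p_value = proportions_ztest(count=interacted, nobs=visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```

The same test would apply per metric and, if we want per-wiki or per-platform conclusions, within each split from the summary table above, keeping in mind that smaller wikis may not have enough visitors for a meaningful result.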