
Implement A/B test bucketing for mobile search recommendation
Closed, Resolved · Public · 5 Estimated Story Points

Description

Background

We need to set up an A/B test that lets relevant mobile users see recommendations. Because this feature is initiated by user interaction, we should be able to set it up in JavaScript on the client.

context: https://wikimedia.slack.com/archives/G8QAPHCTT/p1731615045692519
https://www.mediawiki.org/wiki/Readers/Web/Instrumentation_Overview

User story

  • As an engineer on Web, I want to be able to show mobile recommendations to treatment users, and keep the status quo for control users

Requirements

  • Should be fully revertible with no impact on current search
  • Should use generic solutions where it makes sense, and write our own code when needed
    • WikimediaEvents has webABTestEnrollment.js for firing an event to the event platform
    • core has mediawiki.experiments.js to set up buckets based on the config
  • Should support two-group (control/treatment) experiments rolled out to the same percentage
    • That percentage should be configurable per-wiki, and will change over the course of the experiment
  • Should ideally support URL query params to enable/disable for testing

Implementation details

RelatedArticles config

  • New config to store the A/B test enabled/disabled state. This will allow us to enable/disable the A/B test on a per-wiki basis.
  • New config to store bucketing percentages - This will allow us to adjust the bucketing, as a percentage of total users, on a per wiki basis.
  • This config should be in a format that can be passed to mw.experiments.getBucket() (see here for the format).
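One way to express the per-wiki percentage is a single enrollment rate from which two equal arms are derived. This is a minimal sketch, assuming a hypothetical helper `toGetBucketConfig()`, a hypothetical experiment name, and an extra bucket we call `unsampled` for users left out of the test. The returned `{ name, enabled, buckets }` object follows the shape we understand `mw.experiments.getBucket()` to accept, with each bucket mapped to a relative weight.

```javascript
// Hypothetical helper: turn one per-wiki enrollment rate into a
// { name, enabled, buckets } object for mw.experiments.getBucket().
function toGetBucketConfig(name, enabled, samplingRate) {
  if (!(samplingRate >= 0 && samplingRate <= 1)) {
    throw new RangeError(`samplingRate must be in [0, 1], got ${samplingRate}`);
  }
  // Split the enrolled share evenly so control and treatment are always the
  // same size; everyone else lands in "unsampled" and sees no change.
  const arm = samplingRate / 2;
  return {
    name,
    enabled,
    buckets: {
      control: arm,
      treatment: arm,
      unsampled: 1 - samplingRate,
    },
  };
}

// Example (hypothetical experiment name): enroll 10% of users,
// i.e. 5% control and 5% treatment.
console.log(toGetBucketConfig("mobile-search-recommendations", true, 0.1));
```

Raising `samplingRate` during the experiment keeps the arms equal, but if buckets are evaluated cumulatively in insertion order (control, then treatment), widening the control range captures some users who were previously hashed into treatment. Persisting the first assigned bucket, as planned with localStorage, guards against that.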

RelatedArticles JS

  • Load the given configuration on the client and pass it to mw.experiments.getBucket(). We use session ID as the bucketing key.
  • Trigger the experimental UI based on the given bucket.
    • If in the treatment group, trigger logging and the new UI; if in the control group, trigger logging only; if excluded, do nothing.
  • Save the bucket to localStorage, ensuring that the user is given a consistent experience during the experiment duration.
  • Set an expiry on the localStorage key for the duration of the experiment.
  • On subsequent pageviews, we enable the experiment based on the saved bucket value, not the session ID.
  • Add an override to load the experiment via URL param. (If possible, disable logging in this state?)
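The client-side steps above can be sketched as a single function that checks, in order, a URL override, a saved unexpired bucket, and finally a fresh hash of the session ID. Everything here is an assumption for illustration: the storage key, the `exampleABOverride` query parameter, the `{ bucket, source }` return shape, and the hash, which is FNV-1a plus the MurmurHash3 finalizer rather than whatever `mw.experiments` uses internally. Storage is injected (anything with `getItem`/`setItem`, such as `window.localStorage`) so the sketch also runs under Node.

```javascript
// Illustrative names only; none of these come from RelatedArticles.
const STORAGE_KEY = "exampleABBucket"; // hypothetical localStorage key
const OVERRIDE_PARAM = "exampleABOverride"; // hypothetical query parameter

const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Stand-in hash: 32-bit FNV-1a over UTF-16 code units, then the MurmurHash3
// fmix32 finalizer for better bit mixing, mapped to [0, 1).
function hashToUnit(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return (h >>> 0) / 2 ** 32;
}

// Cumulative-weight selection: the hash falls into exactly one bucket's range.
function pickBucket(name, buckets, token) {
  const entries = Object.entries(buckets);
  const total = entries.reduce((sum, [, weight]) => sum + weight, 0);
  const x = hashToUnit(`${name}:${token}`) * total;
  let acc = 0;
  for (const [bucket, weight] of entries) {
    acc += weight;
    if (x < acc) {
      return bucket;
    }
  }
  return entries[entries.length - 1][0]; // floating-point safety net
}

// Minimal in-memory stand-in for window.localStorage (getItem/setItem only).
class MemoryStorage {
  constructor() {
    this.items = new Map();
  }
  getItem(key) {
    return this.items.has(key) ? this.items.get(key) : null;
  }
  setItem(key, value) {
    this.items.set(key, String(value));
  }
}

function assignBucket(experiment, { sessionId, url, storage, ttlMs, now = Date.now() }) {
  if (!experiment.enabled || Object.keys(experiment.buckets).length === 0) {
    return null; // switched off for this wiki: current search is untouched
  }
  // 1. URL override for QA: not persisted, and flagged so logging can be skipped.
  const override = new URL(url).searchParams.get(OVERRIDE_PARAM);
  if (override !== null && hasOwn(experiment.buckets, override)) {
    return { bucket: override, source: "override" };
  }
  // 2. Bucket saved on an earlier pageview, if it has not expired.
  try {
    const saved = JSON.parse(storage.getItem(STORAGE_KEY));
    if (saved && saved.expiresAt > now && hasOwn(experiment.buckets, saved.bucket)) {
      return { bucket: saved.bucket, source: "stored" };
    }
  } catch (e) {
    // Unparseable entry: fall through and re-bucket.
  }
  // 3. First visit or expired entry: bucket on the session ID and persist it.
  const bucket = pickBucket(experiment.name, experiment.buckets, sessionId);
  storage.setItem(STORAGE_KEY, JSON.stringify({ bucket, expiresAt: now + ttlMs }));
  return { bucket, source: "hashed" };
}
```

In a browser this would be called with `location.href` and `window.localStorage`. Returning `source: "override"` lets the caller skip enrollment logging for QA sessions, which is one possible answer to the open question in the last bullet.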

BDD

Feature: A/B Test Bucketing for Mobile Search Recommendations

  Scenario: Displaying related articles in search based on A/B test bucket
    Given the user is on the mobile site in an incognito window
    When the user clicks on the search bar
    Then half the users should see related articles in the empty state
    And the other half should see nothing
    And an ABEnrollment event is logged in the network tab with the group field reflecting the assigned bucket
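The scenario reduces to the three-way branch listed under RelatedArticles JS: treatment logs and shows recommendations, control only logs, and excluded users get nothing. A minimal sketch, assuming a hypothetical assignment object `{ bucket, source }` (with excluded users represented as `null` or a bucket named `unsampled`) and placeholder callbacks `logEnrollment` and `showRecommendations` standing in for the real ABEnrollment event and RelatedArticles rendering:

```javascript
// Hypothetical handler run when the user opens the empty search state.
function onEmptySearchOpened(assignment, { logEnrollment, showRecommendations }) {
  // Excluded users (or a disabled experiment): keep the status quo, log nothing.
  if (!assignment || assignment.bucket === "unsampled") {
    return;
  }
  const { bucket, source } = assignment;
  // Both arms are logged so the group field can be compared during analysis;
  // URL-override sessions are skipped to keep QA traffic out of the data.
  if (source !== "override") {
    logEnrollment({ group: bucket });
  }
  if (bucket === "treatment") {
    showRecommendations();
  }
  // Control: event logged, empty state left unchanged.
}

// Usage with console stand-ins for the real logging and rendering.
onEmptySearchOpened(
  { bucket: "treatment", source: "hashed" },
  {
    logEnrollment: (event) => console.log("ABEnrollment", event),
    showRecommendations: () => console.log("render related articles"),
  }
);
```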

Test Steps

Test Case 1: Verify A/B Test Bucketing for Mobile Search Recommendations

  1. Open an incognito window and navigate to https://en.m.wikipedia.beta.wmflabs.org/w/index.php.
  2. Click on the search bar to initiate the search interaction.
  3. AC1: Confirm that for half the users, related articles appear in the empty state and that for the other half of users, the search bar remains empty with no recommendations.
  4. Open the Network tab in Google Chrome DevTools (right-click and select Inspect, then go to the Network tab).
  5. Filter the network requests to find events related to ABEnrollment.
  6. AC2: Verify that the group field in the event data reflects the assigned bucket (control or treatment).

Design

  • N/A

Acceptance criteria

Communication criteria - does this need an announcement or discussion?

  • N/A

Rollback plan

  • What is the rollback plan in production for this task if something goes wrong?

Critically, we should be able to revert fully to the control without any impact on current search.

QA Results - Beta

AC | Status | Details
1 | ✅ | T378115#10399301
2 | ✅ | T378115#10399301

This task was created by Version 1.2.0 of the Web team task template using phabulous

Event Timeline

Restricted Application added a subscriber: Aklapper. Oct 24 2024, 4:58 PM

Questions to guide our implementation/understand scope:

  1. Will there ever be more than two groups (control/treatment)?
  2. Do we need the ability to do a percentage based rollout?
  3. Is there ever a situation where control and treatment are unevenly distributed? (note this makes analysis near impossible)
  4. Do we want to opt in individual users (us) to the experiment?
  5. What is the exact target audience of the experiment (logged out/which wikis/etc)? Will it change?

I believe these questions are for @ovasileva when she gets back

Hi all - jotting down some quick answers to hop

Questions to guide our implementation/understand scope:

  1. Will there ever be more than two groups (control/treatment)?

Not for this A/B test

  2. Do we need the ability to do a percentage based rollout?

Yes, if possible, since the audience is so large (not a strict requirement if this would significantly increase the timeline). If not possible, we can rethink the overall wiki distribution and only run the test on a few select wikis.

  3. Is there ever a situation where control and treatment are unevenly distributed? (note this makes analysis near impossible)

No

  4. Do we want to opt in individual users (us) to the experiment?

If possible, something like a URL parameter would be helpful for QA purposes.

  5. What is the exact target audience of the experiment (logged out/which wikis/etc)? Will it change?

Logged-out users across Wikipedias.

  • If possible, we might want to change the percentage of users exposed to the feature with large wikis (enwiki, eswiki, dewiki, jawiki, etc) showing the feature to a smaller percentage and remaining wikis 50/50.
  • If easier, we could also restrict the A/B test to a subset of wikis
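The per-wiki variation described here amounts to a lookup with a default. The sketch below is illustrative only: in production the rates would live in wiki configuration (this task's config changes went to operations/mediawiki-config), the wiki list is taken from the comment above, and the rate values are placeholders, not agreed numbers. It reads "remaining wikis 50/50" as every eligible logged-out user being enrolled and split evenly between control and treatment; `ENROLLMENT_RATES`, `enrollmentRateFor()` and `effectiveRate()` are hypothetical names.

```javascript
// Hypothetical per-wiki enrollment rates (share of eligible users enrolled).
// Placeholder values: large wikis get a smaller share, the rest use the default.
const ENROLLMENT_RATES = {
  default: 1.0, // all eligible users enrolled, split 50/50 control/treatment
  enwiki: 0.1,
  eswiki: 0.1,
  dewiki: 0.1,
  jawiki: 0.1,
};

function enrollmentRateFor(dbName, rates = ENROLLMENT_RATES) {
  const own = Object.prototype.hasOwnProperty.call(rates, dbName);
  const rate = own ? rates[dbName] : rates.default;
  if (typeof rate !== "number" || rate < 0 || rate > 1) {
    throw new RangeError(`Invalid enrollment rate for ${dbName}: ${rate}`);
  }
  return rate;
}

// Only logged-out users are in scope, so logged-in users are never enrolled.
function effectiveRate(dbName, isLoggedIn, rates) {
  return isLoggedIn ? 0 : enrollmentRateFor(dbName, rates);
}

console.log(effectiveRate("enwiki", false), effectiveRate("frwiki", false));
```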

Thank you! I've updated the task description with this information

bwang renamed this task from "Setup an A/B test for relevant users for mobile recommendations" to "Implement A/B test bucketing for mobile search recommendation". Nov 21 2024, 10:03 PM

By the way, if you haven't seen it yet: the Metrics Platform data contract now has an experiments fragment (cf. T368326: Update Metrics Platform Client Libraries to accept experiment membership) that you are encouraged to use.

bwang set the point value for this task to 5.

@mpopov thanks for mentioning that. We've split out the schema-related work into T380926, and we've taken the experiments fragment into account.

bwang subscribed.

@SToyofuku-WMF and @Jdrewniak to pair on this; I can't assign multiple assignees for some reason.

Change #1100151 had a related patch set uploaded (by Jdrewniak; author: Jdrewniak):

[mediawiki/extensions/RelatedArticles@master] Introduce $wgRelatedArticlesABTestEnrollment

https://gerrit.wikimedia.org/r/1100151

Change #1092871 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[mediawiki/extensions/RelatedArticles@master] Wire RelatedArticles up to empty search hook and experiment configuration

https://gerrit.wikimedia.org/r/1092871

Change #1100151 merged by jenkins-bot:

[mediawiki/extensions/RelatedArticles@master] Introduce $wgRelatedArticlesABTestEnrollment

https://gerrit.wikimedia.org/r/1100151

Change #1100496 had a related patch set uploaded (by Jdrewniak; author: Jdrewniak):

[mediawiki/extensions/RelatedArticles@master] [WIP] Add AB test bucket localStorage & URL override

https://gerrit.wikimedia.org/r/1100496

Change #1100869 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[operations/mediawiki-config@master] Enable A/B test on beta cluster

https://gerrit.wikimedia.org/r/1100869

Change #1092871 merged by jenkins-bot:

[mediawiki/extensions/RelatedArticles@master] Wire RelatedArticles up to empty search hook and experiment configuration

https://gerrit.wikimedia.org/r/1092871

Change #1100869 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable Empty search A/B test on beta cluster

https://gerrit.wikimedia.org/r/1100869

Change #1101094 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[operations/mediawiki-config@master] Fixes A/B test for beta cluster

https://gerrit.wikimedia.org/r/1101094

Change #1101094 merged by jenkins-bot:

[operations/mediawiki-config@master] Fixes A/B test for beta cluster

https://gerrit.wikimedia.org/r/1101094

Jdlrobson subscribed.

This can now be QAed in beta.

In the Network tab you should see ABEnrollment events, and the group field should differ depending on the assigned bucket!

Edtadros subscribed.

Test Result - Beta

Status: ❓ Need More Info
Environment: Beta
OS: macOS
Browser: Chrome
Device: MS MBA
Emulated Device: NA

Test Artifact(s):

Test Steps

Test Case 1: Verify A/B Test Bucketing for Mobile Search Recommendations

  1. Open an incognito window and navigate to https://en.m.wikipedia.beta.wmflabs.org/w/index.php.
  2. Click on the search bar to initiate the search interaction.
  3. ❓ AC1: Confirm that for half the users, related articles appear in the empty state and that for the other half of users, the search bar remains empty with no recommendations.

@Jdlrobson, I could not really get an empty state. Is there something I should be doing for the bucketing to force that?

  4. Open the Network tab in Google Chrome DevTools (right-click and select Inspect, then go to the Network tab).
  5. Filter the network requests to find events related to ABEnrollment.
  6. ✅ AC2: Verify that the group field in the event data reflects the assigned bucket (control or treatment).

screenshot 75.png (1×961 px, 153 KB)

screenshot 73.png (1×964 px, 156 KB)

It should be possible to get bucketed into either group using a fresh incognito window. I just managed to get into both groups. Happy to sync if it's still not working.

Test Result - Beta

Status: ✅ PASS
Environment: Beta
OS: macOS
Browser: Chrome
Device: MS MBA
Emulated Device: NA

Test Artifact(s):

Test Steps

Test Case 1: Verify A/B Test Bucketing for Mobile Search Recommendations

  1. Open an incognito window and navigate to https://en.m.wikipedia.beta.wmflabs.org/w/index.php.
  2. Click on the search bar to initiate the search interaction.
  3. ✅ AC1: Confirm that for half the users, related articles appear in the empty state and that for the other half of users, the search bar remains empty with no recommendations.

In order to get close to a 50/50 split, the sample size would need to be large. For this task it is sufficient to validate that some users are bucketed into the treatment and others are not, and that the event has the correct information.

  4. Open the Network tab in Google Chrome DevTools (right-click and select Inspect, then go to the Network tab).
  5. Filter the network requests to find events related to ABEnrollment.
  6. ✅ AC2: Verify that the group field in the event data reflects the assigned bucket (control or treatment).

screenshot 75.png (1×961 px, 153 KB)

screenshot 73.png (1×964 px, 156 KB)

screenshot 54.mov.gif (1×1 px, 741 KB)
screenshot 77.png (1×1 px, 128 KB)
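The note in the passing test result, that a 50/50 split only shows up with a large sample, follows from the standard error of an observed proportion, sqrt(p(1 - p)/n). For p = 0.5 that is about 0.158 with 10 testers and 0.005 with 10,000 users, so a tester seeing 3 of 10 fresh incognito windows land in treatment is well within normal variation. The sketch below computes this and simulates bucketing of synthetic session IDs; the hash (FNV-1a plus the MurmurHash3 fmix32 finalizer) and the session ID format are assumptions, not the hash used by `mw.experiments`.

```javascript
// Standard error of an observed share p over n independent users.
function standardError(p, n) {
  return Math.sqrt((p * (1 - p)) / n);
}

// Stand-in hash: FNV-1a over UTF-16 code units, then the MurmurHash3 fmix32
// finalizer, mapped to [0, 1). Not the hash used by mw.experiments.
function hashToUnit(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return (h >>> 0) / 2 ** 32;
}

// Share of n synthetic (hypothetical) session IDs landing in treatment
// under a 50/50 split.
function simulatedTreatmentShare(n, experimentName = "demo") {
  let treatment = 0;
  for (let i = 0; i < n; i++) {
    if (hashToUnit(`${experimentName}:session-${i}`) >= 0.5) {
      treatment++;
    }
  }
  return treatment / n;
}

for (const n of [10, 100, 10000]) {
  const share = simulatedTreatmentShare(n);
  const margin = 2 * standardError(0.5, n); // roughly a 95% band
  console.log(`n=${n}: treatment share ${share.toFixed(3)} (expected 0.5 ± ${margin.toFixed(3)})`);
}
```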

ovasileva added a subscriber: jwang.

Looks good to me; data QA will be done by @jwang in a separate ticket.