
Mechanism for selecting contributors for A/B tests
Open, Normal, Public



We would like to be able to run multiple experiments across platforms simultaneously without cross-contamination. Imagine a scenario in which:

  • VisualEditor team wants to A/B test a new feature to see if it increases edits & editor retention
  • Mobile web team wants to A/B test a change to UI to see if it increases mobile edits
  • Growth team wants to A/B test a new UX to see if it helps new editor retention
  • Android and iOS both want to A/B test new features to see if they increase editor acquisition & retention

If each team samples users independently of the others, many users will likely end up enrolled in multiple A/B tests at the same time, unless the teams queue their tests up sequentially, which slows everyone down. It would be better to have a system that lets multiple A/B tests run simultaneously while guaranteeing that each user is enrolled in at most 1 A/B test at a time.

Possible Solutions

Just an initial list of ideas:

1: Consistent, deterministic

One potential solution is to use the last (or first?) digit of the user ID to bucket users. (Since numeric user IDs are unique to each wiki and we'd want global sampling, we'd probably want to hash the username into a numeric ID and use the first/last digit of that.) This gives us 10 buckets of users and lets us run up to 9 randomized, controlled, single-intervention experiments at once (with 1 bucket reserved for the shared control group).
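A minimal sketch of the hashing idea, assuming SHA-256 as the hash (any stable hash would do; `bucket_for_user` is a hypothetical name, not an existing MediaWiki function):

```python
import hashlib

def bucket_for_user(username: str, num_buckets: int = 10) -> int:
    """Deterministically map a username to a bucket in [0, num_buckets).

    Hashing the username (rather than the per-wiki numeric user ID)
    gives the same bucket for the same user on every wiki, with no
    lookup table needed.
    """
    digest = hashlib.sha256(username.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets
```

Because the assignment is a pure function of the username, every platform (desktop, mobile web, apps) can compute it independently and agree on which bucket a user is in.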

The teams can then coordinate on which buckets they're using. The control bucket should change over time: we can randomly pick one of the ten buckets before each round of A/B tests, so the same group of users isn't the control group every time.

We could even run up to 3 A/B/C tests at once (3 buckets each). Alternatively, each A/B test could have its own control and treatment buckets, which would allow up to 5 concurrent A/B tests.
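The coordination step above could be sketched as a small allocator that hands disjoint bucket sets to each experiment per round; `allocate_buckets` is a hypothetical helper, not part of any existing system:

```python
import random

def allocate_buckets(experiments, num_buckets=10, buckets_per_test=2, seed=None):
    """Assign disjoint bucket sets to concurrent experiments.

    With 10 buckets and 2 per test (control + treatment), this supports
    up to 5 simultaneous A/B tests; with 3 per test, up to 3 A/B/C tests.
    Shuffling with a fresh seed each round keeps the same buckets from
    playing the same role every time.
    """
    if len(experiments) * buckets_per_test > num_buckets:
        raise ValueError("not enough buckets for this many experiments")
    buckets = list(range(num_buckets))
    random.Random(seed).shuffle(buckets)
    return {
        name: buckets[i * buckets_per_test:(i + 1) * buckets_per_test]
        for i, name in enumerate(experiments)
    }
```

Any buckets left unassigned in a round simply see no experiment, which also gives a pool of untouched users for baseline metrics.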

2: Enrollment database

Basic premise: have a DB recording which users are enrolled in which A/B tests and which group they're in, then expose a queryable API endpoint that can be used to check whether a user is currently enrolled before enrolling them in anything new.
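A minimal sketch of the enrollment check, using an in-memory SQLite table as a stand-in for the real store (the schema and `try_enroll` helper are illustrative assumptions; a production version would sit behind the queryable API endpoint):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrollment (
        user_name  TEXT NOT NULL,
        experiment TEXT NOT NULL,
        bucket     TEXT NOT NULL,   -- e.g. 'control' or 'treatment'
        PRIMARY KEY (user_name, experiment)
    )
""")

def try_enroll(conn, user_name, experiment, bucket):
    """Enroll a user only if they are not already in *any* experiment.

    Returns True on success, False if the user is already enrolled
    somewhere (refusing avoids cross-contamination between tests).
    """
    cur = conn.execute(
        "SELECT 1 FROM enrollment WHERE user_name = ?", (user_name,))
    if cur.fetchone() is not None:
        return False
    conn.execute(
        "INSERT INTO enrollment (user_name, experiment, bucket) "
        "VALUES (?, ?, ?)",
        (user_name, experiment, bucket))
    return True
```

Unlike the deterministic-bucket approach, this requires a lookup on every enrollment decision, but it supports arbitrary overlapping test schedules and makes enrollment auditable after the fact.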