
Document desired properties of an enrollment sampling algorithm
Closed, Resolved · Public

Description

@phuedx and @nettrom_WMF met to discuss approaches to sampling users for experiment enrollment. We reviewed the GrowthExperiments approach as well as PageSplitterInstrumentation.php and mediawiki.experiments.js.

From our conversation, we drafted the following preferred properties:

  • Does not require a backend store. Example: GrowthExperiments stores group assignment in the user_properties table.
  • Can sample on a variety of levels such as page, session, user.
  • Will sample consistently if given the same starting value (e.g. if we're sampling on page ID, the same page ID will always return the same assignment).
  • Scales to sample when needed. For example, we can sample when a user visits a specific page and thereby not assign groups to users who never visited that page.
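The properties above can be sketched with a hash-based assignment function. This is a minimal illustration, not the algorithm used by GrowthExperiments, PageSplitterInstrumentation.php, or mediawiki.experiments.js; the names `bucket` and `assign`, the choice of SHA-256, and the identifier format are all assumptions made for the example.

```python
import hashlib


def bucket(experiment: str, unit_id: str) -> float:
    """Map (experiment, unit_id) to a stable number in [0, 1).

    No backend store is involved: the value is recomputed from the
    identifier every time it is needed.
    """
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so the result is always below 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48


def assign(experiment, unit_id, sampling_rate, groups=("control", "treatment")):
    """Return a group name, or None if the unit is not sampled."""
    b = bucket(experiment, unit_id)
    if b >= sampling_rate:
        return None
    # Rescale the sampled slice [0, sampling_rate) onto the list of groups.
    index = int(b / sampling_rate * len(groups))
    return groups[min(index, len(groups) - 1)]


# The unit can be a page ID, a session token, or a user ID (hypothetical
# identifier formats). Assignment is only computed when this code runs,
# e.g. when a user visits a specific page.
print(assign("my-experiment", "page:12345", 0.15))
print(assign("my-experiment", "session:abcdef", 0.15))
```

Calling `assign` twice with the same arguments always returns the same result, which is the consistency property; nothing is computed for units that never reach the call site.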

There might be aspects related to this that we did not capture, or where the descriptions can be improved. We're therefore looking for @mpopov to review and provide input.

Event Timeline

mpopov triaged this task as Medium priority. Aug 8 2024, 9:06 PM
mpopov edited projects, added Product-Analytics (Kanban); removed Product-Analytics.
mpopov moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.

Will sample consistently if given the same starting value (e.g. if we're sampling on page ID, the same page ID will always return the same assignment).

I want to note that we want to ensure consistency of assignment within an experiment. For example, if I remember right, some teams have in the past run experiments where they assigned users to control/treatment based on whether the user ID in the MW database was odd or even. Under that scheme a given user lands in the same group in every experiment.

If we run two experiments sampled on page ID, each with two groups (control and treatment), the same page ID should always return the same assignment within each experiment. When the determination happens, these four combinations should be equally likely (assuming equal sampling rates):

experiment 1    experiment 2
control         control
control         treatment
treatment       control
treatment       treatment

NOT

experiment 1    experiment 2
control         control
treatment       treatment
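To make the contrast concrete, here is a small sketch comparing odd/even assignment with a hash that mixes in the experiment name. The function names and the use of SHA-256 are assumptions for illustration, not an existing MediaWiki API.

```python
import hashlib
from collections import Counter


def parity_group(unit_id: int) -> str:
    """Odd/even assignment: identical in every experiment."""
    return "treatment" if unit_id % 2 else "control"


def hashed_group(experiment: str, unit_id: int) -> str:
    """Assignment that depends on both the experiment name and the ID."""
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode("utf-8")).digest()
    return "treatment" if digest[0] % 2 else "control"


ids = range(1, 10001)

# With parity, only (control, control) and (treatment, treatment) occur.
parity_pairs = Counter((parity_group(i), parity_group(i)) for i in ids)

# With a per-experiment hash, all four combinations occur at similar rates.
hashed_pairs = Counter(
    (hashed_group("experiment-1", i), hashed_group("experiment-2", i)) for i in ids
)

print(parity_pairs)
print(hashed_pairs)
```

Including the experiment name in the hashed value is what decouples the two experiments while keeping each one deterministic for a given ID.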

Can sample on a variety of levels such as page, session, user.

+1

Scales to sample when needed. For example, we can sample when a user visits a specific page and thereby not assign groups to users who never visited that page.

+1

Does not require a backend store. Example: GrowthExperiments stores group assignment in the user_properties table.

Essentially, the algorithm must operate on some identifier to determine group assignment rather than retrieving the assignment from an external source. Makes sense, but if we're making the determination from the starting value every time we need it, then we can't ever modify experiment settings for the duration of the experiment.

Suppose an editor is enrolled in an experiment with the following settings:

  • 15% of editors who use the feature
  • 50/50 split between variant A and variant B

The editor gets assigned to variant B and we store this assignment in a session cookie for easy reference (rather than calculating it fresh on every page load).

The editor closes the browser and starts a new session. While they were away, we modified the experiment's settings to:

  • 10% of editors who use the feature
  • 60/40 split between variant A and variant B

The editor did not use their browser's "restore previous session" feature so their session cookies were cleared out. We now need to make the determination again. Uh oh.

First, the probability that this editor is going to stay enrolled in the experiment has decreased. Second, if the editor is once again enrolled in the experiment they are now more likely to be assigned to variant A.

So the only way that we ensure consistency of assignment output sans backend store is by locking all the inputs.
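The scenario above can be reproduced with a small sketch. It assumes one possible stateless design (a hash for enrollment and a second hash for the variant); `unit_interval`, `decide`, and the token format are hypothetical names, not part of any existing instrumentation.

```python
import hashlib


def unit_interval(salt: str, token: str) -> float:
    """Stable number in [0, 1) derived from a salt and a token."""
    digest = hashlib.sha256(f"{salt}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:6], "big") / 2**48


def decide(token, experiment, sampling_rate, weights):
    """Return the variant for a token, or None if it is not enrolled."""
    if unit_interval(experiment + ":enrol", token) >= sampling_rate:
        return None
    point = unit_interval(experiment + ":variant", token)
    cumulative = 0.0
    for variant, weight in weights.items():
        cumulative += weight
        if point < cumulative:
            return variant
    return variant  # guard against floating-point rounding in the weights


tokens = [f"editor:{i}" for i in range(10000)]
before = {t: decide(t, "demo", 0.15, {"A": 0.5, "B": 0.5}) for t in tokens}
after = {t: decide(t, "demo", 0.10, {"A": 0.6, "B": 0.4}) for t in tokens}

dropped = [t for t in tokens if before[t] is not None and after[t] is None]
switched = [t for t in tokens if before[t] and after[t] and before[t] != after[t]]
print(len(dropped), "editors lost their enrollment")
print(len(switched), "editors switched variant")
```

With unchanged settings both lists are empty; after the change, some editors drop out of the experiment and some move from variant B to variant A, even though their tokens are identical.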

So the only way that we ensure consistency of assignment output sans backend store is by locking all the inputs.

I think this is implicit in your comment but I want to make it explicit:

Let's assume that we have a backing store that can handle one row per session (~200M rows) with a lookup per pageview (~6000/s) so that we can store all assignments for all users. We're still limited to storing a logged-out user's token (session or otherwise) on the client, be it in a cookie, sessionStorage, or localStorage. If the chosen store is cleared out for any reason, then they are effectively a new user with a new assignment.

So the only way that we ensure consistency of assignment output sans backend store is by locking all the inputs.

Yes. This is also true if you have no control over the lifetime of the user token.

Just checked with @phuedx and we're aligned on the terminology:

By "inputs" we both mean the inputs/parameters going into the algorithm/function that deterministically outputs a decision. Those inputs include a token and configuration variables like sampling rate.

By "locking" we both mean not allowing those inputs to change to the best of our ability – that is, preventing modification of the experiment's configuration because otherwise if given the same token but different sampling rate it would result in a different decision than before.

@phuedx asked "Is locking all of the inputs acceptable?"

Yes, and in absence of a memory it is also necessary.

I think the only way we could allow modifying an in-progress experiment is if we maintained a record of decisions (whether a given token was selected to be in the experiment and which experimental group it was assigned to).

Without maintaining such a record, we want to ensure that we consistently make the same decision for the same token. This requires locking all of the variables that go into producing a decision.

Therefore, it should only be possible to modify an experiment's sampling rate before it has started. (But extending an in-progress experiment's end date should be okay, as it would not be an input.)
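One way to picture this rule is a configuration object that refuses changes to decision inputs once the experiment has started but still allows the end date to move later. The `Experiment` class and its method names are hypothetical, and rejecting an earlier end date is an extra assumption not discussed above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Experiment:
    name: str
    sampling_rate: float  # an input to the assignment decision
    start_date: date
    end_date: date        # not an input to the assignment decision

    def set_sampling_rate(self, rate: float, today: date) -> None:
        """Allow changes to a decision input only before the start date."""
        if today >= self.start_date:
            raise ValueError("sampling rate is locked once the experiment has started")
        self.sampling_rate = rate

    def extend(self, new_end_date: date) -> None:
        """Extending the end date is always allowed in this sketch."""
        if new_end_date < self.end_date:
            raise ValueError("this sketch only supports moving the end date later")
        self.end_date = new_end_date


exp = Experiment("demo", 0.15, date(2024, 9, 1), date(2024, 9, 30))
exp.set_sampling_rate(0.10, today=date(2024, 8, 15))  # before start: accepted
exp.extend(date(2024, 10, 15))                        # accepted at any time
```

Calling `exp.set_sampling_rate(0.20, today=date(2024, 9, 2))` would raise `ValueError`, because the experiment is already in progress on that date.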

I've tried to capture our discussion in https://wikitech.wikimedia.org/w/index.php?title=Metrics_Platform%2FSampling#Experiment_Enrolment_Sampling (the page was moved from Sampling Units). Please review the section and be bold if you spot a mistake.

I've taken a look at it and couldn't find anything to add or change. Moving it to "done". Maybe @mpopov wants to have the pleasure of closing this?

Looks great, thank you @phuedx!