Log user enrollment
Open, Needs Triage, Public

Description

We need to record when a user is enrolled in an experiment so that we know both the total number of users enrolled and the rate at which they were enrolled. We could:

  1. Store enrollment status (id, date, bucket) in the user_options table
  2. Submit an Event Platform event to a stream and store those events in a Hive table

The user_options table is familiar and available to both developers and analysts. However, there's no equivalent for logged-out users, the next logical user group that we wish to enable experimentation for. Further, since experiment enrollment will be deterministic, the application never needs to look up a user's enrollment, so the Enrollment Log can be write-once-read-never (WORN). We can therefore safely discard option 1.
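To illustrate why a deterministic scheme removes the need to read enrollment back, here is a minimal sketch. It assumes that bucketing is derived by hashing the user and experiment identifiers; this task does not specify the actual mechanism, and `assign_bucket` and its parameters are hypothetical names.

```python
import hashlib


def assign_bucket(user_id: int, experiment_id: str, buckets: list[str]) -> str:
    """Deterministically map a (user, experiment) pair to one bucket.

    Hypothetical helper: the real enrollment logic is not described here.
    """
    key = f"{user_id}:{experiment_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an unsigned integer and reduce it
    # modulo the number of buckets to pick an index.
    index = int.from_bytes(digest[:8], "big") % len(buckets)
    return buckets[index]


buckets = ["control", "treatment"]
first = assign_bucket(42, "example-experiment", buckets)
second = assign_bucket(42, "example-experiment", buckets)
# The same inputs always give the same bucket, so nothing has to be stored
# in order to answer "which bucket is this user in?".
print(first == second)
```

Under this assumption, the log only has to answer aggregate questions (how many users, at what rate), which is what the Event Platform option provides.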

Using the Event Platform, therefore, is the solution. However, it also has limitations. Using it necessarily increases pressure on Varnish, EventGate, and the network. Submitting an enrollment event per pageview would result in us DDoSing ourselves. Submitting > 1000 events/sec per event stream requires special attention from Data Engineering. Given that we get ~6000 pageviews/sec on average, unless we special-case the Enrollment Log, we're limited to experimenting on ~17% of all pageviews (1000 / 6000). Similarly, given that there are ~215 million sessions per day (~2500 sessions/sec), we're limited to experimenting on ~40% of all sessions.
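The ceilings can be re-derived from the figures quoted in this task. The sketch below assumes that the 1000 events/sec threshold is the only constraint and that each enrolled pageview or session submits exactly one enrollment event; the constant names are illustrative.

```python
# Back-of-the-envelope check using only the figures quoted in this task.
MAX_EVENTS_PER_SEC = 1_000       # per-stream threshold before Data Engineering is involved
PAGEVIEWS_PER_SEC = 6_000        # approximate average
SESSIONS_PER_DAY = 215_000_000   # approximate
SECONDS_PER_DAY = 24 * 60 * 60

# Fraction of pageviews that could each submit one enrollment event.
pageview_ceiling = MAX_EVENTS_PER_SEC / PAGEVIEWS_PER_SEC

# Fraction of sessions that could each submit one enrollment event.
session_ceiling = MAX_EVENTS_PER_SEC * SECONDS_PER_DAY / SESSIONS_PER_DAY

print(f"pageviews: {pageview_ceiling:.1%}")  # pageviews: 16.7%
print(f"sessions:  {session_ceiling:.1%}")   # sessions:  40.2%
```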

Prior Art

  1. Growth's ExperimentUserManager service class: https://gerrit.wikimedia.org/g/mediawiki/extensions/GrowthExperiments/+/85114e98ce8d59b9475518538009edcddd5eb23f/includes/ExperimentUserManager.php
  2. Readers Web's webABTestEnrollment instrument:

Notes

  1. We don't necessarily need to record which user saw which treatment. We could use a unique token that represents the user across the lifetime of the experiment, e.g.
$uniqueUserID = hash( 'sha256', implode( ':', [ $userID, $experimentID ] ) );
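For anyone who wants to reproduce the token outside PHP, for example when joining against other data during analysis, the following is a rough Python equivalent of the one-liner above. It assumes the user ID is an integer and the experiment ID a string; `enrollment_token` is a hypothetical name.

```python
import hashlib


def enrollment_token(user_id: int, experiment_id: str) -> str:
    """Return the hex SHA-256 digest of "<user_id>:<experiment_id>".

    Hypothetical helper mirroring hash( 'sha256', implode( ':', [ ... ] ) ).
    """
    joined = f"{user_id}:{experiment_id}"
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


token_a = enrollment_token(42, "experiment-a")
token_b = enrollment_token(42, "experiment-b")

# Stable for one user within one experiment.
print(token_a == enrollment_token(42, "experiment-a"))
# Different across experiments, so tokens from different experiments
# do not match each other for the same user.
print(token_a != token_b)
```

Because the token is stable for the lifetime of an experiment, events carrying it can be counted and deduplicated per user without recording the user ID itself.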