[Spike] Explore Generalizing Enrollment Authorities
Closed, Resolved | Public | 5 Estimated Story Points

Description

 Scope

  • What would generalized enrolment authority look like?
  • What is one enrolment authority's relationship to a different enrolment authority?
  • What would an API look like for enrolment on a custom token?
  • What are the responsibilities and what should it do?
  • Is non cache splitting just another enrolment authority?

Obvious Gotchas

  • If Test Kitchen enrols people automatically based on a global identifier, what does it mean to enrol based on a local identifier? How do we represent that?
  • Don't be influenced by what's currently written - what is the ideal state and how do we move backwards from that?
  • Conceptual doozie

Outcome

What do they do?
  1. An experiment enrollment sampling authority (EESA) ensures that all group assignments are equally likely for all experiments
    1. For example, if there are two experiments running, both with control and treatment groups, then the user should be enrolled into the experiments in such a way that the following experimental group assignments are equally likely:
      Experiment 1    Experiment 2
      control         control
      control         treatment
      treatment       control
      treatment       treatment
  2. An EESA consistently assigns the same user to the same experimental group for the same experiment
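
The two properties above can be illustrated with a minimal TypeScript sketch. It assumes that the group is derived from a hash of the experiment name and the identifier. The hash function, the `assignGroup` name, and the example identifiers are all hypothetical and are not taken from Test Kitchen's code.

```typescript
// Hypothetical sketch only: not Test Kitchen's actual sampling code.

// 32-bit FNV-1a hash followed by the MurmurHash3 finalizer. The finalizer
// mixes the high bits into the low bits so that `% groups.length` is usable.
function hash32( input: string ): number {
	let h = 0x811c9dc5;
	for ( let i = 0; i < input.length; i++ ) {
		h ^= input.charCodeAt( i );
		h = Math.imul( h, 0x01000193 );
	}
	h ^= h >>> 16;
	h = Math.imul( h, 0x85ebca6b );
	h ^= h >>> 13;
	h = Math.imul( h, 0xc2b2ae35 );
	h ^= h >>> 16;
	return h >>> 0;
}

// Deterministically maps (experiment, identifier) to one of the groups.
// The same inputs always produce the same group. Hashing the experiment name
// together with the identifier lets the assignment for one experiment vary
// independently of the assignment for another.
function assignGroup( experimentName: string, identifier: string, groups: string[] ): string {
	return groups[ hash32( experimentName + ':' + identifier ) % groups.length ];
}

// Usage: the same identifier is sampled separately for each experiment.
const exampleGroups = [ 'control', 'treatment' ];
console.log( assignGroup( 'experiment-1', 'user-42', exampleGroups ) );
console.log( assignGroup( 'experiment-2', 'user-42', exampleGroups ) );
```

Note that this sketch has no knowledge of where the identifier comes from or which other experiments exist, which is consistent with the list of things an EESA does not do.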
What don't they do?
  1. Manage the identifiers that represent the users
  2. Coordinate the enrollment of a user into multiple experiments, which may or may not require different identifiers
  3. Decorate the output with metadata about the enrollment of the user into experiments

(1) may or may not be performed by an external component. (2) and (3) will be performed by the Coordinator and Decorator components, respectively.

Note well that (2) and (3) will be performed automatically to enable CSS-only experiments. CSS-only experiments are those that only vary the style of a feature based on CSS classes added to the <body> tag by Test Kitchen.

Example 1: T405074 xLab: Allow user re-enrollment at specific times

Test Kitchen enrolls the user in experiments that require the mw-user identifier type in the BeforeInitialize hook using the central user ID. Unfortunately, the Growth team discovered that the central user ID isn't available to BeforeInitialize hook handlers during the account creation flow. In order to work around this limitation, the Growth team reimplemented the enrollment coordination and output decoration steps. They had to do this because Test Kitchen doesn't separate EESAs from the enrollment sampling coordination and output decoration steps.


Recommendation #1: Defer Output Decoration

Currently, the output decoration step is performed immediately after the experiment enrollment sampling step in the BeforeInitialize hook handler. It could and should be performed much later in the BeforePageDisplay hook handler. This is consistent with (3) in § What don't they do? above.


Recommendation #2: Allow Identifiers to be Updated

In order to handle the cases where an identifier type isn't available or has been invalidated, the experiment enrollment coordinator should allow identifiers to be updated. Unfortunately, hook handlers are called in the order in which they are registered so we should provide an API.

PHP

namespace MediaWiki\Extension\MetricsPlatform\TestKitchen;

interface Coordinator {
	const IDENTIFIER_TYPE_EDGE_UNIQUE = 'edge-unique';
	const IDENTIFIER_TYPE_MW_USER = 'mw-user';
	const IDENTIFIER_TYPE_SEARCH_SESSION = 'search-session';

	/**
	 * Updates the identifier.
	 *
	 * When an identifier is updated, all experiments that require the
	 * identifier type are enrolled and the `XLab.ExperimentManager` service is
	 * updated with those enrollments. If this is done before the
	 * `BeforePageDisplay` hook is run, then the output is also decorated with
	 * those enrollments.
	 *
	 * @param string $identifierType One of the `IDENTIFIER_TYPE_*` constants
	 * @param string $identifier The new identifier
	 * @throws \DomainException If the identifier type can't be updated, i.e.
	 *  it's {@link Coordinator::IDENTIFIER_TYPE_MW_USER}
	 */
	public function updateIdentifier( string $identifierType, string $identifier ): void;
}

JavaScript

enum IdentifierType {
	EDGE_UNIQUE = 'edge-unique',
	MW_USER = 'mw-user',
	SEARCH_SESSION = 'search-session'
}

interface Coordinator {

	/**
	 * Updates the identifier.
	 *
	 * When an identifier is updated all experiments that require the
	 * identifier type are enrolled and the body is decorated with CSS classes
	 * corresponding to those enrollments.
	 *
	 * @throws Error If the identifier type can't be updated, i.e.
	 *  it's {@link IdentifierType.EDGE_UNIQUE} or
	 *  {@link IdentifierType.MW_USER}
	 */
	updateIdentifier(
		identifierType: IdentifierType,
		identifier: string
	): void;
}
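
As a rough illustration of how a coordinator might satisfy the interface above, here is a minimal in-memory sketch. The `ExperimentConfig` shape, the `Sampler` callback (standing in for an EESA), the `InMemoryCoordinator` class, and `getAssignedGroup` on the coordinator are all hypothetical. The only behavior taken from the interface's documentation is that updating an identifier re-enrolls every experiment that requires that identifier type, and that `edge-unique` and `mw-user` can't be updated.

```typescript
// Hypothetical sketch only: names and shapes below are assumptions.

type IdentifierTypeName = 'edge-unique' | 'mw-user' | 'search-session';

interface ExperimentConfig {
	name: string;
	identifierType: IdentifierTypeName;
	groups: string[];
}

// Stand-in for an EESA: maps (experiment, identifier) to a group name.
type Sampler = ( experiment: ExperimentConfig, identifier: string ) => string;

// Identifier types that callers are allowed to update.
const UPDATABLE_TYPES: IdentifierTypeName[] = [ 'search-session' ];

class InMemoryCoordinator {
	private readonly experiments: ExperimentConfig[];
	private readonly sample: Sampler;
	private readonly enrollments = new Map<string, string>();

	constructor( experiments: ExperimentConfig[], sample: Sampler ) {
		this.experiments = experiments;
		this.sample = sample;
	}

	updateIdentifier( identifierType: IdentifierTypeName, identifier: string ): void {
		if ( UPDATABLE_TYPES.indexOf( identifierType ) === -1 ) {
			throw new Error( `Identifier type "${ identifierType }" can't be updated` );
		}
		// Re-enroll every experiment that requires this identifier type.
		for ( const experiment of this.experiments ) {
			if ( experiment.identifierType === identifierType ) {
				this.enrollments.set( experiment.name, this.sample( experiment, identifier ) );
			}
		}
	}

	// Returns the current group for the experiment, or null if not enrolled.
	getAssignedGroup( experimentName: string ): string | null {
		const group = this.enrollments.get( experimentName );
		return group === undefined ? null : group;
	}
}
```

Keeping the sampler as an injected callback mirrors the separation argued for in this task: the coordinator decides when to enroll and with which identifier, while the EESA only decides the group.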

Recommendation #3: Generalize Output Decoration

It's conceivable that Test Kitchen could decorate the output of the MediaWiki Action and REST APIs. Test Kitchen should select and use the appropriate output decorator.
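
One way to picture this is a small decorator interface with one implementation per output type. The sketch below is illustrative only: the `OutputDecorator` interface, both decorator objects, the CSS class format, and the `experiments` response key are assumptions and do not describe what Test Kitchen emits today.

```typescript
// Hypothetical sketch only: class format and response key are assumptions.

// Maps experiment name to assigned group name.
type Enrollments = Record<string, string>;

interface OutputDecorator<T> {
	decorate( output: T, enrollments: Enrollments ): T;
}

// Page output: the classes of the <body> tag, represented as a string list.
const bodyClassDecorator: OutputDecorator<string[]> = {
	decorate( bodyClasses, enrollments ) {
		const added = Object.keys( enrollments ).map(
			( experiment ) => `experiment-${ experiment }-${ enrollments[ experiment ] }`
		);
		return bodyClasses.concat( added );
	}
};

// API output: attach the enrollments to the response under a dedicated key.
const apiResponseDecorator: OutputDecorator<Record<string, unknown>> = {
	decorate( response, enrollments ) {
		return { ...response, experiments: { ...enrollments } };
	}
};
```

Both decorators consume the same enrollments, so the choice of decorator can be made late, based on the kind of request being served, without touching the sampling or coordination steps.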


Quoting T405074:

Another use cases for Growth that has come up is the retrieval of an assigned group within an action API or rest request.

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):

  • Experiment assigned group retrieval on account creation
  • Experiment assigned group retrieval on API calls
  • Experiment assigned group retrieval on maintenance scripts

Use cases (1) and (2) are covered by recommendations (1), (2), and (3) above. However, those recommendations assume that the MediaWiki application is responding to an HTTP request, which doesn't hold in use case (3). Maintenance scripts don't have BeforeInitialize and LocalUserCreated hook handlers, they don't have output to decorate, and, crucially, they are userless.


Recommendation #4: Expose Experiment Enrollment Sampling

namespace MediaWiki\Extension\MetricsPlatform\TestKitchen;

interface Coordinator {

	/**
	 * Gets the experimental group assignment for the identifier.
	 *
	 * This method is pure – it has no side effects.
	 *
	 * @return string|null If the user is enrolled in the experiment, then
	 *  the name of the experimental group; otherwise, `null`
	 */
	public function getEnrollmentForIdentifier( string $experimentName, string $identifier ): ?string;
}

Example 2: Experiments on Search Sessions

In § Example 1 above, we recommended upgrading the experiment enrollment coordinator in Test Kitchen to handle the cases where an identifier type isn't available or has been invalidated. Another motivating example for this upgrade is experiments on search sessions via Test Kitchen.

Currently, there are two distinct classes of experiment that can be run on search sessions: (1) experiments on the user interface; and (2) experiments on the backend used by the CirrusSearch MediaWiki extension ("CirrusSearch"). (1) has been handled on a per-experiment basis whereas (2) has been handled by the experiment enrollment coordinator in CirrusSearch. If we follow Recommendation 2 in § Example 1 above, then (1) should be trivial to implement:

// ext.testKitchen/index.js
//
// Maintained by the Experiment Platform team

function onNewSearchSession( searchSessionIdentifier: string ) {
	mw.testKitchen.updateIdentifier( mw.testKitchen.IdentifierType.SEARCH_SESSION, searchSessionIdentifier );
}

// ---

// ext.wikimediaEvents/experiments/search_experiment_1.js
//
// Not maintained by the Experiment Platform team

function onNewSearchSession() {
	// Keep a reference to the experiment so its assigned group can be read below.
	const e = mw.testKitchen.getExperiment( 'my-awesome-search-experiment' );

	e.send(
		'exposure',
		{
			action_context: e.getAssignedGroup()
		}
	);
}

However, (2) is more complicated. There are four scenarios that the experiment enrollment coordinator in CirrusSearch handles. The first scenario, "[s]ession starting on-wiki with autocomplete", which is the most common, could also be handled by the example code above, but the remaining three need careful consideration.

 Conclusion

Ostensibly, this spike is exploring a generalized EESA. However, we have explored a handful of real-world examples and found that:

  1. EESAs are already sufficiently general
  2. Test Kitchen has a hidden object, the Experiment Enrollment Coordinator
  3. Experiment implementors have been using the Experiment Enrollment Coordinator
  4. It's possible and desirable to expose the EESA to experiment implementors without compromising the functionality of the Experiment Enrollment Coordinator

Event Timeline

JVanderhoop-WMF set the point value for this task to 5.
JVanderhoop-WMF moved this task from READY TO GROOM to Backlog on the Test Kitchen board.
JVanderhoop-WMF added a subscriber: phuedx.

TODO:

  1. Review the list of Test Kitchen Adoption Blockers by @JVanderhoop-WMF
phuedx updated the task description.

Thanks Sam!

Regarding CirrusSearch: Do you already have particular open questions/obstacles related to the A/B test scenarios covered in the backend?

  • Session starting by following a link to blank Special:Search
  • Session starting by following a link to Special:Search with a query
  • Session starting at Special:Search with a 'go'

Change #1204815 had a related patch set uploaded (by Phuedx; author: Phuedx):

[mediawiki/extensions/MetricsPlatform@master] Enable experiment enrollment in the MediaWiki Action API

https://gerrit.wikimedia.org/r/1204815

Thanks @pfischer. A couple of questions spring to mind:

  1. Is Scenario 1 treated the same as the other scenarios? In my mental model, a user searching or interacting with the search autocomplete starts a session. Is that correct?
  2. Is the experiment enrollment always performed by the same code path in these scenarios?

Change #1204815 merged by jenkins-bot:

[mediawiki/extensions/MetricsPlatform@master] Enable experiment enrollment in the MediaWiki Action API

https://gerrit.wikimedia.org/r/1204815

> 1. Is Scenario 1 treated the same as the other scenarios? In my mental model, a user searching or interacting with the search autocomplete starts a session. Is that correct?

Autocomplete is the most common way to start the session, but nothing enforces that. I believe the reason it was mentioned in our design docs is that Special:Search includes a JS config variable that explicitly says what test the user is enrolled into. If the user lands on Special:Search without previously starting a session, we need to pick up that value and use it for the autocomplete/fulltext requests. This is as opposed to autocomplete, which must call a separate configuration API to find out the test enrollment decision. I suspect this is not as relevant in xLab.

> 2. Is the experiment enrollment always performed by the same code path in these scenarios?

Sort of? The enrollment decision is always made in the CirrusSearch\UserTestingEngine class. The typical entrypoint is via invoking CirrusSearch\UserTestingStatus::getInstance(), which happens during search engine initialization. The main variance is how the user gets to that code path. Special:Search, as mentioned above, embeds the enrollment decision into the page's JavaScript variables, whereas autocomplete has to invoke a special configuration API, as we don't have a pleasant way to inject custom values into the API responses. Even if we did, there are multiple JavaScript autocomplete implementations and they would all have to be adjusted to give us access to that raw response.

The enrollment decision is always centralized, but how the user gets there varies.

FWIW there's been some effort put into centralising some of the API interactions into an RL module in MediaWiki Core.

I had thought that it might be possible to get rid of this API call if we moved the enrollment step into JavaScript. This would allow us to enroll frontend-focussed and backend-focussed experiments on search sessions in the same way and present a unified API (protocol maybe?) to the developer. Having multiple JavaScript autocomplete implementations definitely complicates this but having an RL module in MediaWiki Core makes it a little simpler.

> The enrollment decision is always centralized, but how the user gets there varies.

Nicely put!

I don't think that the complications around when to enroll search sessions in experiments highlighted above require any significant changes to our notion of an Enrollment Authority or the conclusions in the task description. Rather, they mean that we need to work closely with the Search team in order to ship an Enrollment Coordinator for search sessions that allows Product teams to experiment on the various autocomplete implementations and the Search team to experiment on the backend. I'll create a new task.