SDS 2.2.5 Exposure Logging
Closed, Resolved, Public

Description

SDS 2.2.5 hypothesis:

If we update Test Kitchen JS and PHP SDKs with methods to log experiment exposure, we will not need to treat all events as exposure events, which will improve performance of experiment assignment queries in GrowthBook and yield more accurate experiment results.

This work follows the learnings from SDS 2.2.3 and SDS 2.2.2. In GrowthBook, an experiment assignment query (EAQ) returns which users were part of which experiment, which variation they saw, and when they saw it. It is one half of the queries used to generate experiment results. GrowthBook uses this data in multiple ways:

  • Experiment health check: to see if anyone was exposed to multiple variations during the experiment
  • More accurate experiment results: when calculating metrics, filter out any data collected from a user prior to their first exposure
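
As a rough illustration of the health check, here is a minimal sketch that finds subjects exposed to more than one variation. The row shape (subjectId, variation, timestamp) and the function name are assumptions made for this example; they are not GrowthBook's or Test Kitchen's actual schema.

```javascript
// Hypothetical rows, shaped like the output of an experiment assignment
// query: who was exposed, to which variation, and when.
const exposures = [
  { subjectId: "u1", variation: "control", timestamp: 100 },
  { subjectId: "u2", variation: "treatment", timestamp: 110 },
  { subjectId: "u1", variation: "treatment", timestamp: 150 },
  { subjectId: "u3", variation: "treatment", timestamp: 160 },
];

// Returns the IDs of subjects that were exposed to more than one variation.
function findMultipleExposures(rows) {
  const variationsBySubject = new Map(); // subjectId -> Set of variations
  for (const { subjectId, variation } of rows) {
    if (!variationsBySubject.has(subjectId)) {
      variationsBySubject.set(subjectId, new Set());
    }
    variationsBySubject.get(subjectId).add(variation);
  }
  return [...variationsBySubject]
    .filter(([, variations]) => variations.size > 1)
    .map(([subjectId]) => subjectId);
}

const flagged = findMultipleExposures(exposures);
console.log(flagged); // [ 'u1' ]
```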

None of our experiments log exposure, so we have to treat all events coming out of experiments as exposure events in the EAQ, which hurts GrowthBook's performance: instead of querying a small subset of the data, the EAQ queries all of it.

Furthermore, we are unable to exclude irrelevant data from experiment analysis. For contributor-focused experiments that measure edit rates, we want to ensure that we only count edits made after the user was actually exposed (or would have been exposed) to the treatment being tested. Without exposure logging, we can have users in the experiment who we collect data from but who have never actually been exposed to the treatment (or lack thereof).

Without this work, product teams implementing their experiments would have to manually include exposure events in their instrumentation specifications, and always have to look up the event name they should use. One experiment might use "experiment_viewed", another might use "viewed_experiment", and another might use "exposure" – we would have to look for all of these when writing the experiment assignment query. If the SDK had a logExposure() method, we could control and standardize the exact event name to use.

Furthermore, since GrowthBook determines dimensions using experiment assignment queries, we can actually collect a bunch of contextual attributes just with exposure events (without elevating risk level of the data collection activity) and not with the rest of the event data. For example, if we recorded that the user was logged in at exposure time, we don't need to also include whether they are logged in with every single interaction we record. This would reduce the total size of events in all experiments, improving the overall performance of the system.
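
To make the idea concrete, here is a small sketch in which only the exposure event carries a contextual attribute and later events are broken down by looking that attribute up per subject. The field names (subjectId, action, isLoggedIn) and the event name experiment_exposure are illustrative assumptions, not the real event schema.

```javascript
// Hypothetical events: only exposure events carry the contextual attribute.
const exposureEvents = [
  { subjectId: "u1", action: "experiment_exposure", isLoggedIn: true },
  { subjectId: "u2", action: "experiment_exposure", isLoggedIn: false },
];
const interactionEvents = [
  { subjectId: "u1", action: "edit_saved" },
  { subjectId: "u2", action: "click" },
  { subjectId: "u1", action: "click" },
];

// Counts interaction events per value of a dimension recorded at exposure.
function countByDimension(exposures, interactions, dimension) {
  const valueBySubject = new Map();
  for (const exposure of exposures) {
    // Keep the value from the first exposure seen for each subject.
    if (!valueBySubject.has(exposure.subjectId)) {
      valueBySubject.set(exposure.subjectId, exposure[dimension]);
    }
  }
  const counts = {};
  for (const event of interactions) {
    if (!valueBySubject.has(event.subjectId)) {
      continue; // No exposure recorded, so no dimension value to attribute.
    }
    const key = String(valueBySubject.get(event.subjectId));
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

const breakdown = countByDimension(exposureEvents, interactionEvents, "isLoggedIn");
console.log(breakdown);
```

The interaction events stay small because they never repeat isLoggedIn; the breakdown is recovered from the exposure events alone.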

Event Timeline

As an example: https://docs.growthbook.io/event-trackers/segment

To track when a user is exposed to an experiment from the client side, you'll want to add a Segment event call to your trackingCallback function. This function is called when a user is exposed to an experiment, and is passed the experiment name and variation name.

const gb = new GrowthBook({
  apiHost: "https://cdn.growthbook.io",
  clientKey: "sdk-abc123",
  trackingCallback: (experiment, result) => {
    // Example using Segment
    analytics.track("Experiment Viewed", {
      experimentId: experiment.key,
      variationId: result.key,
    });
  }
});

@KReid-WMF @phuedx: If Segment (https://github.com/segmentio/analytics-next?) is equivalent to Test Kitchen JS SDK then it looks like the common practice is for developers to manually produce exposure events following some agreed-upon convention (e.g. viewed_experiment vs Experiment Viewed vs experiment_exposure) for such events.

So maybe we don't produce these automatically but do require developers to produce their own, because they would know where to produce them from in their feature code. I'm just worried about relying on developers to remember this, but at the same time I can envision problems with trying to automate it. Particularly getting false positives – logging exposure when there hasn't actually been one.

We can enforce the requirement but not the timing, i.e. if an event is being sent but an exposure event hasn't yet been sent, then:

  • In production: log an error to the browser console and increment a counter in Prometheus so that we can monitor this behaviour
  • In development: throw an error and stop processing so that unit/integration tests fail

Hm… I don't think that's the way to do it. Example: Growth's upcoming Revise Tone A/B test sends edit_saved events when an edit is successfully saved. The treatment in that experiment is a module on the Newcomer Homepage, which is where exposure should be logged. When a user makes an edit, either because they engaged with a suggestion from that module or otherwise, there is a separately instrumented event.

So in that case, if an enrolled user is editing some random article we should still record any edit_saved events but there won't be exposure events – because the exposure to treatment or lack of treatment (in case of control) only happens when they visit Special:Homepage.

So we want to allow events to be collected from experiments and then apply filtering after the fact (the way GrowthBook does it) – only edit_saved events logged after exposure should count towards calculation of constructive edit rate.
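
A minimal sketch of that after-the-fact filtering, assuming events carry subjectId, action, and timestamp fields and that the exposure event is named experiment_exposure (both are assumptions for illustration; GrowthBook does this in its own queries):

```javascript
// Hypothetical event log for one experiment.
const events = [
  { subjectId: "u1", action: "edit_saved", timestamp: 90 }, // before exposure
  { subjectId: "u1", action: "experiment_exposure", timestamp: 100 },
  { subjectId: "u1", action: "edit_saved", timestamp: 120 }, // counts
  { subjectId: "u2", action: "edit_saved", timestamp: 130 }, // never exposed
];

// Keeps only non-exposure events logged at or after each subject's first
// exposure. Events from subjects with no exposure are dropped entirely.
function eventsAfterFirstExposure(allEvents, exposureAction = "experiment_exposure") {
  const firstExposure = new Map(); // subjectId -> earliest exposure timestamp
  for (const event of allEvents) {
    if (event.action !== exposureAction) {
      continue;
    }
    const previous = firstExposure.get(event.subjectId);
    if (previous === undefined || event.timestamp < previous) {
      firstExposure.set(event.subjectId, event.timestamp);
    }
  }
  return allEvents.filter(
    (event) =>
      event.action !== exposureAction &&
      firstExposure.has(event.subjectId) &&
      event.timestamp >= firstExposure.get(event.subjectId)
  );
}

const counted = eventsAfterFirstExposure(events);
console.log(counted.length); // 1
```

All edit_saved events are still collected; only the one at timestamp 120 would count towards a metric like constructive edit rate.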

Thought: Experiment#logExposure() and then the user doesn't have to remember if they're supposed to use send( 'experiment_viewed' ) or send( 'exposure' ) or whatever.

Yes. Also, experiment exposure should be part of the instrumentation suite.

I thought about this some last night and I'm convinced that we need something like Experiment#logExposure(). We're able to scale experimentation by keeping experiment enrollment stateless – we enroll the user or the device for every pageview for every active experiment. Obviously though, this means that there will be some duplication of events like experiment exposure events and that duplication will inhibit scaling experimentation.

By creating something like Experiment#logExposure() we give ourselves the opportunity to:

  1. Standardise the exposure event, e.g. #send( 'exposure' ) internally
  2. Store whether an exposure event has been sent for the experiment and simply not send it if we don't think that we have to
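
Both points can be sketched roughly as follows. This is not the SDK's code: the Experiment class, its constructor arguments, and the transport callback are hypothetical, and the event name 'exposure' is simply the one used in point 1.

```javascript
// Event name from point 1 above; the name the SDK ends up using may differ.
const EXPOSURE_ACTION = "exposure";

// Hypothetical stand-in for the SDK's Experiment class.
class Experiment {
  constructor(name, assignedGroup, transport) {
    this.name = name;
    this.assignedGroup = assignedGroup;
    this.transport = transport; // function that receives each event object
    this.exposureLogged = false;
  }

  // Sends an arbitrary event for this experiment.
  send(action, data = {}) {
    this.transport({
      experiment: this.name,
      group: this.assignedGroup,
      action,
      ...data,
    });
  }

  // Sends the standardised exposure event at most once per instance.
  // Returns true if an event was sent, false if it was suppressed.
  logExposure() {
    if (this.exposureLogged) {
      return false;
    }
    this.exposureLogged = true;
    this.send(EXPOSURE_ACTION);
    return true;
  }
}

const sentEvents = [];
const experiment = new Experiment("my-experiment", "treatment", (e) => sentEvents.push(e));
experiment.logExposure(); // sends 'exposure'
experiment.logExposure(); // suppressed
experiment.send("edit_saved");
console.log(sentEvents.map((e) => e.action)); // [ 'exposure', 'edit_saved' ]
```

The flag here lives only in memory, so it resets on every pageview. Since enrollment happens per pageview, suppressing duplicates across pageviews would need some persisted state, which this sketch leaves out.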

Thinking about how Editing implemented their section editing dead end experiment, where the instrumentation code sets a config value:

experimentPromise.then( ( experiment ) => {
	if ( !( experiment && experiment.isAssignedGroup( 'control', 'treatment' ) ) ) {
		return;
	}
	// The user is definitely enrolled in an existing experiment by this point
	const config = mw.config.get( 'wgVisualEditorConfig' ) || {};
	config.enableSectionEditingFullPageButtons = experiment.isAssignedGroup( 'treatment' );
} );

and then config.enableSectionEditingFullPageButtons is used in the feature code to determine whether to show the feature:

if ( this.enableSectionEditingFullPageButtons ) {
	if ( attachedRootRange.start !== 0 && OO.ui.isMobile() ) {
		surface.getView().$element.prepend( this.$switchToFullPageContainerTop );
	}
	if ( attachedRootRange.end < documentRange.end ) {
		surface.getView().$element.append( this.$switchToFullPageContainerBottom );
	}
}

I believe what we're proposing here would require them to use logExposure in the feature code, which is where they should have used experiment.isAssignedGroup( 'treatment' ). I think it's worth examining why they didn't. Was it to avoid creating a soft dependency on the Metrics Platform (soon to be Test Kitchen) extension in the VisualEditor extension?

I think ideally their feature code would look something like:

if ( this.enableVisualSectionEditing && this.section !== null ) {
	const surfaceModel = surface.getModel();
	const attachedRootRange = surfaceModel.getAttachedRoot().getOuterRange();
	const documentRange = surfaceModel.getDocument().getDocumentRange();
+	sectionEditingDeadEndExperiment.logExposure();
-	if ( this.enableSectionEditingFullPageButtons ) {
+	if ( sectionEditingDeadEndExperiment.isAssignedGroup( 'treatment' ) ) {
		if ( attachedRootRange.start !== 0 && OO.ui.isMobile() ) {
			surface.getView().$element.prepend( this.$switchToFullPageContainerTop );
		}
		if ( attachedRootRange.end < documentRange.end ) {
			surface.getView().$element.append( this.$switchToFullPageContainerBottom );
		}
	}
} else {
	this.$switchToFullPageContainerTop.detach();
	this.$switchToFullPageContainerBottom.detach();
}

because that's the point where the user is either shown the treatment or not shown the treatment.

David L. shared the following thoughts in Slack:

Yeah, it was 100% a dependency issue. VisualEditor is part of the core mediawiki distribution, and so we can’t really assume that this stuff exists. i.e. unless MetricsPlatform gets added to the bundled extensions, that approach is a non-starter.

It’s also less-ideal because it guarantees that we’ll need to change that code later when we actually release the feature. As-written, we can just turn off the experiment and deploy a config change to turn it on for everyone. (And then clean up the existence of the config value later, if we decide we don’t need it.)

Also, in terms of exposure-logging, as mobileSectionSwitch.js is currently written, your suggested change won’t particularly alter when isAssignedGroup (/ logExposure) is being called. (If we just stuck the logExposure call onto line 34 of mobileSectionSwitch.js, I mean.)

It’s functionally identical to the existing logging of the init event in that file, I think?

Which sounds right. So maybe something like:

mw.hook( 've.newTarget' ).add( ( target ) => {
	if ( target.constructor.static.trackingName !== 'mobile' ) {
		return;
	}
	experimentPromise.then( ( exp ) => {
		if ( !( exp && exp.isAssignedGroup( 'control', 'treatment' ) ) ) {
			return;
		}
		const send = ( action, data ) => {
			data.funnel_entry_token = editingSessionService.getEditingSessionId( null, true );
			data.action_context = data.action_context || {};
			data.action_context.interface = target.getDefaultMode() === 'source' ? 'wikitext-2017' : 'visualeditor';
			// This needs to be a string, but we've left it as an object until
			// now so it can be easily modified:
			data.action_context = JSON.stringify( data.action_context );
			exp.send( action, data );
		};
		const timings = {
			init: mw.now()
		};
+		exp.logExposure();
		send( 'init', {
			action_subtype: target.section !== null ? 'section' : 'page',
			page: {
				namespace_id: mw.config.get( 'wgNamespaceNumber' )
			}
		} );

The dependency point makes me wonder if, in addition to being able to log exposure directly via the SDK, a mw.hook( 'testKitchen.experimentExposure' ).add/fire pattern would be useful or appropriate here.
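
A rough sketch of what that could look like. So that it runs outside MediaWiki, it uses a small stand-in for mw.hook that mimics add(), fire(), and the way a handler added after a fire() is called with the last fired arguments. The hook name is the one suggested above; the experiment name and the function names are hypothetical.

```javascript
// Minimal stand-in for mw.hook, only so this sketch runs on its own.
const hookRegistry = new Map();
function hook(name) {
  if (!hookRegistry.has(name)) {
    hookRegistry.set(name, { handlers: [], lastArgs: null });
  }
  const entry = hookRegistry.get(name);
  return {
    add(handler) {
      entry.handlers.push(handler);
      if (entry.lastArgs !== null) {
        handler(...entry.lastArgs); // replay the most recent fire()
      }
      return this;
    },
    fire(...args) {
      entry.lastArgs = args;
      entry.handlers.forEach((handler) => handler(...args));
      return this;
    },
  };
}

// Feature code (e.g. in VisualEditor): announces exposure without
// depending on the SDK. If nothing is listening, this is a no-op.
function showFeature() {
  hook("testKitchen.experimentExposure").fire("section-editing-dead-end");
}

// Experiment code (loaded only when the SDK is available): listens for the
// hook and would call the SDK's exposure method; here it records the name.
const exposuresLogged = [];
hook("testKitchen.experimentExposure").add((experimentName) => {
  exposuresLogged.push(experimentName);
});

showFeature();
console.log(exposuresLogged); // [ 'section-editing-dead-end' ]
```

The feature code only names a hook, so it would keep working on wikis where the extension isn't installed.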

David L. shared the following thoughts in Slack:

It’s also less-ideal because it guarantees that we’ll need to change that code later when we actually release the feature. As-written, we can just turn off the experiment and deploy a config change to turn it on for everyone. (And then clean up the existence of the config value later, if we decide we don’t need it.)

I think it's realistic to expect some code to be introduced for an experiment and therefore to optimise for that code being removed with minimal effort.

Which sounds right. So maybe something like: <snip />

I think your intuition to introduce a protocol for folks is a good one but, since it would require implementors to add code to their extension, I'm not sure that it has an advantage over the current pattern of executing code when the ext.testKitchen module is loaded (and not when it isn't).

mpopov renamed this task from Experiment exposure events to [EPIC] SDS 2.2.5 Experiment exposure events. Jan 13 2026, 4:25 PM
mpopov claimed this task.
mpopov triaged this task as Medium priority.
mpopov renamed this task from [EPIC] SDS 2.2.5 Experiment exposure events to SDS 2.2.5 Experiment exposure events. Jan 13 2026, 4:28 PM
mpopov renamed this task from SDS 2.2.5 Experiment exposure events to SDS 2.2.5 Exposure Logging. Jan 23 2026, 8:24 PM

From the hypothesis close-out report in Asana:

Test Kitchen's JS and PHP SDKs now have an Experiment#sendExposure() method which developers should use to log exposure in their experiments. This method:

  • Standardizes the name of exposure events, so developers don't have to remember the exact name of the exposure event when implementing their experiments.
  • Collects a curated set of contextual attributes, separate from the experiment's/stream's configuration, to enable dimensional breakdown in GrowthBook when viewing experiment results.
  • Minimizes the volume of exposure events sent per experiment using a 2-tier ledger in the JS SDK and a 1-tier ledger in the PHP SDK, reducing bandwidth and storage consumed by experiments.
  • Will soon be required for experiments to be eligible for analysis in GrowthBook.
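
The report doesn't describe how the ledgers work. Purely as an illustration of one way a 2-tier ledger could suppress repeat exposure events, here is a sketch with an in-memory tier and a persistent tier. The class names, the key format, and the choice of tiers are all assumptions, not the SDK's implementation.

```javascript
// Hypothetical 2-tier ledger: tier 1 is an in-memory Set (cheap, lasts for
// one page), tier 2 is a persistent key-value store with getItem/setItem
// (in a browser this could be something like window.sessionStorage).
class ExposureLedger {
  constructor(storage) {
    this.memory = new Set();
    this.storage = storage;
  }

  // Returns true if an exposure event should be sent for this experiment,
  // and records it in both tiers so later calls return false.
  shouldSend(experimentName) {
    const key = `exposure:${experimentName}`; // assumed key format
    if (this.memory.has(key)) {
      return false; // already handled on this page
    }
    this.memory.add(key);
    if (this.storage.getItem(key) !== null) {
      return false; // already sent on an earlier page
    }
    this.storage.setItem(key, "1");
    return true;
  }
}

// In-memory stand-in for browser storage so the sketch runs anywhere.
class FakeStorage {
  constructor() {
    this.items = new Map();
  }
  getItem(key) {
    return this.items.has(key) ? this.items.get(key) : null;
  }
  setItem(key, value) {
    this.items.set(key, String(value));
  }
}

const sharedStorage = new FakeStorage();
const firstPage = new ExposureLedger(sharedStorage);
console.log(firstPage.shouldSend("my-experiment")); // true
console.log(firstPage.shouldSend("my-experiment")); // false (tier 1)

const secondPage = new ExposureLedger(sharedStorage);
console.log(secondPage.shouldSend("my-experiment")); // false (tier 2)
```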

Developers are now instructed to use this new method as part of our guide on conducting experiments: https://wikitech.wikimedia.org/wiki/Test_Kitchen/Conduct_an_experiment#Exposure_logging

Developers can consult the new guide https://wikitech.wikimedia.org/wiki/Test_Kitchen/Experiment_exposure_logging which includes many examples and recommendations for how to best log exposure to maximize true positives (exposure logged when there actually was exposure) and minimize false positives (exposure logged when there wasn't any exposure) and false negatives (failing to log exposure when there actually was exposure).

NOTE: We will need to wait until the Mobile ToC A/B/C test has concluded before we can make the switch/requirement, as we do not want to disrupt that experiment's analysis while it's still in progress. We expect to be able to do this by end of April.

P.S. A few contributor-focused metrics in the automated analytics MVP, like constructive edit rate and constructive activation rate, already depend on experiment exposure events as part of their metric definition, but until this hypothesis was completed developers had to implement those events themselves, e.g. by writing send( 'experiment_exposure' ) rather than calling sendExposure().