SDS 2.2.5 hypothesis:
If we update the Test Kitchen JS and PHP SDKs with methods to log experiment exposure, we will no longer need to treat all events as exposure events. This will improve the performance of experiment assignment queries in GrowthBook and yield more accurate experiment results.
This work builds on the learnings from SDS 2.2.3 and SDS 2.2.2. In GrowthBook, an experiment assignment query (EAQ) returns which users were part of which experiment, which variation they saw, and when they saw it. It is one half of the queries used to generate experiment results. GrowthBook uses this data in multiple ways:
- Experiment health check: to see if anyone was exposed to multiple variations during the experiment
- More accurate experiment results: when calculating metrics, filter out any data collected from a user prior to their first exposure
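Both uses can be sketched in a few lines. This is a minimal illustration, not GrowthBook's implementation: the row shapes (`user_id`, `variation`, timestamp), the sample data, and the helper names are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical exposure rows, shaped like what an EAQ returns:
# (user_id, variation, timestamp)
exposures = [
    ("u1", "control", 10),
    ("u2", "treatment", 12),
    ("u2", "control", 20),    # u2 saw two variations
    ("u3", "treatment", 15),
]

# Hypothetical metric events: (user_id, event_name, timestamp)
events = [
    ("u1", "edit_saved", 5),   # before u1's first exposure
    ("u1", "edit_saved", 11),
    ("u3", "edit_saved", 15),
    ("u3", "edit_saved", 30),
    ("u4", "edit_saved", 8),   # u4 was never exposed
]


def multiple_exposure_users(exposures):
    """Health check: users who were exposed to more than one variation."""
    seen = defaultdict(set)
    for user_id, variation, _ts in exposures:
        seen[user_id].add(variation)
    return sorted(u for u, variations in seen.items() if len(variations) > 1)


def first_exposure_times(exposures):
    """Earliest exposure timestamp per user."""
    first = {}
    for user_id, _variation, ts in exposures:
        if user_id not in first or ts < first[user_id]:
            first[user_id] = ts
    return first


def events_after_first_exposure(events, exposures):
    """Keep only events recorded at or after the user's first exposure."""
    first = first_exposure_times(exposures)
    return [e for e in events if e[0] in first and e[2] >= first[e[0]]]
```

With this sample data, the health check flags only `u2`, and the filter drops `u1`'s early edit and everything from `u4`, who was never exposed.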
None of our experiments currently log exposure, so we have to treat every event coming out of an experiment as an exposure event in the EAQ. This hurts GrowthBook's performance: instead of querying a small subset of the data, the EAQ queries all of it.
Furthermore, we are unable to exclude irrelevant data from experiment analysis. For contributor-focused experiments that measure edit rates, we want to count only edits made after the user was actually exposed (or would have been exposed) to the treatment being tested. Without exposure logging, an experiment can include users from whom we collect data but who were never actually exposed to the treatment (or lack thereof).
Without this work, product teams implementing experiments would have to manually include exposure events in their instrumentation specifications and look up the event name to use each time. One experiment might use "experiment_viewed", another "viewed_experiment", and another "exposure"; we would have to look for all of these when writing the experiment assignment query. If the SDK had a logExposure() method, we could control and standardize the exact event name.
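A rough sketch of the idea, written in Python for brevity even though the actual SDKs are JS and PHP. The class name, the `log_exposure` signature, the `EXPOSURE_EVENT_NAME` value, the experiment name, and the payload fields are all hypothetical; the point is only that the SDK, not the caller, chooses the event name.

```python
# Hypothetical standardized name; the real SDK would define its own.
EXPOSURE_EVENT_NAME = "experiment_exposure"


class ExperimentClient:
    """Hypothetical stand-in for an SDK experiment object."""

    def __init__(self, experiment_name, variation, send):
        self.experiment_name = experiment_name
        self.variation = variation
        self._send = send  # callable that delivers an event dict

    def log_exposure(self):
        """Emit an exposure event under the single SDK-controlled name."""
        self._send({
            "action": EXPOSURE_EVENT_NAME,
            "experiment": self.experiment_name,
            "variation": self.variation,
        })


# Usage: collect events in a list instead of sending them anywhere.
sent = []
client = ExperimentClient("new-editor-toolbar", "treatment", sent.append)
client.log_exposure()
```

Because callers never pass an event name, the experiment assignment query only has to match one value.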
Furthermore, since GrowthBook determines dimensions using experiment assignment queries, we can collect contextual attributes on exposure events alone (without elevating the risk level of the data collection activity) and leave them off the rest of the event data. For example, if we record that the user was logged in at exposure time, we do not need to also include their logged-in state with every interaction we record. This would reduce the total size of events across all experiments, improving the overall performance of the system.
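To illustrate, here is a minimal sketch of how a dimension recorded once at exposure can still be used to break down interaction events that do not carry it. The field names (`user_id`, `is_logged_in`, `action`), the sample data, and the helper are assumptions for the example and do not reflect GrowthBook's actual queries.

```python
# Hypothetical exposure events: contextual attributes are recorded once per user.
exposure_events = [
    {"user_id": "u1", "variation": "control", "is_logged_in": True},
    {"user_id": "u2", "variation": "treatment", "is_logged_in": False},
]

# Hypothetical interaction events: no contextual attributes repeated on them.
interaction_events = [
    {"user_id": "u1", "action": "edit_saved"},
    {"user_id": "u1", "action": "edit_saved"},
    {"user_id": "u2", "action": "edit_saved"},
    {"user_id": "u9", "action": "edit_saved"},  # no exposure recorded
]


def count_by_dimension(exposure_events, interaction_events, dimension):
    """Count interactions per dimension value, looked up from the exposure event."""
    value_by_user = {e["user_id"]: e[dimension] for e in exposure_events}
    counts = {}
    for event in interaction_events:
        user_id = event["user_id"]
        if user_id not in value_by_user:
            continue  # user has no recorded exposure, so no dimension value
        value = value_by_user[user_id]
        counts[value] = counts.get(value, 0) + 1
    return counts


counts = count_by_dimension(exposure_events, interaction_events, "is_logged_in")
```

Here the logged-in state is stored twice (once per exposed user) rather than once on each of the interaction events.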