
[GrowthBook Integration] Replace run_poller with Test Kitchen and GrowthBook source config variables
Open, High | Public | 2 Estimated Story Points

Description

Background

During the production deployment of release tag v1.3.9, which includes the later refactor of the validator/adapter/endpoints that build the experiments API responses from a new experiments_cache table, it became evident that we needed a way to include or exclude either source (TK or GB) in the responses. The deployment relied on a coarse flag, run_poller, which is disabled for TK production. With the poller disabled, the experiments_cache table stayed empty, so the TK API experiments endpoints returned empty response bodies for a short time (~10 minutes, during which current active experiment enrollments were disrupted) before the rollout was reverted and the API responses were restored.

Screenshot 2026-06-02 at 8.57.29 PM.png (2,800×2,114 px, 546 KB)

The rollout of v1.3.9 to TK staging coincided with run_poller being enabled in the staging environment. We wanted to keep the poller disabled on TK production while pushing all the new code for the GrowthBook integration, but realized that the current configuration variable run_poller doesn't allow granular control over which experiment sources are included in the experiments responses.

By replacing run_poller with include_tk and include_gb, we can exclude GrowthBook experiments from the TK API experiments endpoints while continuing to serve TK-registered experiments through the new adapters that build the responses. This covers cases such as a production deployment where we do not want to include GrowthBook experiments until that functionality is ready.

Description

Currently the configuration service always stitches TK-registered and GrowthBook-sourced experiments together, with GB winning when both sources use the same slug. The poller's run_poller config variable is a crude on/off switch: there is no way to enable the poll cycle while excluding one source.

This is disruptive/precarious for the following common and anticipated cases:

  • Deploying the refactored TK backend code to production with TK-registered experiments only, until GrowthBook-sourced experiments are ready to be included, while verifying that iterations of the new experiment configuration services don't break active experiments registered in TKUI.
  • Bringing GrowthBook-sourced experiments into the TK API experiments responses alongside TK-registered experiments.
  • Temporarily falling back to TK only if GrowthBook is misbehaving, without disabling the poll cycle entirely.
  • Eventually excluding TK-registered experiments from the responses once GrowthBook-sourced experiments are confirmed to be working as expected.

Replace growthbook.run_poller with two independent boolean flags under the same config block:

  • growthbook.include_tk: when true, TK-registered experiments contribute to the stitched set.
  • growthbook.include_gb: when true, GrowthBook-sourced experiments contribute to the stitched set.
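As a rough illustration of how the two flags might be read from the growthbook config block, here is a minimal Python sketch. The task does not state the service's language or its config loader, so SourceFlags and load_source_flags are hypothetical names. Raising on a missing flag (rather than picking a default) is also an assumption, chosen so that a misconfiguration fails loudly instead of silently dropping a source.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class SourceFlags:
    """Hypothetical container for the two source flags."""
    include_tk: bool
    include_gb: bool


def load_source_flags(config: Mapping[str, Any]) -> SourceFlags:
    """Read growthbook.include_tk / growthbook.include_gb from a parsed config.

    Assumption: the config has already been parsed into nested mappings.
    A missing block or flag raises KeyError; no default is assumed here.
    """
    growthbook = config["growthbook"]
    return SourceFlags(
        include_tk=bool(growthbook["include_tk"]),
        include_gb=bool(growthbook["include_gb"]),
    )
```

For example, a parsed production config with include_tk true and include_gb false would yield SourceFlags(include_tk=True, include_gb=False).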

stitch() reads both flags and:

  • returns [] when both are false
  • contributes only the enabled side(s) otherwise
  • preserves the existing "GB wins on slug conflict" rule when both are true
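The three rules above can be sketched in Python as follows. This is only an illustration under assumptions: the real stitch() signature, the experiment data shape, and the implementation language are not given in this task, so the parameters and the dict-with-"slug" representation are hypothetical.

```python
from typing import Dict, List

# Hypothetical experiment shape; only the "slug" key matters for this sketch.
Experiment = Dict[str, str]


def stitch(
    tk_experiments: List[Experiment],
    gb_experiments: List[Experiment],
    include_tk: bool,
    include_gb: bool,
) -> List[Experiment]:
    """Merge the enabled sources; GB wins when both sources share a slug."""
    by_slug: Dict[str, Experiment] = {}
    if include_tk:
        for experiment in tk_experiments:
            by_slug[experiment["slug"]] = experiment
    if include_gb:
        # Applied second, so a GB entry overwrites a TK entry with the same slug.
        for experiment in gb_experiments:
            by_slug[experiment["slug"]] = experiment
    # With both flags false nothing was added, so this is an empty list.
    return list(by_slug.values())
```

Applying GB after TK keeps the conflict rule in a single place: no separate "who wins" branch is needed, and disabling either flag simply skips that source.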

The poller's start gate should no longer check run_poller. It requires only api_url and api_key. The include flags are the experiment source configuration variables.
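A minimal Python sketch of such a start gate, assuming the growthbook config block is available as a mapping. The function name should_start_poller is hypothetical, and treating empty strings as "not configured" is an assumption.

```python
from typing import Any, Mapping


def should_start_poller(growthbook_config: Mapping[str, Any]) -> bool:
    """Start the poller whenever both GrowthBook credentials are present.

    The include_tk / include_gb flags are deliberately not consulted here:
    they only control which sources end up in the responses.
    """
    api_url = growthbook_config.get("api_url")
    api_key = growthbook_config.get("api_key")
    return bool(api_url) and bool(api_key)
```

Under this sketch, a production config with include_gb set to false still starts the poll cycle as long as api_url and api_key are set.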

For visibility, the experiments endpoints log which sources contributed on every successful response. The "both false" case is logged at warning level instead of info.
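The logging behavior could look roughly like the Python sketch below. The logger name, message wording, and the log_sources helper are all hypothetical; only the info-versus-warning split comes from this task.

```python
import logging

# Hypothetical logger name; the service's real logger is not specified.
logger = logging.getLogger("experiments")


def log_sources(include_tk: bool, include_gb: bool) -> str:
    """Log which sources contributed to a response and return the label."""
    sources = [
        name
        for name, enabled in (("tk", include_tk), ("gb", include_gb))
        if enabled
    ]
    if not sources:
        # Both flags false: the response is empty, so flag it more loudly.
        logger.warning("experiments response built with no sources enabled")
        return "none"
    label = ",".join(sources)
    logger.info("experiments response sources: %s", label)
    return label
```

Returning the label is just a convenience for testing in this sketch; an endpoint would typically only need the log call.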

Acceptance Criteria

  • config.dev.yaml, config.prod.yaml, config.local.yaml use the new flags (both true in dev; include_gb false in prod; local can be either)
    • these values should be set in the Kubernetes chart/helm file accordingly (both true in staging, include_gb false in production)
  • run_poller removed everywhere (config, code, tests)
  • stitch() checks the flags - existing GB-wins behavior preserved when both flags are true
  • Endpoint logs specify the included sources
  • Relevant tests updated

Event Timeline

cjming triaged this task as High priority. Wed, Jun 3, 3:17 AM
cjming set the point value for this task to 2.
cjming updated Other Assignee, added: Sfaci.

Change #1297249 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[operations/deployment-charts@master] test-kitchen: Update chart to add new config properties

https://gerrit.wikimedia.org/r/1297249

Change #1297251 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[operations/deployment-charts@master] Test Kitchen UI: Deploy v1.4.0 release to staging

https://gerrit.wikimedia.org/r/1297251

Change #1297249 merged by jenkins-bot:

[operations/deployment-charts@master] test-kitchen: Update chart to add new config properties

https://gerrit.wikimedia.org/r/1297249

Change #1297251 merged by jenkins-bot:

[operations/deployment-charts@master] Test Kitchen UI: Deploy v1.4.0 release to staging

https://gerrit.wikimedia.org/r/1297251

Change #1299575 had a related patch set uploaded (by Santiago Faci; author: Santiago Faci):

[operations/deployment-charts@master] Test Kitchen UI: Deploy v1.4.1 release to production

https://gerrit.wikimedia.org/r/1299575

Change #1299575 merged by jenkins-bot:

[operations/deployment-charts@master] Test Kitchen UI: Deploy v1.4.1 release to production

https://gerrit.wikimedia.org/r/1299575