Background
We recently refactored instrument configs fetching so that configs are updated asynchronously. That change is riding the train this week.
Also this week, Moderator Tools is running a logged-in experiment that is scheduled to start 2025-08-06.
Description
We need to sequence a few steps so that no events are dropped for our partner product team as we transition from the previous method of instrument configs fetching to the new asynchronous one.
Here are a few things to keep in mind as we make a rollout plan:
- The wikis in the Moderator Tools experiment are all group2 wikis - the train cut for this group is 2025-08-07 UTC 18:00
- The asynchronous instrument configs fetching patch was merged on 2025-07-30 and will ride the train this week - it will roll out to group2 wikis on 2025-08-07 UTC 18:00
- The asynchronous instrument configs fetching patch requires a puppet change to be deployed that runs a periodic maintenance script to update configs
- The puppet change needs to be deployed by an SRE, and this deployment needs to be scheduled after the asynchronous instrument configs patch rolls out to group2 because it relies on code in the master branch.
- To pre-empt data loss for the Moderator Tools experiment, we need to deploy a config patch that includes the group2 wikis in $wgMetricsPlatformEnableExperimentConfigsFetching. This ensures data collection under the current (old) way of configs fetching, and it is also required because the xLab onBeforeInitialize hook checks this config var before calling updateExperimentConfigs(). If we deploy this config change on 2025-08-06, before the asynchronous patch goes to group2 on 2025-08-07, the Moderator Tools experiment's data collection should in theory be seamless: the old way uses one set of cache keys while the new asynchronous way uses a different set.
- Ideally, given the deployment synchronization issues above, we should ask Moderator Tools whether the experiment start date can be pushed to 2025-08-11
Rollout Plan:
- Deploy the puppet patch after the asynchronous instrument configs fetching patch is in production, i.e. after 2025-08-07 UTC 18:00
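The ordering constraints described above (config patch before the group2 train, puppet patch after it) can be sanity-checked against proposed deployment times with a small script. This is only an illustrative sketch: the function name `check_rollout_order` and the example times on 2025-08-06 and after the train are hypothetical, and only the group2 train time of 2025-08-07 UTC 18:00 comes from this task.

```python
from datetime import datetime, timezone

# Group2 train time taken from this task: 2025-08-07 UTC 18:00.
GROUP2_TRAIN = datetime(2025, 8, 7, 18, 0, tzinfo=timezone.utc)


def check_rollout_order(config_patch_at, puppet_patch_at, group2_train_at=GROUP2_TRAIN):
    """Return a list of violated ordering constraints (empty list means OK).

    Hypothetical helper: it only encodes the two ordering rules from the
    task description and knows nothing about the real deployment tooling.
    """
    problems = []
    # The config patch has to be live before the asynchronous patch reaches group2.
    if not config_patch_at < group2_train_at:
        problems.append("config patch must land before the group2 train")
    # The puppet patch relies on code that only exists after the train.
    if not puppet_patch_at > group2_train_at:
        problems.append("puppet patch must land after the group2 train")
    return problems


if __name__ == "__main__":
    # Example times are assumptions chosen for illustration only.
    config_at = datetime(2025, 8, 6, 12, 0, tzinfo=timezone.utc)
    puppet_at = datetime(2025, 8, 7, 19, 0, tzinfo=timezone.utc)
    print(check_rollout_order(config_at, puppet_at))
```

An empty list means the proposed times respect both constraints; any returned strings name the constraint that would be violated.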
Test Plan:
- run a logged-in experiment on any wiki
- verify data collection looks good
In theory, based on a review of the code path for updating configs, there should be no data loss for the Moderator Tools logged-in experiment as long as the correct config vars are in production.