
Rollout for asynchronous instrument configs fetching
Closed, Resolved | Public | 2 Estimated Story Points

Description

Background

We recently refactored instrument configs fetching so that configs are updated asynchronously. That change is riding the train this week.

Also this week, Moderator Tools is running a logged-in experiment that is scheduled to start 2025-08-06.

Description

There are a few steps we need to sequence so that events are not dropped for our partner product team as we transition from the previous method of instrument configs fetching to the new asynchronous one.

Here are a few things to keep in mind as we make a rollout plan:

  • The wikis in the Moderator Tools experiment are all group2 wikis - the train cut for this group is 2025-08-07 UTC 18:00
  • The asynchronous instrument configs fetching patch was merged on 2025-07-30 and will ride the train this week - it will roll out to group2 wikis on 2025-08-07 UTC 18:00
  • The asynchronous instrument configs fetching patch requires a puppet change to be deployed that runs a periodic maintenance script to update configs
  • The puppet change needs to be deployed by an SRE, and this deployment needs to be scheduled after the asynchronous instrument configs patch rolls out to group2 because it relies on code in the master branch.
  • To pre-empt data loss for the Moderator Tools experiment, we need to deploy a config patch that includes the group2 wikis in $wgMetricsPlatformEnableExperimentConfigsFetching. This is needed to ensure data collection with the current (old) way of configs fetching, and because the xLab onBeforeInitialize hook checks for this config var before calling updateExperimentConfigs(). If we deploy this config change on 2025-08-06, before the asynchronous patch goes to group2 on 2025-08-07, the Moderator Tools experiment's data collection should in theory be seamless: the old way uses one set of cache keys while the new asynchronous way uses a different set.
  • Ideally, given these deployment synchronization issues, we should ask Moderator Tools whether the start date can be moved to 2025-08-11.
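
As a rough illustration of the config var point above, here is a minimal Python sketch of the gating and cache-key separation. It is a model for discussion only, not the MetricsPlatform extension's code: the wiki IDs, the key prefixes, and the assumption that the config var resolves to a per-wiki boolean are all hypothetical.

```python
# Illustrative model only: not the MetricsPlatform extension's actual code.
# Wiki IDs and cache key prefixes are hypothetical.

OLD_PREFIX = "configs-sync"    # hypothetical key prefix for the old path
NEW_PREFIX = "configs-async"   # hypothetical key prefix for the new path


def hook_fetches_configs(wiki: str, enabled: dict[str, bool]) -> bool:
    """Old path: the hook updates configs only if the config var is on."""
    return enabled.get(wiki, False)


def cache_key(prefix: str, wiki: str) -> str:
    """Build a per-wiki cache key under the given prefix."""
    return f"{prefix}:{wiki}"


# Stand-in for $wgMetricsPlatformEnableExperimentConfigsFetching,
# assumed here to resolve to a per-wiki boolean.
enabled = {"examplewiki": True}

fetches_listed = hook_fetches_configs("examplewiki", enabled)
fetches_unlisted = hook_fetches_configs("otherwiki", enabled)
keys_differ = cache_key(OLD_PREFIX, "examplewiki") != cache_key(NEW_PREFIX, "examplewiki")
```

In this model, a wiki missing from the config var never reaches the old update path, which is why the group2 wikis would need to be listed before the experiment starts. Because the two paths write under different keys, neither overwrites the other during the transition.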

Rollout Plan:

Test Plan:

  • run a logged-in experiment on any wiki
  • verify data collection looks good

In theory, there should be no data loss for the Moderator Tools logged-in experiment, based on a review of the code path for updating configs, as long as the correct config vars are in production.

Event Timeline

Change #1175561 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[operations/mediawiki-config@master] Temporarily add config var back in for group2

https://gerrit.wikimedia.org/r/1175561

@dr0ptp4kt @phuedx there's a puppet request window on Thursday 8/7 UTC 16:00 which is 2 hours before the train cut of 1.45.0-wmf.13 to group2 which is when the asynchronous instrument configs fetching patch is expected to be available for group2 wikis - is this too early or will it be fine to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/1171205 then?

I guess we can request a custom window right after the train cut to group2? We just need to schedule ahead if this is the preferred time to deploy.

The time slot of the Puppet request window is typically twice weekly, but this can shift (in advance) to accommodate other deployments.

> @dr0ptp4kt @phuedx there's a puppet request window on Thursday 8/7 UTC 16:00 which is 2 hours before the train cut of 1.45.0-wmf.13 to group2 which is when the asynchronous instrument configs fetching patch is expected to be available for group2 wikis - is this too early or will it be fine to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/1171205 then?

I think it's too early, because https://gerrit.wikimedia.org/r/c/operations/puppet/+/1171205/4/modules/profile/manifests/mediawiki/maintenance/experimentationlab.pp#10 won't be able to reach the code path in Extension:MetricsPlatform/maintenance/UpdateConfigs.php until UpdateConfigs.php is deployed to production (it was only introduced in r1172066).

'/usr/local/bin/mwscript extensions/MetricsPlatform/maintenance/UpdateConfigs.php --wiki aawiki',

> I guess we can request a custom window right after the train cut to group2? We just need to schedule ahead if this is the preferred time to deploy.
>
> The time slot of the Puppet request window is typically twice weekly, but this can shift (in advance) to accommodate other deployments.

Yeah, I think that's preferable.
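
For reference, the timing constraint discussed above can be written down as a small Python check. This is a sketch that assumes the maintenance script is usable as soon as the group2 train cut time is reached; in practice the rollout itself takes some time, so a real window would want a margin. The function name is hypothetical.

```python
from datetime import datetime, timezone

# Times taken from this task (all UTC).
TRAIN_CUT_GROUP2 = datetime(2025, 8, 7, 18, 0, tzinfo=timezone.utc)
PUPPET_WINDOW = datetime(2025, 8, 7, 16, 0, tzinfo=timezone.utc)


def window_is_late_enough(window: datetime, train_cut: datetime) -> bool:
    """True if the puppet window starts at or after the train cut.

    Simplifying assumption: UpdateConfigs.php is available in production
    from the train cut time onwards.
    """
    return window >= train_cut


# How far ahead of the train cut the regular window falls.
gap = TRAIN_CUT_GROUP2 - PUPPET_WINDOW

print(window_is_late_enough(PUPPET_WINDOW, TRAIN_CUT_GROUP2))  # False
print(gap)  # 2:00:00
```

The regular window falls two hours before the cut, which matches the conclusion that a custom window right after the train cut to group2 is the better option.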

Change #1175561 abandoned by Clare Ming:

[operations/mediawiki-config@master] Temporarily add config var back in for group2

Reason:

per today's discussion, we no longer need this config

https://gerrit.wikimedia.org/r/1175561

Milimetric claimed this task.