
Journal xLab A/B test / instrument configuration changes
Closed, Resolved | Public | 5 Estimated Story Points

Description

As an incident responder and data practitioner, I want to be able to understand how configuration changes may have played a role in observed system behavior, so that we can treat, report upon, and hopefully prevent problems. (Separately, it's possible sometimes that it can be useful to understand why something positive was observed when there are interacting variables, although let's assume that's potentially less common in this context.)

It's hard to tell whether the configuration of A/B tests defined by xLab users has changed over time, and what those changes were (in principle the same is true of instruments). For operators, most of the time this doesn't matter: when witnessing problematic symptoms, what counts most is knowing which configuration is in effect at the present moment. However, stuff happens, and when configuration has mutated it's useful to be able to understand what changed and how it played a role.

The journal of changes doesn't need to be glamorous; it just needs to be functional. Imagine an operator sees in the SAL (Server Admin Log), via logging from xLab, that changes happened. That's good, but in order to understand what those changes were, they may very well need to go back to examine a previous version or two of the configuration and triangulate against the code that was in the system at the corresponding times.

Acceptance criteria:

  • Journal in a separate table the state of the instrument/experiment object as JSON for the create/update/delete actions for the xLab application.
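
As a rough illustration of the acceptance criterion above, here is a minimal sketch of what a journal entry could look like. It is not taken from the patch: the helper names (`buildJournalRow`, `journalChange`), the row fields (`action`, `slug`, `changed_at`, `state`), and the in-memory array standing in for the separate table are all assumptions made for this example.

```javascript
// Hypothetical sketch: names and row shape are assumptions, not the patch's code.
const JOURNAL_ACTIONS = new Set(['create', 'update', 'delete']);

// Build one journal row holding the full object state as a JSON string.
function buildJournalRow(action, entity, changedAt = new Date()) {
  if (!JOURNAL_ACTIONS.has(action)) {
    throw new Error(`Unsupported journal action: ${action}`);
  }
  return {
    action,                              // 'create' | 'update' | 'delete'
    slug: entity.slug,                   // which instrument/experiment changed
    changed_at: changedAt.toISOString(), // when the change was recorded
    state: JSON.stringify(entity),       // snapshot of the object as JSON
  };
}

// In-memory stand-in for the separate journal table (append-only).
const journal = [];

function journalChange(action, entity, changedAt) {
  const row = buildJournalRow(action, entity, changedAt);
  journal.push(row);
  return row;
}
```

Calling `journalChange('update', instrument)` after each successful write would leave one row per change, which an operator could later read back in order to reconstruct earlier versions of the configuration.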

Journaling to a place where the configuration could be JOINed against other data could be useful as well, but that isn't the main point here; for now this is about being able to piece things together manually. It would be nice to have and could be future work if we determine we really want it. Often the instrumentation itself produces data reliably enough that we can draw appropriate inferences in concert with other tables accessible via the data lake.

Database

SQL in the patch will need to be manually deployed.

Out of Scope

  • UI visualization is out of scope
  • On 16-September-2025 we discussed that, down the road, we may also want to push the JSON to a Kafka topic for other purposes, for example to synchronize configuration elsewhere. GrowthBook could become a recipient of changes, at least while it acts mostly as a data analysis engine, which is before the point at which GrowthBook (potentially) becomes authoritative for configuration. Eventually the flow could also run the other way: xLab could become the recipient of well-formed JSON and perform a row create/update/delete (or some set of them) for this same sort of record. Concretely, that is probably the point at which GrowthBook (potentially) becomes authoritative for configuration, so xLab needs to act upon changes. N.b., xLab may very well poll GrowthBook instead of using an event-driven flow, or poll in response to an event; this is a point to be explored later on.

Pointer: https://gitlab.wikimedia.org/repos/data-engineering/mpic/-/blob/1d31f32acc8d39835987334c211fdb98c666c49b/controller/instrumentController.js

Event Timeline

Milimetric moved this task from Incoming to READY TO GROOM on the Test Kitchen board.
Milimetric added a project: OKR-Work.
JVanderhoop-WMF set the point value for this task to 5.
JVanderhoop-WMF moved this task from READY TO GROOM to Backlog on the Test Kitchen board.

@dr0ptp4kt
I have been taking a look at the related patch and it looks good and works as expected. But I was wondering if we should discard `progress` from the journal. It's a property that is being included, but it's not part of the instrument data. It's actually a calculated value that is generated and included in findInstrument(slug) so that the UI can show the progress. We are discussing that idea in https://phabricator.wikimedia.org/T408233#11329645 (I have proposed to remove it).

If we decide to remove it, it will disappear from the journal accordingly. But if we decide to keep it for the UI, should we remove it from the journal explicitly? Should we do that anyway?

Milimetric updated Other Assignee, added: dr0ptp4kt.
Milimetric updated Other Assignee, removed: dr0ptp4kt.

> @dr0ptp4kt
> I have been taking a look at the related patch and it looks good and works as expected. But I was wondering if we should discard progress from the journal.

Thanks. Okay, I updated it to remove progress if it's present. Thanks again!
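
For readers following along, one way to drop a computed field before journaling might look like the sketch below. This is an illustration only, not the code from the patch: `withoutProgress` is a hypothetical helper name, and the sample object fields other than `progress` are made up.

```javascript
// Hypothetical sketch: strip the computed `progress` field before journaling.
// Returns a shallow copy and leaves the original object untouched.
function withoutProgress(instrument) {
  const { progress, ...persisted } = instrument; // `progress` is discarded
  return persisted;
}

// Example: the UI-facing object carries `progress`, the journaled state does not.
const uiInstrument = { slug: 'example-instrument', status: 'on', progress: 40 };
const journaledState = JSON.stringify(withoutProgress(uiInstrument));
```

Because the helper copies rather than mutates, the UI can keep using `progress` on the original object whether or not T408233 ends up removing it.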

Change #1202770 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[operations/deployment-charts@master] Test Kitchen: Deploying v1.1.1 release to staging

https://gerrit.wikimedia.org/r/1202770

Change #1202772 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[operations/deployment-charts@master] Test Kitchen: Deploying v1.1.1 release to production

https://gerrit.wikimedia.org/r/1202772

Change #1202770 merged by jenkins-bot:

[operations/deployment-charts@master] Test Kitchen: Deploying v1.1.1 release to staging

https://gerrit.wikimedia.org/r/1202770

Change #1202772 merged by jenkins-bot:

[operations/deployment-charts@master] Test Kitchen: Deploying v1.1.1 release to production

https://gerrit.wikimedia.org/r/1202772