
xLab: log create/update/delete of A/B test configuration to SAL
Closed, Resolved. Public. 2 Estimated Story Points.

Description

As an operator, when xLab configuration is created/updated/deleted, I would like to be able to easily see this change in a place where I look for such information, especially when troubleshooting.

Acceptance criteria:

  1. Ensure that the Server Admin Log ("SAL") contains xLab entries for each configuration mutation that will effectively put an A/B test in force on a particular date or deactivate it on a particular date.
  2. Ensure that the entries in the SAL are useful for searching to investigate application behavior.

Related: T397462: xLab: Gracefully handle IdP config missing

Event Timeline

dr0ptp4kt renamed this task from xLab: log create/update/delete of configuration to SAL to xLab: log create/update/delete of A/B test configuration to SAL. Sep 11 2025, 1:40 PM
dr0ptp4kt set the point value for this task to 2. Sep 30 2025, 3:53 PM
dr0ptp4kt moved this task from READY TO GROOM to Backlog on the Test Kitchen board.

This is up and running as a continuous Toolforge job under the tool ID "tk".

https://sal.toolforge.org/analytics

https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log

When an entry shows (poll 1), it means this was the first time this particular instance of the ever-running job polled the server. For example, the daily system cutover for experiments starting or ending (based on each experiment's start/end date) happened today at 0630 PT (1430 UTC), and the job logged:

14:30 wmftkbot: Test Kitchen mw-user experiment (poll 23) - adds: none; removes: synth-aa-test-mw-php; fields: none - xLab/MPIC/TK tips at https://w.wiki/FwuD

That means this was the 23rd polling loop since the job started.
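To make the entry format concrete, here is a minimal sketch of how one polling pass could compare two configuration snapshots and build a line shaped like the entry above. It is illustrative only and not the tool's actual code: the function names, the snapshot shape (a dict keyed by experiment name), the `label` parameter, the `status` setting, and the reading of "fields" as changed settings on experiments present in both snapshots are all assumptions.

```python
def summarize(items):
    """Render a sorted, comma-separated list, or 'none' when empty."""
    return ", ".join(sorted(items)) if items else "none"


def diff_snapshots(previous, current):
    """Compare two hypothetical {experiment_name: settings_dict} snapshots."""
    adds = set(current) - set(previous)
    removes = set(previous) - set(current)
    fields = set()
    # Assumption: "fields" means settings that changed on experiments
    # that exist in both snapshots.
    for name in set(previous) & set(current):
        old, new = previous[name], current[name]
        for key in set(old) | set(new):
            if old.get(key) != new.get(key):
                fields.add(f"{name}.{key}")
    return adds, removes, fields


def format_entry(label, poll, adds, removes, fields):
    """Build a message shaped like the SAL entry quoted above."""
    return (
        f"Test Kitchen {label} experiment (poll {poll}) - "
        f"adds: {summarize(adds)}; removes: {summarize(removes)}; "
        f"fields: {summarize(fields)} - xLab/MPIC/TK tips at https://w.wiki/FwuD"
    )


# Hypothetical snapshots: one experiment drops out between two polls.
previous = {"synth-aa-test-mw-php": {"status": "active"}}
current = {}
adds, removes, fields = diff_snapshots(previous, current)
print(format_entry("mw-user", 23, adds, removes, fields))
```

With these made-up snapshots, the printed line matches the shape of the 14:30 entry: no adds, one remove, no field changes.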

I had to fix the URL for the tool for the IRC identity, so I restarted the job a few times. But hopefully that's it for the day, unless something needs to be tuned a bit more.

The poll counter increases over time. If the job has to be restarted for whatever reason, it will go back down to (poll 1). So someone looking at the analytics SAL should be aware that a return to (poll 1) doesn't imply that a bunch of new experiments were enqueued all of a sudden (although that is certainly theoretically possible). They should have a look through previous entries for context.
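For anyone scanning a batch of SAL entries, a small helper along these lines could flag where the counter went backwards, which suggests a job restart rather than new activity. This is a sketch under assumptions: the function names are hypothetical, the entries are assumed to be plain strings in chronological order (oldest first), and the only thing relied on from the real entries is the "(poll N)" marker.

```python
import re

# Matches the "(poll N)" marker that appears in the bot's entries.
POLL_RE = re.compile(r"\(poll (\d+)\)")


def poll_number(entry):
    """Return the poll counter found in an entry, or None if absent."""
    match = POLL_RE.search(entry)
    return int(match.group(1)) if match else None


def find_restarts(entries):
    """Return indexes of entries whose counter did not increase.

    Assumes `entries` is ordered oldest first. Entries without a
    "(poll N)" marker are skipped.
    """
    restarts = []
    last = None
    for index, entry in enumerate(entries):
        number = poll_number(entry)
        if number is None:
            continue
        if last is not None and number <= last:
            restarts.append(index)
        last = number
    return restarts
```

A flagged index only says the counter reset at that point; the previous entries still need to be read to see what actually changed.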

Members of the tool are able to look at the log files to see git-styled diffs for configurations.
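As a rough illustration of what a git-styled diff of two configuration snapshots looks like, the sketch below uses Python's standard `difflib.unified_diff`. It is not taken from the tool: the function name and the `experiments.yaml` file name are hypothetical, and the snapshots are assumed to be available as text.

```python
import difflib


def config_diff(old_text, new_text, name="experiments.yaml"):
    """Return a unified diff of two configuration snapshots as one string.

    `name` is a hypothetical file name used only for the diff headers.
    An empty string is returned when the snapshots are identical.
    """
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
        )
    )


# Hypothetical snapshots: one experiment disappears between polls.
old = "- name: synth-aa-test-mw-php\n- name: other-experiment\n"
new = "- name: other-experiment\n"
print(config_diff(old, new))
```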

The code presently exists on its own branch.

https://gitlab.wikimedia.org/repos/data-engineering/mpic/-/tree/toolforge

The files are found in the toolforge directory of that branch.

https://gitlab.wikimedia.org/repos/data-engineering/mpic/-/tree/toolforge/toolforge

The bootstrap shell script has the instructions for running the job and setting it up on Toolforge. One needs to be a member of the tk tool on Toolforge and, after logging in at login.toolforge.org, needs to run become tk to land in the tool's home directory, where those instructions live. For now the Toolforge job is set to run in continuous mode.

TBD whether to have the code in the existing repo or in a standalone repo (nb. for the attentive, the build failure on this branch has to do with the branch forking from an older version of the repo, as this code is independent of the other parts of the repo; this can be rebased when we're ready).

Hi @dr0ptp4kt, I have been taking a look at all the work done here and it's really great!

I was only wondering whether this work is intended to replace what we were trying to do with the mwbot library. We started to implement a way to log a few actions via SAL + the mwbot library, but higher-priority features took precedence and that approach is not really working. In fact, we filed T395839: [SPIKE] Investigate alternatives for mwbot's use of the request library to replace mwbot with another library. But I would say that, once you have implemented all this, we could get rid of the previous way. Even if we also wanted to log when an experiment is registered or removed, apart from when it is activated/deactivated, we could do it using your approach by parsing/comparing the request with format=analytics, where all experiments are included in the response regardless of whether they are activated.

> Hi @dr0ptp4kt, I have been taking a look at all the work done here and it's really great!

Thanks! I think we should double check in a standup if we want to have it in the mpic repo, and then I could tidy the branch up a little (there are a few things I want to tweak, including a couple consistency things on the messages, the quoting of the YAML strings, and probably a CI thing).

> I was only wondering whether this work is intended to replace what we were trying to do with the mwbot library. We started to implement a way to log a few actions via SAL + the mwbot library, but higher-priority features took precedence and that approach is not really working. In fact, we filed T395839: [SPIKE] Investigate alternatives for mwbot's use of the request library to replace mwbot with another library. But I would say that, once you have implemented all this, we could get rid of the previous way. Even if we also wanted to log when an experiment is registered or removed, apart from when it is activated/deactivated, we could do it using your approach by parsing/comparing the request with format=analytics, where all experiments are included in the response regardless of whether they are activated.

Right. I forgot to say on task what I was thinking, which is I think we can delete the mwbot code and call it good. Sounds okay for you?

Yeah, if we wanted to capture any registration or removal (e.g., because we might want to reach out to folks and offer help, or even just for situational awareness), we could definitely add that as well as you say with the approach you mention; we'd probably want to skip the !log part for IRC (and thus stashbot adding it to the MediaWiki.org analytics SAL) for such registration/deletion, but maybe it could IRC mention some of us...and/or we could set our own IRC keyword notifications (that's what I'm doing in IRCCloud).

> Thanks! I think we should double check in a standup if we want to have it in the mpic repo, and then I could tidy the branch up a little (there are a few things I want to tweak, including a couple consistency things on the messages, the quoting of the YAML strings, and probably a CI thing).

I can't think of anything against having it in the mpic repo, in some folder like wmftkbot or something like that (maybe toolforge is a bit generic). I'm not familiar with these Toolforge tools, so maybe I'm missing some recommendations or best practices.

> Right. I forgot to say on task what I was thinking, which is I think we can delete the mwbot code and call it good. Sounds okay for you?

That's exactly what I was thinking!

> Yeah, if we wanted to capture any registration or removal (e.g., because we might want to reach out to folks and offer help, or even just for situational awareness), we could definitely add that as well as you say with the approach you mention; we'd probably want to skip the !log part for IRC (and thus stashbot adding it to the MediaWiki.org analytics SAL) for such registration/deletion, but maybe it could IRC mention some of us...and/or we could set our own IRC keyword notifications (that's what I'm doing in IRCCloud).

Sounds good!