
Track k8s deployments that run unreleased changes to the charts
Open, In Progress, Needs Triage, Public

Description

Problems to solve:

  1. Detecting homedir deploys
  2. Detecting unreleased config tweaks
  3. Detecting whether there are unreleased changes or the deployment is running a previous version

We have a similar one-off for admin_ng; see the script and timer, T331894.

Event Timeline

atsuko changed the task status from Open to In Progress. Edited Apr 23 2026, 8:47 AM
atsuko added a subscriber: RLazarus.

Wrapping my head around @RLazarus's charlie.py, which "recursively searches out helmfile services in a given repository or subtree, identifies all the environments for each service, and diffs or applies them by shelling out to helmfile". Helmfile discovery seems to be solved there already, and it runs periodically, so it would be useful to make it generate metrics.
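To make the "recursively searches out helmfile services" part concrete, here is a minimal sketch of that kind of discovery. It is not charlie's actual code: the function name `find_helmfile_services` and the assumption that a service is any directory containing a file named `helmfile.yaml` are mine, for illustration only.

```python
from pathlib import Path


def find_helmfile_services(root, filename="helmfile.yaml"):
    """Return the sorted directories under `root` that contain a helmfile.

    Assumption (not taken from charlie.py): a "service" is any directory
    that holds a file called `filename`.
    """
    root = Path(root)
    # rglob walks the whole subtree; a set avoids duplicate parent directories.
    return sorted({p.parent for p in root.rglob(filename) if p.is_file()})
```

Identifying the environments of each service would additionally require parsing the helmfile itself, which this sketch leaves out.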

For all practical purposes, running charlie --dry_run --services_dir admin_ng diff produces the same information as check_admin_ng_pending_changes.py when invoked for all environments. I'm considering:

  1. making it a library
  2. adding a Prometheus exporter to expose the error code and maybe the chart versions

Code and history are copied to gitlab/sre/charlie; there are now two commands:

  1. /usr/bin/charlie, which works like the previous /usr/local/bin/charlie,
  2. /usr/local/bin/charlie-prom --services_dir admin_ng, which inherits the functionality of check_admin_ng_pending_changes.py.

Change #1277471 had a related patch set uploaded (by Atsuko; author: Atsuko):

[operations/puppet@production] deployment_server: move charlie/admin_ng to debian package

https://gerrit.wikimedia.org/r/1277471

Change #1277471 merged by Atsuko:

[operations/puppet@production] deployment_server: move charlie/admin_ng to debian package

https://gerrit.wikimedia.org/r/1277471

Rolled out the replacement script; helmfile_admin_ng_pending_changes in Thanos didn't change. Going to roll it out for the dse services as well now.