While building the dumps v1 on Airflow prototype, we hardcoded the docker-registry.discovery.wmnet/restricted/mediawiki-multiversion-cli image tag used in the dumps pods. The long-term solution, however, is to have scap deploy the mediawiki-dumps-legacy helmfile release every time MediaWiki is deployed. The image tag would then be part of the Job template itself and would no longer have to be maintained in the Airflow DAGs.
Description
Details
| Title | Reference | Author | Source Branch | Dest Branch |
|---|---|---|---|---|
| test_k8s/dumps: fetch the dump pod spec from a CronJob | repos/data-engineering/airflow-dags!1409 | brouberol | T389786 | main |
| test_k8s/dumps: stop hardcoding the mediawiki image name and tag | repos/data-engineering/airflow-dags!1356 | brouberol | T389786 | main |
| Status | Assigned | Task |
|---|---|---|
| Open | None | T88728 Improve Wikimedia dumping infrastructure |
| Resolved | BTullis | T352650 WE 5.4 KR - Hypothesis 5.4.4 - Q3 FY24/25 - Migrate current-generation dumps to run on kubernetes |
| Resolved | brouberol | T388378 Orchestrate dumps v1 from an airflow instance |
| Resolved | brouberol | T389786 Integrate mediawiki-dumps-legacy with the regular MW scap deployments |
Event Timeline
Change #1130683 had a related patch set uploaded (by Btullis; author: Btullis):
[operations/puppet@production] [WIP] Configure a scap deployment of mediwiki-dumps-legacy
Ah, I think that we are blocked on T389499: Refactor scap's kubernetes DeploymentsConfig to support selection of image kinds before we can do this. Although we could add an entry to the profile::kubernetes::deployment_server::mediawiki::release::mw_releases structure in hiera, we cannot yet configure it to deploy the mediawiki-cli version of the image.
Change #1130683 merged by Brouberol:
[operations/puppet@production] Configure a scap deployment of mediwiki-dumps-legacy
We now have the following entry under /etc/helmfile-defaults/mediawiki-deployments.yaml:
    - namespace: mediawiki-dumps-legacy
      releases:
        production:
          deploy: false
      mw_flavour: publish-81
      mw_kind: cli-image
      dir: dse-k8s-services
      web_flavour: webserver
Scap does not yet deploy mediawiki-dumps-legacy automatically, but enabling that should be easy, as the release does not manage any active pods.
Change #1148203 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/puppet@production] deployment_server: deploy the mediawiki-dumps-legacy scap target
Change #1148203 merged by Scott French:
[operations/puppet@production] deployment_server: deploy the mediawiki-dumps-legacy scap target
Change #1151758 had a related patch set uploaded (by Scott French; author: Scott French):
[operations/puppet@production] Revert "deployment_server: deploy the mediawiki-dumps-legacy scap target"
Change #1151758 merged by Scott French:
[operations/puppet@production] Revert "deployment_server: deploy the mediawiki-dumps-legacy scap target"
Change #1151763 had a related patch set uploaded (by Btullis; author: Btullis):
[operations/puppet@production] mediawiki-dumps-legacy: Remove user:group overrides for k8s config
Change #1151763 merged by Btullis:
[operations/puppet@production] mediawiki-dumps-legacy: Remove user:group overrides for k8s config
Change #1151771 had a related patch set uploaded (by Scott French; author: Scott French):
[operations/puppet@production] Revert^2 "deployment_server: deploy the mediawiki-dumps-legacy scap target"
btullis merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1356
test_k8s/dumps: stop hardcoding the mediawiki image name and tag
Change #1151771 merged by Scott French:
[operations/puppet@production] Revert^2 "deployment_server: deploy the mediawiki-dumps-legacy scap target"
Mentioned in SAL (#wikimedia-operations) [2025-06-03T17:19:46Z] <swfrench@deploy1003> Started scap sync-world: Scap run to test newly enabled dse-k8s-eqiad deployment - T388761 T389786
Change #1153316 had a related patch set uploaded (by Scott French; author: Scott French):
[operations/puppet@production] Revert^3 "deployment_server: deploy the mediawiki-dumps-legacy scap target"
Change #1153316 merged by Scott French:
[operations/puppet@production] Revert^3 "deployment_server: deploy the mediawiki-dumps-legacy scap target"
Mentioned in SAL (#wikimedia-operations) [2025-06-03T17:34:07Z] <swfrench@deploy1003> Started scap sync-world: Scap test run after revert - T389786
Mentioned in SAL (#wikimedia-operations) [2025-06-03T17:36:01Z] <swfrench@deploy1003> Finished scap sync-world: Scap test run after revert - T389786 (duration: 02m 10s)
Alas, as foretold in T389499#10671841, you cannot mutate the spec.template of a k8s Job object, regardless of whether it's suspended or not:
Error: UPGRADE FAILED: release production failed, and has been rolled back due to atomic being set: cannot patch "mediawiki-production-dumps-job-template" with kind Job: Job.batch "mediawiki-production-dumps-job-template" is invalid: spec.template: Invalid value: [... %v noise ...] field is immutable
So, while the features added in T388761: scap needs to be k8s-cluster aware do indeed appear to work as expected, they can't really be applied to mediawiki-dumps-legacy as it's currently envisioned.
@BTullis - So, one alternative that comes to mind is to use a CronJob object as your "template" object instead. Although that is also something of a hack (e.g., there are some fiddly bits around having a long-suspended CronJob), at least the spec.template of its spec.jobTemplate is mutable.
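To illustrate the idea, here is a minimal sketch of how a consumer such as an Airflow DAG might derive a pod spec from a suspended CronJob used purely as a template. It is not the code from the airflow-dags merge requests: the manifest, object name, container name, and image tag are all placeholders, and the helper functions `pod_spec_from_cronjob` and `with_command` are hypothetical. It operates on a plain dict shaped like a batch/v1 CronJob so that it runs without a cluster.

```python
import copy

# Hypothetical manifest shaped like a batch/v1 CronJob. The names and the
# image tag are placeholders, not the real chart output.
CRONJOB = {
    "apiVersion": "batch/v1",
    "kind": "CronJob",
    "metadata": {"name": "example-dumps-template"},
    "spec": {
        "suspend": True,  # never scheduled; exists only to carry the template
        "schedule": "0 0 1 1 *",
        "jobTemplate": {
            "spec": {
                "template": {
                    "spec": {
                        "restartPolicy": "Never",
                        "containers": [
                            {
                                "name": "dumps",
                                "image": "example.invalid/mw-cli:placeholder-tag",
                            }
                        ],
                    }
                }
            }
        },
    },
}


def pod_spec_from_cronjob(cronjob: dict) -> dict:
    """Return a deep copy of the pod spec nested in a CronJob manifest."""
    if cronjob.get("kind") != "CronJob":
        raise ValueError(f"expected a CronJob, got {cronjob.get('kind')!r}")
    # The pod template lives under spec.jobTemplate.spec.template.
    pod_spec = cronjob["spec"]["jobTemplate"]["spec"]["template"]["spec"]
    return copy.deepcopy(pod_spec)


def with_command(pod_spec: dict, container: str, command: list) -> dict:
    """Return a copy of pod_spec with one container's command overridden."""
    new_spec = copy.deepcopy(pod_spec)
    for c in new_spec["containers"]:
        if c["name"] == container:
            c["command"] = list(command)
            return new_spec
    raise KeyError(f"no container named {container!r}")


# The image comes from the template, so the caller never hardcodes a tag.
spec = with_command(pod_spec_from_cronjob(CRONJOB), "dumps", ["echo", "hello"])
print(spec["containers"][0]["image"])
```

Against a live cluster, the manifest would presumably be read through the Kubernetes API instead, for example with the Python client's `BatchV1Api().read_namespaced_cron_job(name, namespace)`, which returns a typed object rather than a dict. Copying the template before modifying it keeps per-run overrides from leaking back into the shared template.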
Change #1156820 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/deployment-charts@master] mediawiki: convert the dumps Job into a CronJob
Change #1156830 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/deployment-charts@master] mediawiki-dumps-legacy: allow the airflow service account to query CronJob
Change #1156831 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/deployment-charts@master] mediawiki: convert the dumps Job into a CronJob
Change #1156832 had a related patch set uploaded (by Brouberol; author: Brouberol):
[operations/deployment-charts@master] mediawiki-dumps-legacy: drop the batch.Job.get rbac
Change #1156820 merged by Brouberol:
[operations/deployment-charts@master] mediawiki: define a dumps suspended CronJob
Change #1156830 merged by Brouberol:
[operations/deployment-charts@master] mediawiki-dumps-legacy: allow the airflow service account to query CronJob
brouberol merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/1409
test_k8s/dumps: fetch the dump pod spec from a CronJob
Change #1156831 merged by jenkins-bot:
[operations/deployment-charts@master] mediawiki: convert the dumps Job into a CronJob
Change #1156832 merged by jenkins-bot:
[operations/deployment-charts@master] mediawiki-dumps-legacy: drop the batch.Job.get rbac
Change #1159513 had a related patch set uploaded (by Scott French; author: Scott French):
[operations/puppet@production] Revert^4 "deployment_server: deploy the mediawiki-dumps-legacy scap target"
Change #1159513 merged by Scott French:
[operations/puppet@production] Revert^4 "deployment_server: deploy the mediawiki-dumps-legacy scap target"
Mentioned in SAL (#wikimedia-operations) [2025-06-16T17:11:12Z] <swfrench@deploy1003> Started scap sync-world: Scap run to test newly enabled dse-k8s-eqiad deployment - T389786
Mentioned in SAL (#wikimedia-operations) [2025-06-16T17:12:55Z] <swfrench@deploy1003> Finished scap sync-world: Scap run to test newly enabled dse-k8s-eqiad deployment - T389786 (duration: 02m 15s)
After the switch to a CronJob, I was able to successfully apply a lingering image diff from today's UTC-afternoon backport window using scap. Thanks for driving that @brouberol!
So, we're now in a state where mediawiki-dumps-legacy will be updated with each scap deployment :)
Thanks @Scott_French, this has been a long time coming, and it's great to see all that work bearing fruit :)