
Integrate mediawiki-dumps-legacy with the regular MW scap deployments
Closed, Resolved (Public)

Description

While building the prototype of dumps v1 on Airflow, we hardcoded the docker-registry.discovery.wmnet/restricted/mediawiki-multiversion-cli image tag to use in the dumps pods. However, the long-term solution is to have the mediawiki-dumps-legacy helmfile release deployed by scap every time MediaWiki is deployed. This way, the image tag would be part of the Job template itself and wouldn't have to be maintained in the Airflow DAGs.

Details

Related Changes in Gerrit:
Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
test_k8s/dumps: fetch the dump pod spec from a CronJob | repos/data-engineering/airflow-dags!1409 | brouberol | T389786 | main
test_k8s/dumps: stop hardcoding the mediawiki image name and tag | repos/data-engineering/airflow-dags!1356 | brouberol | T389786 | main

Event Timeline

brouberol triaged this task as Medium priority.

Change #1130683 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] [WIP] Configure a scap deployment of mediwiki-dumps-legacy

https://gerrit.wikimedia.org/r/1130683

Ah, I think that we are blocked on T389499: Refactor scap's kubernetes DeploymentsConfig to support selection of image kinds before we can do this. Although we could add an entry to the profile::kubernetes::deployment_server::mediawiki::release::mw_releases structure in hiera, we cannot yet configure it to deploy the mediawiki-cli version of the image.

brouberol changed the task status from Open to In Progress. (Apr 16 2025, 2:55 PM)
brouberol claimed this task.

Change #1130683 merged by Brouberol:

[operations/puppet@production] Configure a scap deployment of mediwiki-dumps-legacy

https://gerrit.wikimedia.org/r/1130683

We now have the following entry under /etc/helmfile-defaults/mediawiki-deployments.yaml:

- namespace: mediawiki-dumps-legacy
  releases:
    production:
      deploy: false
  mw_flavour: publish-81
  mw_kind: cli-image
  dir: dse-k8s-services
  web_flavour: webserver

Scap does not yet fully automatically deploy mediawiki-dumps-legacy, but it should be easy to do so, as the release does not manage any active pod.
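
As a rough illustration of the entry above, here is a minimal Python sketch that lists the releases whose deploy flag is switched off. It is not scap's code: the dict is a hand-copied stand-in for the YAML entry, the helper name is hypothetical, and reading deploy: false as "not deployed automatically" (with a missing key meaning "deployed") is an assumption based on the comment above.

```python
# Hand-copied stand-in for the entry shown above. A real tool would parse
# /etc/helmfile-defaults/mediawiki-deployments.yaml instead of hardcoding it.
ENTRY = {
    "namespace": "mediawiki-dumps-legacy",
    "releases": {"production": {"deploy": False}},
    "mw_flavour": "publish-81",
    "mw_kind": "cli-image",
    "dir": "dse-k8s-services",
    "web_flavour": "webserver",
}


def releases_not_auto_deployed(entry: dict) -> list:
    """Return the sorted names of releases whose `deploy` flag is false.

    Assumption (not confirmed by the source): a release without a `deploy`
    key is treated as automatically deployed.
    """
    releases = entry.get("releases", {})
    return sorted(
        name
        for name, settings in releases.items()
        if settings.get("deploy", True) is False
    )


if __name__ == "__main__":
    print(releases_not_auto_deployed(ENTRY))
```

Under that assumption, the production release of mediawiki-dumps-legacy is the one still waiting to be enabled.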

Change #1148203 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/puppet@production] deployment_server: deploy the mediawiki-dumps-legacy scap target

https://gerrit.wikimedia.org/r/1148203

Change #1148203 merged by Scott French:

[operations/puppet@production] deployment_server: deploy the mediawiki-dumps-legacy scap target

https://gerrit.wikimedia.org/r/1148203

Change #1151758 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] Revert "deployment_server: deploy the mediawiki-dumps-legacy scap target"

https://gerrit.wikimedia.org/r/1151758

Change #1151758 merged by Scott French:

[operations/puppet@production] Revert "deployment_server: deploy the mediawiki-dumps-legacy scap target"

https://gerrit.wikimedia.org/r/1151758

Change #1151763 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] mediawiki-dumps-legacy: Remove user:group overrides for k8s config

https://gerrit.wikimedia.org/r/1151763

Change #1151763 merged by Btullis:

[operations/puppet@production] mediawiki-dumps-legacy: Remove user:group overrides for k8s config

https://gerrit.wikimedia.org/r/1151763

Change #1151771 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] Revert^2 "deployment_server: deploy the mediawiki-dumps-legacy scap target"

https://gerrit.wikimedia.org/r/1151771

Change #1151771 merged by Scott French:

[operations/puppet@production] Revert^2 "deployment_server: deploy the mediawiki-dumps-legacy scap target"

https://gerrit.wikimedia.org/r/1151771

Mentioned in SAL (#wikimedia-operations) [2025-06-03T17:19:46Z] <swfrench@deploy1003> Started scap sync-world: Scap run to test newly enabled dse-k8s-eqiad deployment - T388761 T389786

Change #1153316 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] Revert^3 "deployment_server: deploy the mediawiki-dumps-legacy scap target"

https://gerrit.wikimedia.org/r/1153316

Change #1153316 merged by Scott French:

[operations/puppet@production] Revert^3 "deployment_server: deploy the mediawiki-dumps-legacy scap target"

https://gerrit.wikimedia.org/r/1153316

Mentioned in SAL (#wikimedia-operations) [2025-06-03T17:34:07Z] <swfrench@deploy1003> Started scap sync-world: Scap test run after revert - T389786

Mentioned in SAL (#wikimedia-operations) [2025-06-03T17:36:01Z] <swfrench@deploy1003> Finished scap sync-world: Scap test run after revert - T389786 (duration: 02m 10s)

Alas, as foretold in T389499#10671841, you cannot mutate the spec.template of a k8s Job object, regardless of whether it's suspended or not:

Error: UPGRADE FAILED: release production failed, and has been rolled back due to atomic being set: cannot patch "mediawiki-production-dumps-job-template" with kind Job: Job.batch "mediawiki-production-dumps-job-template" is invalid: spec.template: Invalid value: [... %v noise ...] field is immutable

So, while the features added in T388761: scap needs to be k8s-cluster aware do indeed appear to work as expected, they can't really be applied to mediawiki-dumps-legacy as it's currently envisioned.

@BTullis - So, one alternative that comes to mind is to use a CronJob object as your "template" object instead. Although that is kind of a hack as well (e.g., there are some fiddly bits around having a long-suspended CronJob), at least the spec.template of a CronJob's spec.jobTemplate is mutable.
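
To make the CronJob-as-template idea concrete, here is a minimal Python sketch of how a consumer could pull the pod spec, and with it the image, out of a suspended CronJob manifest. It works on a plain dict rather than a live cluster; the manifest contents, object name, registry and image tag are all placeholders, and the helper names are hypothetical rather than taken from the airflow-dags repository.

```python
import copy


def pod_spec_from_cronjob(cronjob: dict) -> dict:
    """Return a deep copy of the pod spec nested inside a CronJob manifest.

    In a CronJob the pod spec lives at spec.jobTemplate.spec.template.spec;
    in a bare Job it would live at spec.template.spec.
    """
    template = cronjob["spec"]["jobTemplate"]["spec"]["template"]
    # Copy so that callers can customise the spec without touching the source.
    return copy.deepcopy(template["spec"])


def container_images(pod_spec: dict) -> list:
    """Return the image reference of every container in the pod spec."""
    return [container["image"] for container in pod_spec.get("containers", [])]


# Placeholder manifest: the name, schedule and image are illustrative only.
CRONJOB = {
    "apiVersion": "batch/v1",
    "kind": "CronJob",
    "metadata": {"name": "example-dumps-template"},
    "spec": {
        "schedule": "0 0 1 1 *",
        "suspend": True,  # never fires on its own; it only holds the template
        "jobTemplate": {
            "spec": {
                "template": {
                    "spec": {
                        "restartPolicy": "Never",
                        "containers": [
                            {
                                "name": "dumps",
                                "image": "registry.example/mediawiki-cli:example-tag",
                            }
                        ],
                    }
                }
            }
        },
    },
}

if __name__ == "__main__":
    spec = pod_spec_from_cronjob(CRONJOB)
    print(container_images(spec))
```

With this shape, a deployment only has to update the image inside the CronJob's job template, and whatever launches the dump pods picks up the new tag the next time it reads the template.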

Change #1156820 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] mediawiki: convert the dumps Job into a CronJob

https://gerrit.wikimedia.org/r/1156820

Change #1156830 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] mediawiki-dumps-legacy: allow the airflow service account to query CronJob

https://gerrit.wikimedia.org/r/1156830

Change #1156831 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] mediawiki: convert the dumps Job into a CronJob

https://gerrit.wikimedia.org/r/1156831

Change #1156832 had a related patch set uploaded (by Brouberol; author: Brouberol):

[operations/deployment-charts@master] mediawiki-dumps-legacy: drop the batch.Job.get rbac

https://gerrit.wikimedia.org/r/1156832

Change #1156820 merged by Brouberol:

[operations/deployment-charts@master] mediawiki: define a dumps suspended CronJob

https://gerrit.wikimedia.org/r/1156820

Change #1156830 merged by Brouberol:

[operations/deployment-charts@master] mediawiki-dumps-legacy: allow the airflow service account to query CronJob

https://gerrit.wikimedia.org/r/1156830

Change #1156831 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki: convert the dumps Job into a CronJob

https://gerrit.wikimedia.org/r/1156831

Change #1156832 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki-dumps-legacy: drop the batch.Job.get rbac

https://gerrit.wikimedia.org/r/1156832

Change #1159513 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/puppet@production] Revert^4 "deployment_server: deploy the mediawiki-dumps-legacy scap target"

https://gerrit.wikimedia.org/r/1159513

Change #1159513 merged by Scott French:

[operations/puppet@production] Revert^4 "deployment_server: deploy the mediawiki-dumps-legacy scap target"

https://gerrit.wikimedia.org/r/1159513

Mentioned in SAL (#wikimedia-operations) [2025-06-16T17:11:12Z] <swfrench@deploy1003> Started scap sync-world: Scap run to test newly enabled dse-k8s-eqiad deployment - T389786

Mentioned in SAL (#wikimedia-operations) [2025-06-16T17:12:55Z] <swfrench@deploy1003> Finished scap sync-world: Scap run to test newly enabled dse-k8s-eqiad deployment - T389786 (duration: 02m 15s)

After the switch to a CronJob, I was able to successfully apply a lingering image diff from today's UTC-afternoon backport window using scap. Thanks for driving that @brouberol!

So, we're now in a state where mediawiki-dumps-legacy will be updated with each scap deployment :)

Thanks @Scott_French, this has been a long time coming, and it's great to see all that work bearing fruit :)