
Update DC switchover cookbooks to handle maintenance scripts on k8s
Open, Needs Triage, Public

Description

As part of the switchover, we stop and then restart maintenance scripts, directly on the maintenance hosts. With maintenance scripts moving to Kubernetes, we'll need to update the cookbooks.

Before the March 2024 switch:

  • Update 01-stop-maintenance.py to delete all Jobs running in the "mw-script" namespace in from_dc.

That's all we need for now, because there are no periodic maintenance scripts on Kubernetes yet, only ones started manually. We should make sure none of those are still running when the read-only phase starts, but we don't need to make any changes to periodic jobs (which are all still on mwmaint), or restart anything in to_dc.
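
For illustration, the delete step could look roughly like the sketch below. It assumes `batch_api` is an object shaped like the Kubernetes Python client's `BatchV1Api`, already pointed at the from_dc cluster; how the cookbook actually obtains that object from spicerack is not shown. The function name `stop_maintenance_jobs` and the choice of foreground propagation are assumptions, not taken from the merged patch.

```python
def stop_maintenance_jobs(batch_api, namespace="mw-script"):
    """Delete every Job in `namespace`; return the names that were present.

    `batch_api` is assumed to expose list_namespaced_job() and
    delete_collection_namespaced_job(), as kubernetes.client.BatchV1Api does.
    """
    jobs = batch_api.list_namespaced_job(namespace)
    names = [job.metadata.name for job in jobs.items]
    if names:
        # Assumption: foreground propagation, so the Jobs' Pods are removed
        # along with the Jobs instead of being left behind as orphans.
        batch_api.delete_collection_namespaced_job(
            namespace, propagation_policy="Foreground"
        )
    return names
```

Returning the names gives the cookbook something to log, so whoever runs the switchover can see which manually started scripts were killed.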

Before the September 2024 switch:

  • Update 01-stop-maintenance.py to wait for Jobs to terminate after the delete API call.
  • Update 01-stop-maintenance.py to disable cronjobs in from_dc. (Pre-k8s, we just kill the actively-running processes and then hustle to start the next step before the timers restart anything. That works fine, but as long as we're redesigning this, we can freeze the crons too.)
  • Update 08-start-maintenance.py to enable cronjobs in to_dc.
  • After we add a way for maintenance scripts to be marked as idempotent, update 08-start-maintenance.py to restart in to_dc any idempotent script that was killed in from_dc.
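
The first three items above could be sketched as follows. This is a rough outline under stated assumptions, not the planned implementation: `batch_api` is again assumed to look like the Kubernetes Python client's `BatchV1Api`, "disable" is assumed to mean setting `spec.suspend` on each CronJob, and the helper names, timeout, and polling interval are all hypothetical. The namespace that will hold the periodic jobs is not stated in this task, so it is a required argument.

```python
import time


def wait_for_jobs_gone(batch_api, namespace, timeout=300.0, interval=5.0,
                       sleep=time.sleep):
    """Poll until no Jobs remain in `namespace`, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        remaining = [job.metadata.name
                     for job in batch_api.list_namespaced_job(namespace).items]
        if not remaining:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Jobs still present in {namespace}: {remaining}")
        sleep(interval)


def set_cronjobs_suspended(batch_api, suspended, namespace):
    """Set spec.suspend on every CronJob in `namespace`.

    Returns the names of the CronJobs whose state was actually changed.
    """
    changed = []
    for cronjob in batch_api.list_namespaced_cron_job(namespace).items:
        # spec.suspend may be None when unset; treat that as "not suspended".
        if bool(cronjob.spec.suspend) != suspended:
            batch_api.patch_namespaced_cron_job(
                cronjob.metadata.name, namespace,
                {"spec": {"suspend": suspended}},
            )
            changed.append(cronjob.metadata.name)
    return changed
```

In this sketch, 01-stop-maintenance.py would call `set_cronjobs_suspended(api, True, ns)` against from_dc before deleting Jobs and then `wait_for_jobs_gone(api, ns)`, while 08-start-maintenance.py would call `set_cronjobs_suspended(api, False, ns)` against to_dc. Suspending first means nothing new gets scheduled between the delete and the read-only phase.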

Event Timeline

Change 1008582 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/software/spicerack@master] k8s: Add getter for the Batch API

https://gerrit.wikimedia.org/r/1008582

Change 1008583 had a related patch set uploaded (by RLazarus; author: RLazarus):

[operations/cookbooks@master] sre.switchdc.mediawiki: Stop maintenance scripts on Kubernetes

https://gerrit.wikimedia.org/r/1008583

Change 1008582 merged by jenkins-bot:

[operations/software/spicerack@master] k8s: Add getter for the Batch API

https://gerrit.wikimedia.org/r/1008582

Change 1008583 merged by jenkins-bot:

[operations/cookbooks@master] sre.switchdc.mediawiki: Stop maintenance scripts on Kubernetes

https://gerrit.wikimedia.org/r/1008583

This is good to go for the March 2024 switchover, so removing it as a subtask.