Special consideration needed for toolforge-jobs when performing kubernetes cluster upgrades?
Closed, Invalid (Public)

Description

Previously, most workloads on the Kubernetes cluster were web services and other continuous jobs, for which a restart or a move to another node did not matter. That assumption no longer holds once the jobs framework introduces cron jobs. This task is to:

  • check that running jobs do not misbehave when they are restarted
    • TODO: should jobs be rescheduled if they do not complete? This is especially relevant for one-off jobs
  • consider adding some delay to let running jobs complete when nodes are being drained
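
For illustration only, the two knobs discussed above (whether an unfinished job is retried, and how long a running job gets to finish when its node is drained) could map onto standard Kubernetes Job fields roughly as follows. This is a minimal sketch, not how toolforge-jobs actually builds its manifests: the helper name `build_job_manifest` and its parameters are hypothetical, and how drain evictions count against `backoffLimit` depends on the cluster version and configuration, which is not checked here.

```python
import json


def build_job_manifest(name, image, command, max_retries, grace_seconds):
    """Return a batch/v1 Job manifest as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # backoffLimit caps how many times the Job controller retries
            # failed pods; 0 means "do not retry".
            "backoffLimit": max_retries,
            "template": {
                "spec": {
                    # "Never" means a failed container is not restarted in
                    # place; any retry happens in a new pod.
                    "restartPolicy": "Never",
                    # Time between SIGTERM and SIGKILL when the pod is
                    # terminated, for example during a node drain.
                    "terminationGracePeriodSeconds": grace_seconds,
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Assumed example values: a one-off job that is not retried and gets
    # five minutes to finish before being killed.
    manifest = build_job_manifest(
        name="one-off-example",
        image="example-image:latest",
        command=["./run.sh"],
        max_retries=0,
        grace_seconds=300,
    )
    print(json.dumps(manifest, indent=2))
```

A one-off job that must not run twice would use `max_retries=0`, while an idempotent job could tolerate a higher value. The grace period only helps jobs that finish within it or that handle SIGTERM cleanly.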

Event Timeline

aborrero triaged this task as Medium priority. (Oct 11 2021, 11:45 AM)
bd808 renamed this task from toolforge-jobs and kubernetes cluster upgrades to Special consideration needed for toolforge-jobs when performing kubernetes cluster upgrades?. (May 31 2022, 8:12 PM)
bd808 edited projects, added Toolforge Jobs framework; removed Toolforge.
dcaro subscribed.

We have already been running jobs on k8s for a while without issues, so I think this no longer applies.

Please reopen if it's not the case.