
[jobs-api,jobs-cli] restarting a continuous job causes two jobs to run side by side for a few seconds
Open, In Progress, Low, Public, BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • restart a running continuous job with the following command
  • toolforge jobs restart <jobname>

What happens?:
For a short period of time (~15-30 seconds) the "old" job still runs in parallel with the newly restarted job.

What should have happened instead?:
Only one job with a given name should be running at a time.

I used this small script:

#!/bin/bash
outfile=/data/project/persondata/logs/testout.out
echo "Process started" >>$outfile
while true
do
  date >>$outfile
  uname -a >>$outfile
  sleep 2
done

Then on the command line I started a job like this:
toolforge jobs run --command '~/playground/continuous.sh' --image php8.2 -o --continuous test
half a minute later I restarted the job:
toolforge jobs restart test
and another 30 seconds later I deleted it.

The following could be seen in the file /data/project/persondata/logs/testout.out.
Please note: after the second "Process started" line in the middle of the output, you can see two hostnames interleaved: test-bf66c58b8-7f8br and test-bf66c58b8-xsjl9.

Process started
Mon Sep 23 08:26:34 AM UTC 2024
Linux test-bf66c58b8-xsjl9 6.1.0-25-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
Mon Sep 23 08:26:36 AM UTC 2024
…
Mon Sep 23 08:26:59 AM UTC 2024
Linux test-bf66c58b8-xsjl9 6.1.0-25-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
Mon Sep 23 08:27:01 AM UTC 2024
Linux test-bf66c58b8-xsjl9 6.1.0-25-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
Process started
Mon Sep 23 08:27:01 AM UTC 2024
Linux test-bf66c58b8-7f8br 6.1.0-25-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
Mon Sep 23 08:27:03 AM UTC 2024
Linux test-bf66c58b8-xsjl9 6.1.0-25-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
Mon Sep 23 08:27:03 AM UTC 2024
Linux test-bf66c58b8-7f8br 6.1.0-25-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
Mon Sep 23 08:27:05 AM UTC 2024
Mon Sep 23 08:27:05 AM UTC 2024
Linux test-bf66c58b8-7f8br 6.1.0-25-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
Linux test-bf66c58b8-xsjl9 6.1.0-25-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
Mon Sep 23 08:27:07 AM UTC 2024
Mon Sep 23 08:27:07 AM UTC 2024
Linux test-bf66c58b8-7f8br 6.1.0-25-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
Linux test-bf66c58b8-xsjl9 6.1.0-25-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
Mon Sep 23 08:27:09 AM UTC 2024
Mon Sep 23 08:27:09 AM UTC 2024
Linux test-bf66c58b8-7f8br 6.1.0-25-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
Linux test-bf66c58b8-xsjl9 6.1.0-25-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
Mon Sep 23 08:27:11 AM UTC 2024
Mon Sep 23 08:27:11 AM UTC 2024
Linux test-bf66c58b8-7f8br 6.1.0-25-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
Linux test-bf66c58b8-xsjl9 6.1.0-25-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
Mon Sep 23 08:27:13 AM UTC 2024
Mon Sep 23 08:27:13 AM UTC 2024
Linux test-bf66c58b8-xsjl9 6.1.0-25-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
Linux test-bf66c58b8-7f8br 6.1.0-25-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
Mon Sep 23 08:27:15 AM UTC 2024
Linux test-bf66c58b8-7f8br 6.1.0-25-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
Mon Sep 23 08:27:17 AM UTC 2024
Linux test-bf66c58b8-7f8br 6.1.0-25-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
Mon Sep 23 08:27:19 AM UTC 2024
Linux test-bf66c58b8-7f8br 6.1.0-25-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux
…

Event Timeline

Kubernetes creates the replacement pod as soon as the old pod enters the Terminating state, without waiting for it to actually disappear.
This is the native (and default) behavior of Kubernetes deployments, see https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy
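
For illustration, here is a minimal sketch (plain Python dicts, mirroring how the patches below build the Deployment object) of the two strategy stanzas involved; the maxSurge/maxUnavailable values shown are the Kubernetes defaults:

# Sketch only: the spec.strategy stanza of a Deployment, written as plain dicts.
# With the default RollingUpdate strategy Kubernetes may start up to maxSurge
# replacement pods while the old ones are still terminating, which is exactly
# the overlap observed in the log above.
rolling_update_strategy = {
    "type": "RollingUpdate",
    "rollingUpdate": {"maxSurge": "25%", "maxUnavailable": "25%"},  # k8s defaults
}

# Recreate terminates all existing pods before creating new ones, so there is
# never more than one generation of pods running at the same time.
recreate_strategy = {
    "type": "Recreate",
}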

Introducing a different behavior could be as simple as this patch to the toolforge jobs-api:

diff --git a/tjf/runtimes/k8s/jobs.py b/tjf/runtimes/k8s/jobs.py
index fd3e85c..272924d 100644
--- a/tjf/runtimes/k8s/jobs.py
+++ b/tjf/runtimes/k8s/jobs.py
@@ -290,6 +290,12 @@ def _get_k8s_deployment_object(job: Job) -> K8S_OBJECT_TYPE:
             "labels": labels,
         },
         "spec": {
+            # strategy: recreate prevents extra pods from running in parallel when restarting the job
+            # see https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#recreate-deployment
+            "strategy" : {
+                "type": "Recreate",
+            },
             "template": _get_k8s_podtemplate(job=job, restart_policy="Always", probes=probes),
             "replicas": replicas,
             "selector": {

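If that patch were applied, one quick way to confirm the change would be to read the strategy back from the cluster. A minimal sketch with the official kubernetes Python client, assuming the test job from the description and Toolforge's usual tool-<toolname> namespace convention (tool-persondata here is an assumption):

from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running inside the cluster
apps = client.AppsV1Api()

dep = apps.read_namespaced_deployment(name="test", namespace="tool-persondata")
print(dep.spec.strategy.type)  # expected: "Recreate" once the patch is in place
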
However, there are certain use cases in which it is desirable that a pod for a given deployment is always running, especially for jobs sitting behind some kind of service load balancer; tearing down all pods before starting their replacements could introduce downtime for those services.
So, a patch like the one above may be a regression from the point of view of some users.

Other approaches:

  • we could try to infer the right value for spec.strategy at job creation time: if a continuous job has more than one replica, it should be fine for it to keep spec.strategy: RollingUpdate; if the job has only one replica, always use spec.strategy: Recreate
  • we could "leak" this setting into the toolforge-jobs command line, something like toolforge jobs restart <name> --recreate, which would modify the Deployment's spec.strategy on the fly, setting it to Recreate before killing the running pod of the job and reverting the change afterwards (see the sketch after this list)

I think I like the first option more. I would like to avoid adding extra options to the command line; otherwise we will end up exposing the whole k8s API via config switches, and our Toolforge abstraction would lose its purpose.

aborrero changed the task status from Open to In Progress. Sep 23 2024, 9:18 AM
aborrero triaged this task as Low priority.
aborrero added a project: User-aborrero.
aborrero moved this task from Backlog to Radar/observer on the User-aborrero board.

I think this is the patch I'm proposing:

diff --git a/tjf/runtimes/k8s/jobs.py b/tjf/runtimes/k8s/jobs.py
index fd3e85c..22eff5d 100644
--- a/tjf/runtimes/k8s/jobs.py
+++ b/tjf/runtimes/k8s/jobs.py
@@ -281,6 +281,10 @@ def _get_k8s_deployment_object(job: Job) -> K8S_OBJECT_TYPE:
     probes = get_healthcheck_for_k8s(job.health_check) if job.health_check else {}
     replicas = job.replicas if job.replicas else 1
 
+    # see https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#recreate-deployment
+    # see T375366
+    strategy = "Recreate" if replicas == 1 else "RollingUpdate"
+
     obj = {
         "apiVersion": K8sJobKind.DEPLOYMENT.api_version,
         "kind": K8sJobKind.DEPLOYMENT.value,
@@ -290,6 +294,9 @@ def _get_k8s_deployment_object(job: Job) -> K8S_OBJECT_TYPE:
             "labels": labels,
         },
         "spec": {
+            "strategy" : {
+                "type": strategy,
+            },
             "template": _get_k8s_podtemplate(job=job, restart_policy="Always", probes=probes),
             "replicas": replicas,
             "selector": {
dcaro renamed this task from restarting a continuous job causes two jobs to run side by side for a few seconds to [jobs-api,jobs-cli] restarting a continuous job causes two jobs to run side by side for a few seconds. Nov 7 2024, 5:35 PM