toolforge-jobs should properly process 'out of quota' errors
Closed, Resolved · Public · Bug Report

Description

Error message:

[toolforge-jobs] ERROR: unable to create job: "HTTP 403: likely an internal bug: 403 Client Error: Forbidden for url: https://k8s.tools.eqiad1.wikimedia.cloud:6443/apis/apps/v1/namespaces/tool-giftbot/deployments. k8s JSON: {\"kind\": \"Deployment\", \"apiVersion\": \"apps/v1\", \"metadata\": {\"name\": \"vm\", \"namespace\": \"tool-giftbot\", \"labels\": {\"toolforge\": \"tool\", \"app.kubernetes.io/version\": \"1\", \"app.kubernetes.io/managed-by\": \"toolforge-jobs-framework\", \"app.kubernetes.io/created-by\": \"giftbot\", \"app.kubernetes.io/component\": \"deployments\", \"app.kubernetes.io/name\": \"vm\", \"jobs.toolforge.org/filelog\": \"yes\", \"jobs.toolforge.org/emails\": \"none\"}}, \"spec\": {\"template\": {\"metadata\": {\"labels\": {\"toolforge\": \"tool\", \"app.kubernetes.io/version\": \"1\", \"app.kubernetes.io/managed-by\": \"toolforge-jobs-framework\", \"app.kubernetes.io/created-by\": \"giftbot\", \"app.kubernetes.io/component\": \"deployments\", \"app.kubernetes.io/name\": \"vm\", \"jobs.toolforge.org/filelog\": \"yes\", \"jobs.toolforge.org/emails\": \"none\"}}, \"spec\": {\"restartPolicy\": \"Always\", \"containers\": [{\"name\": \"vm\", \"image\": \"docker-registry.tools.wmflabs.org/toolforge-tcl86-sssd-base:latest\", \"workingDir\": \"/data/project/giftbot\", \"command\": [\"/bin/sh\", \"-c\", \"--\", \"./merge-stderr ./vm.tcl 1>>vm.out 2>>vm.err\"], \"resources\": {}, \"env\": [{\"name\": \"HOME\", \"value\": \"/data/project/giftbot\"}], \"volumeMounts\": [{\"mountPath\": \"/data/project\", \"name\": \"home\"}]}], \"volumes\": [{\"name\": \"home\", \"hostPath\": {\"path\": \"/data/project\", \"type\": \"Directory\"}}]}}, \"replicas\": 1, \"selector\": {\"matchLabels\": {\"toolforge\": \"tool\", \"app.kubernetes.io/version\": \"1\", \"app.kubernetes.io/managed-by\": \"toolforge-jobs-framework\", \"app.kubernetes.io/created-by\": \"giftbot\", \"app.kubernetes.io/component\": \"deployments\", \"app.kubernetes.io/name\": \"vm\", 
\"jobs.toolforge.org/filelog\": \"yes\", \"jobs.toolforge.org/emails\": \"none\"}}}}"
k8s JSON
{
  "kind": "Deployment",
  "apiVersion": "apps/v1",
  "metadata": {
    "name": "vm",
    "namespace": "tool-giftbot",
    "labels": {
      "toolforge": "tool",
      "app.kubernetes.io/version": "1",
      "app.kubernetes.io/managed-by": "toolforge-jobs-framework",
      "app.kubernetes.io/created-by": "giftbot",
      "app.kubernetes.io/component": "deployments",
      "app.kubernetes.io/name": "vm",
      "jobs.toolforge.org/filelog": "yes",
      "jobs.toolforge.org/emails": "none"
    }
  },
  "spec": {
    "template": {
      "metadata": {
        "labels": {
          "toolforge": "tool",
          "app.kubernetes.io/version": "1",
          "app.kubernetes.io/managed-by": "toolforge-jobs-framework",
          "app.kubernetes.io/created-by": "giftbot",
          "app.kubernetes.io/component": "deployments",
          "app.kubernetes.io/name": "vm",
          "jobs.toolforge.org/filelog": "yes",
          "jobs.toolforge.org/emails": "none"
        }
      },
      "spec": {
        "restartPolicy": "Always",
        "containers": [
          {
            "name": "vm",
            "image": "docker-registry.tools.wmflabs.org/toolforge-tcl86-sssd-base:latest",
            "workingDir": "/data/project/giftbot",
            "command": [
              "/bin/sh",
              "-c",
              "--",
              "./merge-stderr ./vm.tcl 1>>vm.out 2>>vm.err"
            ],
            "resources": {},
            "env": [
              {
                "name": "HOME",
                "value": "/data/project/giftbot"
              }
            ],
            "volumeMounts": [
              {
                "mountPath": "/data/project",
                "name": "home"
              }
            ]
          }
        ],
        "volumes": [
          {
            "name": "home",
            "hostPath": {
              "path": "/data/project",
              "type": "Directory"
            }
          }
        ]
      }
    },
    "replicas": 1,
    "selector": {
      "matchLabels": {
        "toolforge": "tool",
        "app.kubernetes.io/version": "1",
        "app.kubernetes.io/managed-by": "toolforge-jobs-framework",
        "app.kubernetes.io/created-by": "giftbot",
        "app.kubernetes.io/component": "deployments",
        "app.kubernetes.io/name": "vm",
        "jobs.toolforge.org/filelog": "yes",
        "jobs.toolforge.org/emails": "none"
      }
    }
  }
}

Event Timeline

Could this be because I already have 3 deployments (1 k8s webservice and 2 continuous jobs) and adding another continuous job would exceed my quota?

tools.giftbot@tools-sgebastion-11:~$ kubectl get deployments
NAME      READY   UP-TO-DATE   AVAILABLE   AGE
giftbot   1/1     1            1           714d
gva       1/1     1            1           67m
gvm       1/1     1            1           66m

I confirm you are out of quota for more deployments.
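For illustration, the remaining headroom can be read from a Kubernetes ResourceQuota object, which reports its limits under `status.hard` and current consumption under `status.used`. The sketch below assumes the quota is expressed as an object count under the key `count/deployments.apps`; the key Toolforge actually uses, the helper name `remaining_quota`, and the sample numbers (chosen to match the three deployments listed above) are assumptions, not taken from this task.

```python
def remaining_quota(quota: dict, resource: str = "count/deployments.apps") -> int:
    """Return how many more objects of `resource` the namespace may create.

    `quota` is a ResourceQuota object as a dict, e.g. parsed from
    `kubectl get resourcequota <name> -o json`. The default resource key
    is an assumption about how the deployment limit is configured.
    """
    status = quota.get("status", {})
    # Count quotas are reported as plain integer strings such as "3".
    hard = int(status.get("hard", {})[resource])
    used = int(status.get("used", {}).get(resource, 0))
    return hard - used


# Illustrative data only: three deployments used out of a limit of three.
sample = {
    "kind": "ResourceQuota",
    "status": {
        "hard": {"count/deployments.apps": "3"},
        "used": {"count/deployments.apps": "3"},
    },
}
print(remaining_quota(sample))
```

A result of zero or less means a further deployment would be rejected by the API server.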

Looking at your jobs, it is clear that you may need a quota bump. Also, we need better error reporting for this particular failure.
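A minimal sketch of what better reporting could look like, assuming the Kubernetes API server answers a quota violation with HTTP 403 and a `Status` JSON body whose `message` field contains the text `exceeded quota:`. This is not the code from the merged patch; the function name `explain_k8s_error` and the sample message are hypothetical.

```python
import json


def explain_k8s_error(status_code: int, body: str) -> str:
    """Turn a Kubernetes API error response into a user-facing message."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = None
    message = payload.get("message", "") if isinstance(payload, dict) else ""
    if not isinstance(message, str):
        message = ""
    # Assumption: quota violations are reported as 403 with this marker text.
    if status_code == 403 and "exceeded quota:" in message:
        detail = message.split("exceeded quota:", 1)[1].strip()
        return "out of quota: " + detail
    # Anything else keeps the generic wording seen in the original error.
    return f"HTTP {status_code}: likely an internal bug: {message or body}"


# Hypothetical response body, shaped like a Kubernetes Status object.
example_body = json.dumps({
    "kind": "Status",
    "code": 403,
    "message": 'deployments.apps "vm" is forbidden: exceeded quota: '
               "tool-giftbot, requested: count/deployments.apps=1, "
               "used: count/deployments.apps=3, "
               "limited: count/deployments.apps=3",
})
print(explain_k8s_error(403, example_body))
```

Matching on the message text is fragile if the wording changes between Kubernetes versions, so the generic fallback is kept for every response that does not match.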

taavi renamed this task from "Unable to start job" to "toolforge-jobs should properly process 'out of quota' errors". Mar 30 2022, 7:51 AM
bd808 changed the subtype of this task from "Task" to "Bug Report". Apr 6 2022, 10:00 PM

Change 852142 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[cloud/toolforge/jobs-framework-api@main] run: report out of quota errors

https://gerrit.wikimedia.org/r/852142

Change 852142 merged by Arturo Borrero Gonzalez:

[cloud/toolforge/jobs-framework-api@main] run: report out of quota errors

https://gerrit.wikimedia.org/r/852142

Mentioned in SAL (#wikimedia-cloud-feed) [2022-11-04T12:12:53Z] <wm-bot2> build & push docker image docker-registry.tools.wmflabs.org/toolforge-jobs-framework-api:2b800f5 from https://gerrit.wikimedia.org/r/cloud/toolforge/jobs-framework-api (2b800f5) (T304900) - cookbook ran by arturo@nostromo

aborrero claimed this task.