The following container images did not match any of the allowed registries ([['docker-registry.tools.wmflabs.org']])
Closed, Resolved (Public)

Description

I'm currently seeing some intermittent tool failures.

On running kubectl apply -f worker-deployment.yml to recreate a deployment, I get:

tools.refill-api@tools-sgebastion-10:~$ kubectl apply -f worker-deployment.yml
Error from server: error when applying patch:
{"spec":{"template":{"spec":{"$setElementOrder/containers":[{"name":"celery"}],"containers":[{"name":"celery","resources":{"limits":{"cpu":"1.0"}}}]}}}}
to:
Resource: "apps/v1, Resource=deployments", GroupVersionKind: "apps/v1, Kind=Deployment"
Name: "refill-api-worker", Namespace: "tool-refill-api"
for: "worker-deployment.yml": admission webhook "registry-admission.tools.wmflabs.org" denied the request: The following container images did not match any of the allowed registries ([['docker-registry.tools.wmflabs.org']]): [Kind=apps/v1, Kind=Deployment, Namespace=tool-refill-api Name=refill-api-worker Image=docker-registry.tools.wmflabs.org/toolforge-python37-sssd-base:latest]
Error from server: error when applying patch:
{"spec":{"template":{"spec":{"$setElementOrder/containers":[{"name":"celery"}],"containers":[{"name":"celery","resources":{"limits":{"cpu":"0.5"}}}]}}}}
to:
Resource: "apps/v1, Resource=deployments", GroupVersionKind: "apps/v1, Kind=Deployment"
Name: "refill-api-scheduler", Namespace: "tool-refill-api"
for: "worker-deployment.yml": admission webhook "registry-admission.tools.wmflabs.org" denied the request: The following container images did not match any of the allowed registries ([['docker-registry.tools.wmflabs.org']]): [Kind=apps/v1, Kind=Deployment, Namespace=tool-refill-api Name=refill-api-scheduler Image=docker-registry.tools.wmflabs.org/toolforge-python37-sssd-base:latest]
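The denial message above can be checked mechanically. The following is a minimal sketch, not the webhook's actual logic: the helper names (`denied_images`, `allowed_registries`, `has_allowed_prefix`) are hypothetical, and the simple prefix comparison is an assumption about what "matching a registry" means. It only shows that the rejected image does begin with the registry listed as allowed, so the image name alone does not explain the rejection.

```python
import re
from typing import List

# Denial text copied from the kubectl output above.
MESSAGE = (
    "The following container images did not match any of the allowed registries "
    "([['docker-registry.tools.wmflabs.org']]): [Kind=apps/v1, Kind=Deployment, "
    "Namespace=tool-refill-api Name=refill-api-worker "
    "Image=docker-registry.tools.wmflabs.org/toolforge-python37-sssd-base:latest]"
)


def denied_images(message: str) -> List[str]:
    """Return every Image=... value found in a denial message."""
    return re.findall(r"Image=([^\s\]]+)", message)


def allowed_registries(message: str) -> List[str]:
    """Return the registry names quoted in the 'allowed registries (...)' part."""
    match = re.search(r"allowed registries \((.*?)\)", message)
    return re.findall(r"'([^']+)'", match.group(1)) if match else []


def has_allowed_prefix(image: str, registries: List[str]) -> bool:
    """Assumed check: the image reference starts with '<registry>/'."""
    return any(image.startswith(registry + "/") for registry in registries)


for image in denied_images(MESSAGE):
    print(image, has_allowed_prefix(image, allowed_registries(MESSAGE)))
```

Run against the message from this task, the check reports that the image carries the allowed prefix, which is consistent with the failure lying somewhere other than in the deployment manifest.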

Details

Other Assignee
aborrero

Event Timeline

This is also affecting stewardbots: I'm unable to restart the bots after they disconnected (Remote host closed the connection).

TheresNoTime triaged this task as Unbreak Now! priority. Oct 10 2022, 8:06 PM

UBN!, something is fairly broken I feel

signatures, which uses a standard webservice configuration, is having the same issue and is 503ing now. The running pod was deleted, and new pods can't be started. Stopping and starting the webservice had no effect.

tools.signatures@tools-sgebastion-10:~$ kubectl get pods
NAME                                      READY   STATUS      RESTARTS   AGE
signatures.sigprobs-cron-27730335-l2x9x   0/1     Completed   0          18d
signatures.sigprobs-cron-27750495-2jczm   0/1     Completed   0          4d15h
tools.signatures@tools-sgebastion-10:~$ kubectl get all
NAME                                          READY   STATUS      RESTARTS   AGE
pod/signatures.sigprobs-cron-27730335-l2x9x   0/1     Completed   0          18d
pod/signatures.sigprobs-cron-27750495-2jczm   0/1     Completed   0          4d15h

NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/signatures   ClusterIP   10.111.185.213   <none>        8000/TCP   343d

NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/signatures   0/1     0            0           343d

NAME                                    DESIRED   CURRENT   READY   AGE
replicaset.apps/signatures-5fcddcc4ff   1         0         0       343d

NAME                                     SCHEDULE     SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cronjob.batch/signatures.sigprobs-cron   15 4 * * 4   False     0        4d15h           2y36d

NAME                                          COMPLETIONS   DURATION   AGE
job.batch/signatures.sigprobs-cron-27730335   1/1           22m        18d
job.batch/signatures.sigprobs-cron-27740415   1/1           22m        11d
job.batch/signatures.sigprobs-cron-27750495   1/1           22m        4d15h
tools.signatures@tools-sgebastion-10:~$ kubectl describe deployment.apps/signatures
Name:                   signatures
Namespace:              tool-signatures
CreationTimestamp:      Sun, 31 Oct 2021 22:53:51 +0000
Labels:                 app.kubernetes.io/component=web
                        app.kubernetes.io/managed-by=webservice
                        name=signatures
                        toolforge=tool
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app.kubernetes.io/component=web,app.kubernetes.io/managed-by=webservice,name=signatures,toolforge=tool
Replicas:               1 desired | 0 updated | 0 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app.kubernetes.io/component=web
           app.kubernetes.io/managed-by=webservice
           name=signatures
           toolforge=tool
  Containers:
   webservice:
    Image:      docker-registry.tools.wmflabs.org/toolforge-python39-sssd-web:latest
    Port:       8000/TCP
    Host Port:  0/TCP
    Command:
      /usr/bin/webservice-runner
      --type
      uwsgi-python
      --port
      8000
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type             Status  Reason
  ----             ------  ------
  Progressing      True    NewReplicaSetAvailable
  Available        False   MinimumReplicasUnavailable
  ReplicaFailure   True    FailedCreate
OldReplicaSets:    <none>
NewReplicaSet:     signatures-5fcddcc4ff (0/1 replicas created)
Events:            <none>
tools.signatures@tools-sgebastion-10:~$ kubectl describe replicaset.apps/signatures-5fcddcc4ff
Name:           signatures-5fcddcc4ff
Namespace:      tool-signatures
Selector:       app.kubernetes.io/component=web,app.kubernetes.io/managed-by=webservice,name=signatures,pod-template-hash=5fcddcc4ff,toolforge=tool
Labels:         app.kubernetes.io/component=web
                app.kubernetes.io/managed-by=webservice
                name=signatures
                pod-template-hash=5fcddcc4ff
                toolforge=tool
Annotations:    deployment.kubernetes.io/desired-replicas: 1
                deployment.kubernetes.io/max-replicas: 2
                deployment.kubernetes.io/revision: 1
Controlled By:  Deployment/signatures
Replicas:       0 current / 1 desired
Pods Status:    0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app.kubernetes.io/component=web
           app.kubernetes.io/managed-by=webservice
           name=signatures
           pod-template-hash=5fcddcc4ff
           toolforge=tool
  Containers:
   webservice:
    Image:      docker-registry.tools.wmflabs.org/toolforge-python39-sssd-web:latest
    Port:       8000/TCP
    Host Port:  0/TCP
    Command:
      /usr/bin/webservice-runner
      --type
      uwsgi-python
      --port
      8000
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type             Status  Reason
  ----             ------  ------
  ReplicaFailure   True    FailedCreate
Events:
  Type     Reason        Age   From                   Message
  ----     ------        ----  ----                   -------
  Warning  FailedCreate  80s   replicaset-controller  Error creating: admission webhook "registry-admission.tools.wmflabs.org" denied the request: The following container images did not match any of the allowed registries ([['docker-registry.tools.wmflabs.org']]): [Kind=/v1, Kind=Pod, Namespace=tool-signatures Name=signatures-5fcddcc4ff-gw9nx Image=docker-registry.tools.wmflabs.org/toolforge-python39-sssd-web:latest]
tools.signatures@tools-sgebastion-10:~$ webservice stop && webservice start
Stopping webservice
Traceback (most recent call last):
  File "/usr/local/bin/webservice", line 460, in <module>
    start(job, "Starting webservice")
  File "/usr/local/bin/webservice", line 83, in start
    job.request_start()
  File "/usr/lib/python3/dist-packages/toolsws/backends/kubernetes.py", line 574, in request_start
    self.api.create_object("deployments", self._get_deployment())
  File "/usr/lib/python3/dist-packages/toolsws/backends/kubernetes.py", line 791, in create_object
    version=K8sClient.VERSIONS[kind],
  File "/usr/lib/python3/dist-packages/toolsws/backends/kubernetes.py", line 752, in _post
    r.raise_for_status()
  File "/usr/lib/python3/dist-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://k8s.tools.eqiad1.wikimedia.cloud:6443/apis/apps/v1/namespaces/tool-signatures/deployments
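The traceback above reports only "400 Client Error: Bad Request", because `raise_for_status()` does not print the response body. The Kubernetes API server normally describes a rejected request in a `Status` object whose `message` field carries the reason. Below is a minimal sketch of pulling that message out of a response body; the function name `status_message` and the sample body are hypothetical, and it is an assumption that the webhook denial text was present in the body of this particular response.

```python
import json


def status_message(body: str) -> str:
    """Return the 'message' of a Kubernetes Status body, or the raw body."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON at all: show the raw text rather than hide it.
        return body
    if isinstance(payload, dict) and payload.get("kind") == "Status":
        return payload.get("message") or body
    return body


# Hypothetical response body, shaped like a Kubernetes Status object.
sample = json.dumps({
    "kind": "Status",
    "apiVersion": "v1",
    "status": "Failure",
    "message": "admission webhook denied the request",
    "code": 400,
})
print(status_message(sample))
```

When a caller catches `requests.exceptions.HTTPError`, the body is available as `exc.response.text`, so passing that to a helper like this would surface the server's explanation alongside the status code.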
taavi lowered the priority of this task from Unbreak Now! to High. Oct 10 2022, 8:33 PM

I /think/ this is under control for now - will follow up here when it's not so late with the root cause and follow-up items to prevent this in the future. Sorry all!

Things are back up but still very unstable.

Well, after I said that, things proceeded to remain stable for a day and are still stable as of right now.

This is already solved, and there are a few follow-ups to prevent it from happening again, so I will close this :)