k8s staging seems to be out of IP addresses
Closed, Resolved (Public)

Description

Today, I tried to deploy eventgate-analytics to k8s staging for T383814. The deploy failed, and I saw this error in the output of kubectl get events:

FailedCreatePodSandBox   pod/eventgate-production-6f5b4bbb64-rhrf4    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "9ac30148968c86a8bdef944249861f2951a497cec29651e15ed3ad2e1f2b9b33": plugin type="calico" failed (add): failed to request IPv4 addresses: Assigned 0 out of 1 requested IPv4 addresses; No more free affine blocks and strict affinity enabled

Is staging out of IP addresses?

Event Timeline

We've run into the same issue while deploying (edit: Wikifunctions) today:

$ kubectl get events
LAST SEEN   TYPE      REASON                   OBJECT                                                          MESSAGE
36m         Warning   FailedMount              pod/function-evaluator-javascript-evaluator-5b45585-2gn47       MountVolume.SetUp failed for volume "envoy-config-volume" : failed to sync configmap cache: timed out waiting for the condition
36m         Warning   FailedMount              pod/function-evaluator-javascript-evaluator-5b45585-2gn47       MountVolume.SetUp failed for volume "tls-certs-volume" : failed to sync secret cache: timed out waiting for the condition
36m         Normal    TaintManagerEviction     pod/function-evaluator-javascript-evaluator-5b45585-2gn47       Cancelling deletion of Pod wikifunctions/function-evaluator-javascript-evaluator-5b45585-2gn47
46m         Warning   FailedMount              pod/function-orchestrator-main-orchestrator-6c8dcbd7bb-zwb9d    MountVolume.SetUp failed for volume "envoy-config-volume" : failed to sync configmap cache: timed out waiting for the condition
46m         Warning   FailedMount              pod/function-orchestrator-main-orchestrator-6c8dcbd7bb-zwb9d    MountVolume.SetUp failed for volume "tls-certs-volume" : failed to sync secret cache: timed out waiting for the condition
46m         Normal    TaintManagerEviction     pod/function-orchestrator-main-orchestrator-6c8dcbd7bb-zwb9d    Cancelling deletion of Pod wikifunctions/function-orchestrator-main-orchestrator-6c8dcbd7bb-zwb9d
23m         Normal    Scheduled                pod/function-orchestrator-main-orchestrator-6db9b6bbb6-csklr    Successfully assigned wikifunctions/function-orchestrator-main-orchestrator-6db9b6bbb6-csklr to kubestage1005.eqiad.wmnet
23m         Warning   FailedCreatePodSandBox   pod/function-orchestrator-main-orchestrator-6db9b6bbb6-csklr    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "402d3de0a2141326d733d8623d319a28eae80f0177c44d4a5fc52da09465789d": plugin type="calico" failed (add): failed to request IPv4 addresses: Assigned 0 out of 1 requested IPv4 addresses; No more free affine blocks and strict affinity enabled
23m         Warning   FailedCreatePodSandBox   pod/function-orchestrator-main-orchestrator-6db9b6bbb6-csklr    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "83c6652d1d2ca4fd64a44be559f6cc1d4ae21ca9b0c8541f0d2ee1548cfcc2c5": plugin type="calico" failed (add): failed to request IPv4 addresses: Assigned 0 out of 1 requested IPv4 addresses; No more free affine blocks and strict affinity enabled
23m         Warning   FailedCreatePodSandBox   pod/function-orchestrator-main-orchestrator-6db9b6bbb6-csklr    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "7318276c835e7d04a7bb46fa2a495a9d4890357946594dc7d0eafc411f8561a9": plugin type="calico" failed (add): failed to request IPv4 addresses: Assigned 0 out of 1 requested IPv4 addresses; No more free affine blocks and strict affinity enabled
23m         Warning   FailedCreatePodSandBox   pod/function-orchestrator-main-orchestrator-6db9b6bbb6-csklr    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "e6b0cbe990529e3f31312f03cc2207fb61e17d91b0b319f23dc0e10ca4ab7483": plugin type="calico" failed (add): failed to request IPv4 addresses: Assigned 0 out of 1 requested IPv4 addresses; No more free affine blocks and strict affinity enabled
22m         Warning   FailedCreatePodSandBox   pod/function-orchestrator-main-orchestrator-6db9b6bbb6-csklr    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "af56a0b899d3d1661dc6c3f618ebea2802c93b23d1dc120b375fd9b25feaf39c": plugin type="calico" failed (add): failed to request IPv4 addresses: Assigned 0 out of 1 requested IPv4 addresses; No more free affine blocks and strict affinity enabled
22m         Warning   FailedCreatePodSandBox   pod/function-orchestrator-main-orchestrator-6db9b6bbb6-csklr    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "ba1d448a1ed8606bda1b5b9488d571132399aba8f962753435451c5ea27dde30": plugin type="calico" failed (add): failed to request IPv4 addresses: Assigned 0 out of 1 requested IPv4 addresses; No more free affine blocks and strict affinity enabled
22m         Warning   FailedCreatePodSandBox   pod/function-orchestrator-main-orchestrator-6db9b6bbb6-csklr    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "c3148bda3a9effc9481c2e81b4cbc61a4994b65198c9837b5a74c6dcfa3d0a24": plugin type="calico" failed (add): failed to request IPv4 addresses: Assigned 0 out of 1 requested IPv4 addresses; No more free affine blocks and strict affinity enabled
22m         Warning   FailedCreatePodSandBox   pod/function-orchestrator-main-orchestrator-6db9b6bbb6-csklr    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "195dd0e0539e20f76f35dda1f98aaea28c337276c3e1f4ffd60c047aa2cdd6e5": plugin type="calico" failed (add): failed to request IPv4 addresses: Assigned 0 out of 1 requested IPv4 addresses; No more free affine blocks and strict affinity enabled
22m         Warning   FailedCreatePodSandBox   pod/function-orchestrator-main-orchestrator-6db9b6bbb6-csklr    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "3caeaf1f2b714dd2fc23ad5ef46261b8d4ce605470af31ac994c3c25a4534beb": plugin type="calico" failed (add): failed to request IPv4 addresses: Assigned 0 out of 1 requested IPv4 addresses; No more free affine blocks and strict affinity enabled
13m         Warning   FailedCreatePodSandBox   pod/function-orchestrator-main-orchestrator-6db9b6bbb6-csklr    (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "335cb0c2f11671dff23415da93545256a969d640b64719696a04b53542017518": plugin type="calico" failed (add): failed to request IPv4 addresses: Assigned 0 out of 1 requested IPv4 addresses; No more free affine blocks and strict affinity enabled
34m         Normal    Scheduled                pod/function-orchestrator-main-orchestrator-6db9b6bbb6-f5jbq    Successfully assigned wikifunctions/function-orchestrator-main-orchestrator-6db9b6bbb6-f5jbq to kubestage1005.eqiad.wmnet
34m         Warning   FailedCreatePodSandBox   pod/function-orchestrator-main-orchestrator-6db9b6bbb6-f5jbq    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "f8d96e9a22f02f0b46d7dc08fcd516f6ae8e05d1cf56ff8a5739023919738599": plugin type="calico" failed (add): failed to request IPv4 addresses: Assigned 0 out of 1 requested IPv4 addresses; No more free affine blocks and strict affinity enabled
34m         Warning   FailedCreatePodSandBox   pod/function-orchestrator-main-orchestrator-6db9b6bbb6-f5jbq    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4b2701aa3e04f6eed20200b0646293940313ef8c760070e43d0be6ba041bb0d4": plugin type="calico" failed (add): failed to request IPv4 addresses: Assigned 0 out of 1 requested IPv4 addresses; No more free affine blocks and strict affinity enabled
34m         Warning   FailedCreatePodSandBox   pod/function-orchestrator-main-orchestrator-6db9b6bbb6-f5jbq    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "9a1a5ed317e7097a6ad2cc96d3477c5341fef6dc03a1f7b3d23dfc3eded5c9a9": plugin type="calico" failed (add): failed to request IPv4 addresses: Assigned 0 out of 1 requested IPv4 addresses; No more free affine blocks and strict affinity enabled
34m         Warning   FailedCreatePodSandBox   pod/function-orchestrator-main-orchestrator-6db9b6bbb6-f5jbq    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "17dabfe46d20a190fa55f6f9f36f1bd7f39aad32938a2edd2d122ea58bfa1397": plugin type="calico" failed (add): failed to request IPv4 addresses: Assigned 0 out of 1 requested IPv4 addresses; No more free affine blocks and strict affinity enabled
33m         Warning   FailedCreatePodSandBox   pod/function-orchestrator-main-orchestrator-6db9b6bbb6-f5jbq    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "542e219230d00a1844c74759c47c7140c1098ce82e4aaf0030d19177d34e1fda": plugin type="calico" failed (add): failed to request IPv4 addresses: Assigned 0 out of 1 requested IPv4 addresses; No more free affine blocks and strict affinity enabled
33m         Warning   FailedCreatePodSandBox   pod/function-orchestrator-main-orchestrator-6db9b6bbb6-f5jbq    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "7c5a8e10dd283cea86d521e873997bc3aa21cb474d9860098b8097bffc243de9": plugin type="calico" failed (add): failed to request IPv4 addresses: Assigned 0 out of 1 requested IPv4 addresses; No more free affine blocks and strict affinity enabled
33m         Warning   FailedCreatePodSandBox   pod/function-orchestrator-main-orchestrator-6db9b6bbb6-f5jbq    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4b3e4156a05f956e66c97927dbba6c4c3ff0984836a3feae78d8efc4c2e6dce0": plugin type="calico" failed (add): failed to request IPv4 addresses: Assigned 0 out of 1 requested IPv4 addresses; No more free affine blocks and strict affinity enabled
33m         Warning   FailedCreatePodSandBox   pod/function-orchestrator-main-orchestrator-6db9b6bbb6-f5jbq    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "9ef8a2b976deea6c87a01a235f467328d5d1fe065c3b9975da0a5bd3eecd1272": plugin type="calico" failed (add): failed to request IPv4 addresses: Assigned 0 out of 1 requested IPv4 addresses; No more free affine blocks and strict affinity enabled
32m         Warning   FailedCreatePodSandBox   pod/function-orchestrator-main-orchestrator-6db9b6bbb6-f5jbq    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "d7ea78e291582e3245aed8e7b9402fcf7b4cbe8bdb56d4dd03d3bb394e90c6e6": plugin type="calico" failed (add): failed to request IPv4 addresses: Assigned 0 out of 1 requested IPv4 addresses; No more free affine blocks and strict affinity enabled
24m         Warning   FailedCreatePodSandBox   pod/function-orchestrator-main-orchestrator-6db9b6bbb6-f5jbq    (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "d0b4c60228eaeb6c20217940edc84a1582af4c9bad0307dc3dd0091be272a500": plugin type="calico" failed (add): failed to request IPv4 addresses: Assigned 0 out of 1 requested IPv4 addresses; No more free affine blocks and strict affinity enabled
34m         Normal    SuccessfulCreate         replicaset/function-orchestrator-main-orchestrator-6db9b6bbb6   Created pod: function-orchestrator-main-orchestrator-6db9b6bbb6-f5jbq
24m         Normal    SuccessfulDelete         replicaset/function-orchestrator-main-orchestrator-6db9b6bbb6   Deleted pod: function-orchestrator-main-orchestrator-6db9b6bbb6-f5jbq
23m         Normal    SuccessfulCreate         replicaset/function-orchestrator-main-orchestrator-6db9b6bbb6   Created pod: function-orchestrator-main-orchestrator-6db9b6bbb6-csklr
13m         Normal    SuccessfulDelete         replicaset/function-orchestrator-main-orchestrator-6db9b6bbb6   Deleted pod: function-orchestrator-main-orchestrator-6db9b6bbb6-csklr
23m         Normal    ScalingReplicaSet        deployment/function-orchestrator-main-orchestrator              Scaled up replica set function-orchestrator-main-orchestrator-6db9b6bbb6 to 1
13m         Normal    ScalingReplicaSet        deployment/function-orchestrator-main-orchestrator              Scaled down replica set function-orchestrator-main-orchestrator-6db9b6bbb6 to 0

No more free affine blocks and strict affinity enabled

Are we (and data systems) set with affinity to some particular set of hosts, whereas others (who've had no issues) aren't?

Mentioned in SAL (#wikimedia-operations) [2025-02-12T16:09:22Z] <claime> Deleting benthos, changeprop, changeprop-jobqueue from staging to free pod ip blocks - T386107

Mentioned in SAL (#wikimedia-operations) [2025-02-12T16:15:12Z] <claime> Halving mw-api-int staging replicas to free pod ip blocks - T386107

Clement_Goubert claimed this task.
Clement_Goubert subscribed.

Problem solved temporarily by removing some workloads from staging-eqiad. Creating subtasks for longer term action items.
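
For context on why removing workloads helped: the "affinity" in the error message refers to Calico IPAM block affinity, not to pod scheduling affinity. Calico carves the pod IP pool into blocks (a /26 by default for IPv4) and ties each block to a node. With strict affinity enabled, a node cannot borrow addresses from a block that belongs to another node, so a node whose own blocks are full fails as soon as the pool has no unclaimed block left, even if other nodes' blocks still have free addresses. The following is a toy model of that behaviour, not Calico's implementation. The pool CIDR, the node names, the class and function names, and the rule that a block returns to the pool when its last address is released are all assumptions made for illustration.

```python
import ipaddress


def blocks_in_pool(pool_cidr: str, block_prefix: int = 26) -> int:
    """Number of blocks of size /block_prefix that fit in the pool."""
    pool = ipaddress.ip_network(pool_cidr)
    return 2 ** (block_prefix - pool.prefixlen)


class StrictAffinityIpam:
    """Toy model of block-based IPAM with strict affinity (hypothetical)."""

    def __init__(self, pool_cidr: str, block_prefix: int = 26):
        pool = ipaddress.ip_network(pool_cidr)
        # Blocks not yet claimed by any node.
        self.free_blocks = list(pool.subnets(new_prefix=block_prefix))
        self.block_size = 2 ** (32 - block_prefix)
        # node name -> {block: number of addresses in use}
        self.node_blocks = {}

    def assign(self, node: str) -> bool:
        """Try to give one address to a pod on `node`."""
        blocks = self.node_blocks.setdefault(node, {})
        for block, used in blocks.items():
            if used < self.block_size:
                blocks[block] = used + 1
                return True
        # All of the node's own blocks are full: it needs a new block.
        # Strict affinity means it may not borrow from other nodes.
        if not self.free_blocks:
            return False  # "No more free affine blocks"
        blocks[self.free_blocks.pop(0)] = 1
        return True

    def release(self, node: str) -> bool:
        """Release one address on `node`; an empty block returns to the pool
        (assumed behaviour for this sketch)."""
        blocks = self.node_blocks.get(node, {})
        for block in list(blocks):
            blocks[block] -= 1
            if blocks[block] == 0:
                del blocks[block]
                self.free_blocks.append(block)
            return True
        return False

    def free_in_other_blocks(self, node: str) -> int:
        """Free addresses sitting in blocks claimed by other nodes."""
        return sum(
            self.block_size - used
            for other, blocks in self.node_blocks.items()
            if other != node
            for used in blocks.values()
        )


def demo():
    # Hypothetical /24 pool: 4 blocks of 64 addresses each.
    ipam = StrictAffinityIpam("192.0.2.0/24")
    for node in ["node-a", "node-b", "node-c", "node-d"]:
        ipam.assign(node)  # each node claims one block
    for _ in range(63):
        ipam.assign("node-a")  # fill node-a's block completely
    first_attempt = ipam.assign("node-a")  # no free block left
    free_elsewhere = ipam.free_in_other_blocks("node-a")
    ipam.release("node-d")  # node-d's only pod is removed
    second_attempt = ipam.assign("node-a")  # node-a claims the freed block
    return first_attempt, free_elsewhere, second_attempt
```

In this model, `demo()` returns `(False, 189, True)`: the first request on node-a fails although 189 addresses are unused in other nodes' blocks, and it succeeds once a workload elsewhere is removed and its block becomes available again. That matches the shape of the temporary fix described here, where deleting and scaling down workloads freed pod IP blocks.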

There were two alerts still firing for which I guess this (plus an SSH disconnect?) was the root cause:

  • Helm release eventgate-analytics/canary on k8s-staging@eqiad in state pending-upgrade
  • Helm release eventgate-analytics/production on k8s-staging@eqiad in state pending-upgrade

I've rolled back both releases to the last good state:

# helm -n eventgate-analytics history canary
45              Wed Feb 12 15:17:41 2025        pending-upgrade eventgate-0.16.0        v1.10.0         Preparing upgrade                                           
46              Mon Feb 17 09:21:14 2025        deployed        eventgate-0.15.0        v1.6.3          Rollback to 44 
# helm -n eventgate-analytics history production
33              Wed Feb 12 15:17:41 2025        pending-upgrade eventgate-0.16.0        v1.10.0         Preparing upgrade                                               
34              Mon Feb 17 09:21:29 2025        deployed        eventgate-0.15.0        v1.6.3          Rollback to 32