tools.cluebotng@tools-bastion-15:~$ kubectl get pods
NAME                                READY   STATUS    RESTARTS   AGE
bot-6dff8488df-snqfw                0/1     Pending   0          3h22m
core-5b4bfd9d88-wtncm               1/1     Running   0          3d8h
grafana-alloy-79f4589c5f-gqwh4      1/1     Running   0          3d8h
irc-relay-7c7c4fdfd-tdb6t           1/1     Running   0          20h
pushgateway-54fc8d676c-ns2tj        1/1     Running   0          3d8h
redis-854fc8fb77-bdbfl              1/1     Running   0          3d8h
report-interface-85f4f9b766-d2xqh   1/1     Running   0          3d8h
report-interface-85f4f9b766-mxkq4   1/1     Running   0          3d8h
tools.cluebotng@tools-bastion-15:~$ kubectl events
LAST SEEN TYPE REASON OBJECT MESSAGE
6m36s (x222 over 108m) Warning FailedScheduling Pod/bot-6dff8488df-snqfw 0/81 nodes are available: 1 node(s) were unschedulable, 2 Insufficient memory, 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 3 node(s) had untolerated taint {toolforge.org/gateway: true}, 74 Insufficient cpu. preemption: 0/81 nodes are available: 7 Preemption is not helpful for scheduling, 74 No preemption victims found for incoming pod.
94s (x529 over 3h21m) Warning FailedScheduling Pod/bot-6dff8488df-snqfw 0/81 nodes are available: 1 node(s) were unschedulable, 3 Insufficient memory, 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 3 node(s) had untolerated taint {toolforge.org/gateway: true}, 74 Insufficient cpu. preemption: 0/81 nodes are available: 7 Preemption is not helpful for scheduling, 74 No preemption victims found for incoming pod.
tools.cluebotng@tools-bastion-15:~$ kubectl describe pod bot-6dff8488df-snqfw
Name: bot-6dff8488df-snqfw
Namespace: tool-cluebotng
Priority: 0
Service Account: default
Node: <none>
Labels: app.kubernetes.io/component=deployments
app.kubernetes.io/created-by=cluebotng
app.kubernetes.io/managed-by=toolforge-jobs-framework
app.kubernetes.io/name=bot
app.kubernetes.io/version=2
jobs.toolforge.org/emails=none
pod-template-hash=6dff8488df
toolforge=tool
toolforge.org/mount-storage=none
Annotations: <none>
Status: Pending
SeccompProfile: RuntimeDefault
IP:
IPs: <none>
Controlled By: ReplicaSet/bot-6dff8488df
Containers:
job:
Image: tools-harbor.wmcloud.org/tool-cluebotng/bot:latest@sha256:3bef93cf1957c6b1a8e510db1917112f9e0816d3fbab30c1d425adafd7ed7130
Port: <none>
Host Port: <none>
Command:
/bin/sh
-c
--
launcher run-cbng
Limits:
cpu: 3
memory: 2Gi
Requests:
cpu: 3
memory: 1073741824
Liveness: exec [/bin/sh -c health-check] delay=0s timeout=5s period=10s #success=1 #failure=3
Startup: exec [/bin/sh -c health-check] delay=0s timeout=5s period=1s #success=1 #failure=120
Environment:
NO_HOME: a buildservice pod does not need a home env
CBNG_BOT_MYSQL_CREDENTIALS: <set to the key 'CBNG_BOT_MYSQL_CREDENTIALS' in secret 'toolforge.envvar.v1.cbng-bot-mysql-credentials'> Optional: false
CBNG_BOT_PASSWORD: <set to the key 'CBNG_BOT_PASSWORD' in secret 'toolforge.envvar.v1.cbng-bot-password'> Optional: false
CBNG_REPORT_OAUTH_KEY: <set to the key 'CBNG_REPORT_OAUTH_KEY' in secret 'toolforge.envvar.v1.cbng-report-oauth-key'> Optional: false
CBNG_REPORT_OAUTH_SECRET: <set to the key 'CBNG_REPORT_OAUTH_SECRET' in secret 'toolforge.envvar.v1.cbng-report-oauth-secret'> Optional: false
IRC_RELAY_SENDER_HUGGLE_CLIENT_CHANNELS: <set to the key 'IRC_RELAY_SENDER_HUGGLE_CLIENT_CHANNELS' in secret 'toolforge.envvar.v1.irc-relay-sender-huggle-client-channels'> Optional: false
IRC_RELAY_SENDER_HUGGLE_CLIENT_NICK: <set to the key 'IRC_RELAY_SENDER_HUGGLE_CLIENT_NICK' in secret 'toolforge.envvar.v1.irc-relay-sender-huggle-client-nick'> Optional: false
IRC_RELAY_SENDER_HUGGLE_CLIENT_PORT: <set to the key 'IRC_RELAY_SENDER_HUGGLE_CLIENT_PORT' in secret 'toolforge.envvar.v1.irc-relay-sender-huggle-client-port'> Optional: false
IRC_RELAY_SENDER_HUGGLE_CLIENT_SERVER: <set to the key 'IRC_RELAY_SENDER_HUGGLE_CLIENT_SERVER' in secret 'toolforge.envvar.v1.irc-relay-sender-huggle-client-server'> Optional: false
IRC_RELAY_SENDER_HUGGLE_RECEIVER: <set to the key 'IRC_RELAY_SENDER_HUGGLE_RECEIVER' in secret 'toolforge.envvar.v1.irc-relay-sender-huggle-receiver'> Optional: false
IRC_RELAY_SENDER_HUGGLE_THROTTLER_CONFIG: <set to the key 'IRC_RELAY_SENDER_HUGGLE_THROTTLER_CONFIG' in secret 'toolforge.envvar.v1.irc-relay-sender-huggle-throttler-config'> Optional: false
IRC_RELAY_SENDER_MAIN_CLIENT_CHANNELS: <set to the key 'IRC_RELAY_SENDER_MAIN_CLIENT_CHANNELS' in secret 'toolforge.envvar.v1.irc-relay-sender-main-client-channels'> Optional: false
IRC_RELAY_SENDER_MAIN_CLIENT_NICK: <set to the key 'IRC_RELAY_SENDER_MAIN_CLIENT_NICK' in secret 'toolforge.envvar.v1.irc-relay-sender-main-client-nick'> Optional: false
IRC_RELAY_SENDER_MAIN_CLIENT_PASSWORD: <set to the key 'IRC_RELAY_SENDER_MAIN_CLIENT_PASSWORD' in secret 'toolforge.envvar.v1.irc-relay-sender-main-client-password'> Optional: false
IRC_RELAY_SENDER_MAIN_CLIENT_USERNAME: <set to the key 'IRC_RELAY_SENDER_MAIN_CLIENT_USERNAME' in secret 'toolforge.envvar.v1.irc-relay-sender-main-client-username'> Optional: false
IRC_RELAY_SENDER_MAIN_THROTTLER_CONFIG: <set to the key 'IRC_RELAY_SENDER_MAIN_THROTTLER_CONFIG' in secret 'toolforge.envvar.v1.irc-relay-sender-main-throttler-config'> Optional: false
REDIS_PASSWORD: <set to the key 'REDIS_PASSWORD' in secret 'toolforge.envvar.v1.redis-password'> Optional: false
TOOL_DEPLOY_TOKEN: <set to the key 'TOOL_DEPLOY_TOKEN' in secret 'toolforge.envvar.v1.tool-deploy-token'> Optional: false
TOOL_REPLICA_PASSWORD: <set to the key 'TOOL_REPLICA_PASSWORD' in secret 'toolforge.envvar.v1.tool-replica-password'> Optional: false
TOOL_REPLICA_USER: <set to the key 'TOOL_REPLICA_USER' in secret 'toolforge.envvar.v1.tool-replica-user'> Optional: false
TOOL_TOOLSDB_PASSWORD: <set to the key 'TOOL_TOOLSDB_PASSWORD' in secret 'toolforge.envvar.v1.tool-toolsdb-password'> Optional: false
TOOL_TOOLSDB_SCHEMA: <set to the key 'TOOL_TOOLSDB_SCHEMA' in secret 'toolforge.envvar.v1.tool-toolsdb-schema'> Optional: false
TOOL_TOOLSDB_USER: <set to the key 'TOOL_TOOLSDB_USER' in secret 'toolforge.envvar.v1.tool-toolsdb-user'> Optional: false
TOOL_TOOLFORGE_API_URL: https://api.svc.tools.eqiad1.wikimedia.cloud:30003
TOOL_REDIS_URI: redis://redis.svc.tools.eqiad1.wikimedia.cloud:6379
TOOL_ELASTICSEARCH_URL: http://elasticsearch.svc.tools.eqiad1.wikimedia.cloud:80
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-f42pl (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
kube-api-access-f42pl:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints: kubernetes.io/hostname:ScheduleAnyway when max skew 1 is exceeded for selector app.kubernetes.io/created-by=cluebotng,app.kubernetes.io/managed-by=toolforge-jobs-framework,app.kubernetes.io/name=bot,toolforge=tool
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 7m42s (x222 over 110m) default-scheduler 0/81 nodes are available: 1 node(s) were unschedulable, 2 Insufficient memory, 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 3 node(s) had untolerated taint {toolforge.org/gateway: true}, 74 Insufficient cpu. preemption: 0/81 nodes are available: 7 Preemption is not helpful for scheduling, 74 No preemption victims found for incoming pod.
Warning FailedScheduling 2m40s (x529 over 3h22m) default-scheduler 0/81 nodes are available: 1 node(s) were unschedulable, 3 Insufficient memory, 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 3 node(s) had untolerated taint {toolforge.org/gateway: true}, 74 Insufficient cpu. preemption: 0/81 nodes are available: 7 Preemption is not helpful for scheduling, 74 No preemption victims found for incoming pod.

The resource config has not changed in more than 6 months; the last deployment was a day ago (a code change only).
It appears that the Toolforge cluster no longer has enough free capacity for these long-standing jobs (74 of the 81 nodes report Insufficient cpu for the bot's 3-CPU request), which is causing an outage outside of the maintainer's control.
Flushing all jobs and re-creating them does not help; the new bot pod stays Pending with the same FailedScheduling errors.
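The Insufficient cpu reason comes from the scheduler comparing the pod's resource *requests* (cpu: 3 and memory: 1073741824 bytes, i.e. 1 GiB) with each node's allocatable capacity minus the requests of pods already placed there. The sketch below is a simplified model of that check, not the scheduler's actual code. The `Node` type, its field names and the node figures are hypothetical, since the real node capacities are not shown in this task, and the quantity parsing only covers the formats that appear here.

```python
from dataclasses import dataclass
from decimal import Decimal

# Binary suffixes for Kubernetes memory quantities (a subset; decimal
# suffixes such as "G" or "M" are deliberately not handled here).
_BINARY = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity like '2Gi' or '1073741824' to bytes."""
    for suffix, factor in _BINARY.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def parse_cpu(quantity: str) -> int:
    """Convert a CPU quantity like '3' or '500m' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


@dataclass
class Node:
    """Hypothetical node: allocatable capacity and requests already placed on it."""
    name: str
    allocatable_cpu_m: int
    allocatable_mem: int
    requested_cpu_m: int
    requested_mem: int


def fit_reasons(node: Node, pod_cpu: str, pod_mem: str) -> list[str]:
    """Return why the pod's requests do not fit on the node ([] means it fits).

    Limits (cpu: 3, memory: 2Gi in the pod spec above) are not compared.
    """
    reasons = []
    if node.requested_cpu_m + parse_cpu(pod_cpu) > node.allocatable_cpu_m:
        reasons.append("Insufficient cpu")
    if node.requested_mem + parse_memory(pod_mem) > node.allocatable_mem:
        reasons.append("Insufficient memory")
    return reasons


# Hypothetical 8-core worker with 5.5 cores already requested: 2.5 cores free,
# so the bot's 3-core request is rejected even though 1 GiB of memory fits.
worker = Node("worker-a", 8000, parse_memory("16Gi"), 5500, parse_memory("10Gi"))
print(fit_reasons(worker, "3", "1073741824"))  # ['Insufficient cpu']
```

Because only requests are compared, a node can be rejected with Insufficient cpu even when its actual CPU usage is low; what matters is how much of its allocatable CPU other pods have already reserved.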
tools.cluebotng@tools-bastion-15:~$ kubectl get pods
NAME                                READY   STATUS    RESTARTS   AGE
bot-5b5d5f7f7b-csdrn                0/1     Pending   0          25s
core-6db4ddb6cf-k5smt               1/1     Running   0          23s
grafana-alloy-645798f6bf-h9dz2      1/1     Running   0          21s
irc-relay-7c7c4fdfd-kkgx2           1/1     Running   0          19s
pushgateway-54fc8d676c-tpg7j        1/1     Running   0          16s
redis-854fc8fb77-zr2bl              1/1     Running   0          14s
report-interface-85f4f9b766-2wzrd   1/1     Running   0          12s
report-interface-85f4f9b766-964bg   1/1     Running   0          12s
tools.cluebotng@tools-bastion-15:~$ kubectl describe pod bot-5b5d5f7f7b-csdrn
Name: bot-5b5d5f7f7b-csdrn
Namespace: tool-cluebotng
Priority: 0
Service Account: default
Node: <none>
Labels: app.kubernetes.io/component=deployments
app.kubernetes.io/created-by=cluebotng
app.kubernetes.io/managed-by=toolforge-jobs-framework
app.kubernetes.io/name=bot
app.kubernetes.io/version=2
jobs.toolforge.org/emails=none
pod-template-hash=5b5d5f7f7b
toolforge=tool
toolforge.org/mount-storage=none
Annotations: <none>
Status: Pending
SeccompProfile: RuntimeDefault
IP:
IPs: <none>
Controlled By: ReplicaSet/bot-5b5d5f7f7b
Containers:
job:
Image: tools-harbor.wmcloud.org/tool-cluebotng/bot:latest@sha256:efe6896abbd4ce8b2e060e088857fa7ea247a0a289b526329d48d8f6ee6f7b31
Port: <none>
Host Port: <none>
Command:
/bin/sh
-c
--
launcher run-cbng
Limits:
cpu: 3
memory: 2Gi
Requests:
cpu: 3
memory: 1073741824
Liveness: exec [/bin/sh -c health-check] delay=0s timeout=5s period=10s #success=1 #failure=3
Startup: exec [/bin/sh -c health-check] delay=0s timeout=5s period=1s #success=1 #failure=120
Environment:
NO_HOME: a buildservice pod does not need a home env
CBNG_BOT_MYSQL_CREDENTIALS: <set to the key 'CBNG_BOT_MYSQL_CREDENTIALS' in secret 'toolforge.envvar.v1.cbng-bot-mysql-credentials'> Optional: false
CBNG_BOT_PASSWORD: <set to the key 'CBNG_BOT_PASSWORD' in secret 'toolforge.envvar.v1.cbng-bot-password'> Optional: false
CBNG_REPORT_OAUTH_KEY: <set to the key 'CBNG_REPORT_OAUTH_KEY' in secret 'toolforge.envvar.v1.cbng-report-oauth-key'> Optional: false
CBNG_REPORT_OAUTH_SECRET: <set to the key 'CBNG_REPORT_OAUTH_SECRET' in secret 'toolforge.envvar.v1.cbng-report-oauth-secret'> Optional: false
IRC_RELAY_SENDER_HUGGLE_CLIENT_CHANNELS: <set to the key 'IRC_RELAY_SENDER_HUGGLE_CLIENT_CHANNELS' in secret 'toolforge.envvar.v1.irc-relay-sender-huggle-client-channels'> Optional: false
IRC_RELAY_SENDER_HUGGLE_CLIENT_NICK: <set to the key 'IRC_RELAY_SENDER_HUGGLE_CLIENT_NICK' in secret 'toolforge.envvar.v1.irc-relay-sender-huggle-client-nick'> Optional: false
IRC_RELAY_SENDER_HUGGLE_CLIENT_PORT: <set to the key 'IRC_RELAY_SENDER_HUGGLE_CLIENT_PORT' in secret 'toolforge.envvar.v1.irc-relay-sender-huggle-client-port'> Optional: false
IRC_RELAY_SENDER_HUGGLE_CLIENT_SERVER: <set to the key 'IRC_RELAY_SENDER_HUGGLE_CLIENT_SERVER' in secret 'toolforge.envvar.v1.irc-relay-sender-huggle-client-server'> Optional: false
IRC_RELAY_SENDER_HUGGLE_RECEIVER: <set to the key 'IRC_RELAY_SENDER_HUGGLE_RECEIVER' in secret 'toolforge.envvar.v1.irc-relay-sender-huggle-receiver'> Optional: false
IRC_RELAY_SENDER_HUGGLE_THROTTLER_CONFIG: <set to the key 'IRC_RELAY_SENDER_HUGGLE_THROTTLER_CONFIG' in secret 'toolforge.envvar.v1.irc-relay-sender-huggle-throttler-config'> Optional: false
IRC_RELAY_SENDER_MAIN_CLIENT_CHANNELS: <set to the key 'IRC_RELAY_SENDER_MAIN_CLIENT_CHANNELS' in secret 'toolforge.envvar.v1.irc-relay-sender-main-client-channels'> Optional: false
IRC_RELAY_SENDER_MAIN_CLIENT_NICK: <set to the key 'IRC_RELAY_SENDER_MAIN_CLIENT_NICK' in secret 'toolforge.envvar.v1.irc-relay-sender-main-client-nick'> Optional: false
IRC_RELAY_SENDER_MAIN_CLIENT_PASSWORD: <set to the key 'IRC_RELAY_SENDER_MAIN_CLIENT_PASSWORD' in secret 'toolforge.envvar.v1.irc-relay-sender-main-client-password'> Optional: false
IRC_RELAY_SENDER_MAIN_CLIENT_USERNAME: <set to the key 'IRC_RELAY_SENDER_MAIN_CLIENT_USERNAME' in secret 'toolforge.envvar.v1.irc-relay-sender-main-client-username'> Optional: false
IRC_RELAY_SENDER_MAIN_THROTTLER_CONFIG: <set to the key 'IRC_RELAY_SENDER_MAIN_THROTTLER_CONFIG' in secret 'toolforge.envvar.v1.irc-relay-sender-main-throttler-config'> Optional: false
REDIS_PASSWORD: <set to the key 'REDIS_PASSWORD' in secret 'toolforge.envvar.v1.redis-password'> Optional: false
TOOL_DEPLOY_TOKEN: <set to the key 'TOOL_DEPLOY_TOKEN' in secret 'toolforge.envvar.v1.tool-deploy-token'> Optional: false
TOOL_REPLICA_PASSWORD: <set to the key 'TOOL_REPLICA_PASSWORD' in secret 'toolforge.envvar.v1.tool-replica-password'> Optional: false
TOOL_REPLICA_USER: <set to the key 'TOOL_REPLICA_USER' in secret 'toolforge.envvar.v1.tool-replica-user'> Optional: false
TOOL_TOOLSDB_PASSWORD: <set to the key 'TOOL_TOOLSDB_PASSWORD' in secret 'toolforge.envvar.v1.tool-toolsdb-password'> Optional: false
TOOL_TOOLSDB_SCHEMA: <set to the key 'TOOL_TOOLSDB_SCHEMA' in secret 'toolforge.envvar.v1.tool-toolsdb-schema'> Optional: false
TOOL_TOOLSDB_USER: <set to the key 'TOOL_TOOLSDB_USER' in secret 'toolforge.envvar.v1.tool-toolsdb-user'> Optional: false
TOOL_TOOLFORGE_API_URL: https://api.svc.tools.eqiad1.wikimedia.cloud:30003
TOOL_REDIS_URI: redis://redis.svc.tools.eqiad1.wikimedia.cloud:6379
TOOL_ELASTICSEARCH_URL: http://elasticsearch.svc.tools.eqiad1.wikimedia.cloud:80
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-798nh (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
kube-api-access-798nh:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints: kubernetes.io/hostname:ScheduleAnyway when max skew 1 is exceeded for selector app.kubernetes.io/created-by=cluebotng,app.kubernetes.io/managed-by=toolforge-jobs-framework,app.kubernetes.io/name=bot,toolforge=tool
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 18s default-scheduler 0/81 nodes are available: 1 node(s) were unschedulable, 2 Insufficient memory, 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 3 node(s) had untolerated taint {toolforge.org/gateway: true}, 74 Insufficient cpu. preemption: 0/81 nodes are available: 7 Preemption is not helpful for scheduling, 74 No preemption victims found for incoming pod.
Warning FailedScheduling 8s (x5 over 44s) default-scheduler 0/81 nodes are available: 1 node(s) were unschedulable, 3 Insufficient memory, 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 3 node(s) had untolerated taint {toolforge.org/gateway: true}, 74 Insufficient cpu. preemption: 0/81 nodes are available: 7 Preemption is not helpful for scheduling, 74 No preemption victims found for incoming pod.
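The per-reason counts in these FailedScheduling messages do not add up to the 81 nodes (1 + 2 + 3 + 3 + 74 = 83, and 84 in the variant with 3 Insufficient memory), because one node can fail the resource check for both CPU and memory. The preemption summary, by contrast, counts each node once (7 + 74 = 81). The sketch below tallies both parts of a message copied from the events. The splitting rules (on " preemption: " and on ", ") are assumptions fitted to the message format shown here, not a stable Kubernetes interface.

```python
import re

# Assumed shape of each half: "<feasible>/<total> nodes are available: <items>."
_HEADER = re.compile(r"(\d+)/(\d+) nodes are available: (.*)")


def parse_section(text: str) -> tuple[int, dict[str, int]]:
    """Parse '0/81 nodes are available: 1 foo, 2 bar.' into (81, {'foo': 1, 'bar': 2})."""
    m = _HEADER.fullmatch(text.strip().rstrip("."))
    if m is None:
        raise ValueError(f"unrecognised section: {text!r}")
    total = int(m.group(2))
    reasons = {}
    for item in m.group(3).split(", "):
        count, _, reason = item.partition(" ")  # "74 Insufficient cpu" -> 74, "Insufficient cpu"
        reasons[reason] = int(count)
    return total, reasons


def parse_failed_scheduling(message: str):
    """Split a FailedScheduling message into its filter and preemption summaries."""
    filters, _, preemption = message.partition(" preemption: ")
    return parse_section(filters), parse_section(preemption)


msg = ("0/81 nodes are available: 1 node(s) were unschedulable, 2 Insufficient memory, "
       "3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, "
       "3 node(s) had untolerated taint {toolforge.org/gateway: true}, 74 Insufficient cpu. "
       "preemption: 0/81 nodes are available: 7 Preemption is not helpful for scheduling, "
       "74 No preemption victims found for incoming pod.")
(total, filters), (_, preempt) = parse_failed_scheduling(msg)
print(total, sum(filters.values()), sum(preempt.values()))  # 81 83 81
```

Reading the two summaries together: the 7 nodes where preemption "is not helpful" match the 1 unschedulable plus 3 + 3 tainted nodes, leaving 74 nodes that failed only on resources. Since all 74 report Insufficient cpu, the nodes short on memory are presumably among them, which points at CPU requests as the blocking resource.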