
Investigate cpu/ram requests and limits for DaemonSets pods
Closed, DeclinedPublic

Description

While adding the new nodes for T244791: Scale up 2020 Kubernetes cluster for final migration of legacy cluster workloads I noticed that an "empty" worker has about 10% of available CPU and 13% of available RAM consumed by the calico, kube-proxy, and cadvisor pods. This feels like a lot of overhead on each worker for an "idle" state.

  • Calico pods are requesting 250m CPU with no explicit RAM request and no explicit limit on CPU or RAM.
  • Kube-proxy pods have no explicit Request or Limit values in the Pod template.
  • Cadvisor pods request 150m CPU and 200Mi RAM with 300m CPU and 2000Mi RAM limits.

Can any of these Request values be tuned downward? Can reasonable Limit values be set for everything?
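
For reference, the declared values above come straight from the DaemonSet pod templates and can be re-checked with kubectl. A minimal sketch, assuming the DaemonSets are named calico-node, kube-proxy, and cadvisor and live in the namespaces shown (names and namespaces are assumptions and may differ in our deployment):

  # Print the resources stanza of each DaemonSet's pod template
  kubectl -n kube-system get daemonset calico-node \
    -o jsonpath='{.spec.template.spec.containers[*].resources}'
  kubectl -n kube-system get daemonset kube-proxy \
    -o jsonpath='{.spec.template.spec.containers[*].resources}'
  kubectl -n metrics get daemonset cadvisor \
    -o jsonpath='{.spec.template.spec.containers[*].resources}'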

Event Timeline

Reedy renamed this task from Invesitgate cpu/ram requests and limits for DaemonSets pods to Investigate cpu/ram requests and limits for DaemonSets pods.Feb 14 2020, 6:24 AM

I absolutely do not want to tune down the request values on cluster-critical pods, regarding Calico and kube-proxy. One of the design problems in the old cluster is that the kernel will sacrifice workloads that aren't in the cluster (like flannel) to save things that are in the cluster (nearly all user workloads). The ingress controllers are requested at 1GB RAM and, I think, 1 CPU, for instance. Without that they were scheduled terribly and caused early cluster failures. I kind of want to go the other way and reserve some RAM for Calico as well.
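
If we did go the other way and reserve RAM for Calico explicitly, it could look something like this (the calico-node DaemonSet name, the kube-system namespace, and the 256Mi figure are assumptions here, not decided values):

  # Keep the existing 250m CPU request and add an explicit memory request
  kubectl -n kube-system set resources daemonset calico-node \
    --containers=calico-node \
    --requests=cpu=250m,memory=256Mi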

All that said, kube-proxy's lack of limits doesn't seem like a problem to me, because the limits will be enforced by basically having the node commit suicide, if they are enforced at all. Kube-proxy is generally pretty light since it is just a firewall manager, but if it starts consuming all the resources, we may as well let it: the node will be hosed either way. It will be hosed harder if Calico collapses than if kube-proxy does, but either way, webservices will stop working there.

Cadvisor on the other hand, we can do things with. It's just there for monitoring, not some critical function that keeps the cluster running. 150m is an extremely small request, though.

Question: is that CPU and RAM measured with metrics-server (kubectl top)? If so, that's what these things need. If we restrict it, they just fall into a kill loop.
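
For reference, a quick way to eyeball what these pods consume at rest, assuming metrics-server is deployed and that cadvisor lives in a namespace called metrics:

  # Resting usage of the kube-system DaemonSet pods, as reported by metrics-server
  kubectl top pod -n kube-system --sort-by=memory
  # Same for cadvisor, assuming a "metrics" namespace
  kubectl top pod -n metrics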

I absolutely do not want to tune down the request values on cluster-critical pods, regarding Calico and kube-proxy.

Fair. This is why I phrased the task as investigate and not some direct call for remediation action.

Cadvisor on the other hand, we can do things with.

The 2000Mi RAM hard limit was the more concerning number to me for this one. With only 8G of total RAM per node, including space for the kernel, 2G for metrics collection feels excessive. I know hard limits and requests are not the same thing, but limits are there, to some extent, to help us tune what gets evicted first when something misbehaves.

Question: is that CPU and RAM measured with metrics-server (kubectl top)? If so, that's what these things need. If we restrict it, they just fall into a kill loop.

It is measuring with the metrics.k8s.io/v1beta1 API which I think is functionally equivalent to kubectl top. See https://tools.wmflabs.org/k8s-status/nodes/ and its drill-down pages for where I was reading values from.
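
The same numbers can also be pulled straight from the API if anyone wants to double-check them outside of the k8s-status tool; a rough sketch (jq is only there for readability):

  # Raw node and pod usage from the metrics API (same data kubectl top reports)
  kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes | jq .
  kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods | jq .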

I absolutely do not want to tune down the request values on cluster-critical pods, regarding Calico and kube-proxy.

Fair. This is why I phrased the task as investigate and not some direct call for remediation action.

Legit. My initial reply was more like "EEP!", and this was the toned-down version 😉

The 2000Mi RAM hard limit was the more concerning number to me for this one. With only 8G of total RAM per node, including space for the kernel, 2G for metrics collection feels excessive. I know hard limits and requests are not the same thing, but limits are there, to some extent, to help us tune what gets evicted first when something misbehaves.

Yeah, that seems high. I also wonder if we shouldn't be building bigger nodes with smaller disks for Kubernetes in Toolforge. 1G seems like it should be more than enough, no? I wonder what they consume under load. I also am now curious whether that's the default from upstream or just a value we selected when we deployed it. All worth checking.
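
If usage under load bears that out, dropping the limit would be a one-liner along these lines (the cadvisor DaemonSet name, the metrics namespace, and the 1Gi figure are all assumptions to verify first):

  # Keep the current requests, halve the memory limit from 2000Mi to 1Gi
  kubectl -n metrics set resources daemonset cadvisor \
    --requests=cpu=150m,memory=200Mi \
    --limits=cpu=300m,memory=1Gi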

It is measuring with the metrics.k8s.io/v1beta1 API which I think is functionally equivalent to kubectl top. See https://tools.wmflabs.org/k8s-status/nodes/ and its drill-down pages for where I was reading values from.

Ah, so that unfortunately would represent what these things actually consume at rest. 😬

bd808 triaged this task as Low priority.Feb 25 2020, 5:06 PM
bd808 moved this task from Inbox to Soon! on the cloud-services-team (Kanban) board.

We decided to decline this task in the backlog grooming meeting.