Objective
We currently have minimal monitoring of our production Kubernetes clusters. Starting with 1.8, the /metrics API exists, which exposes CPU and memory usage. Starting with 1.7 there is also a nice aggregator called metrics-server [1] that can be used as well (or in tandem), and there is also heapster [2], an aggregator for monitoring and event data. We should investigate these solutions (and perhaps others), pick one (or more), implement it, and obtain graphs for our Kubernetes clusters.
[1] https://github.com/kubernetes-incubator/metrics-server
[2] https://github.com/kubernetes/heapster
Preamble
Metrics collection/exposure is in a state of flux in Kubernetes and already has some history to deal with. Here's a description of the various components/notions, in no specific order.
cAdvisor
Project page is https://github.com/google/cadvisor
So this is a nice little Go binary that runs alongside your containers on your host (as a container or a standalone daemon), usually as root (though not strictly required; some metrics just won't appear without root privileges), and starts looking at cgroups and getting data out of them. It can of course run as a Docker container, in which case it can actually query the Docker daemon about things (it is meant to bind-mount /var/lib/docker) and expose them. It supposedly supports all container engines, and bug reports should be opened for any unsupported one.
It exposes an HTML-based interface [1] and a REST API [2]. It supports a variety of sinks to send data to [3], or allows consumers (e.g. Prometheus) to pull from it.
cAdvisor, being a simple Go project, was imported into the Kubernetes project and built into the kubelet (one of the two Kubernetes daemons running on every node). So every kubelet listens on port 4194 (unless disabled) and exposes the /containers endpoint (an HTML web page) and /metrics (a Prometheus-compatible endpoint).
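As an illustration, here is a minimal sketch (the node name is hypothetical) that reads the kubelet's embedded cAdvisor endpoint directly:

    import requests

    NODE = "kubernetes1001.example.org"  # hypothetical node name

    # The kubelet's embedded cAdvisor listens on port 4194 unless disabled.
    text = requests.get(f"http://{NODE}:4194/metrics").text

    # Prometheus text format: "# HELP"/"# TYPE" comment lines plus samples.
    samples = [l for l in text.splitlines() if l and not l.startswith("#")]
    print(f"{len(samples)} samples scraped; memory usage samples:")
    for line in samples:
        if line.startswith("container_memory_usage_bytes"):
            print(line)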
We have absolutely no reason to care about cAdvisor itself, as we get pretty much its full functionality via the kubelet. We could read the docs and implement something to talk to its API, but IMHO this does not make much sense.
The API server (the one running on the master) exposes that information via a proxying model.
Starting with Kubernetes 1.7.3 (we are running 1.7.4), the information from the /metrics endpoint is transparently split into two endpoints. Those are:
* api/v1/nodes/<node_name>/proxy/metrics
* api/v1/nodes/<node_name>/proxy/metrics/cadvisor
For 1.7.0-1.7.2 (we don't really care, but adding it for completeness' sake), only api/v1/nodes/<node_name>/proxy/metrics was exposed.
For 1.6 and earlier (production doesn't care, but labs may), both endpoints existed, but the former duplicated part of the information from the latter.
Note: the lack of trailing slashes is unfortunately important; specifying one means a 404 will be returned instead.
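A quick sketch of hitting both endpoints through kubectl proxy (assumed to be running on localhost:8001; the node name is hypothetical), which also demonstrates the trailing-slash behaviour:

    import requests

    API = "http://localhost:8001"        # kubectl proxy default (assumption)
    NODE = "kubernetes1001.example.org"  # hypothetical node name

    for path in ("metrics", "metrics/cadvisor"):
        url = f"{API}/api/v1/nodes/{NODE}/proxy/{path}"
        print(path, requests.get(url).status_code)               # 200, Prometheus text
        print(path + "/", requests.get(url + "/").status_code)   # 404, trailing slash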
We should have a Prometheus configuration that scrapes those two endpoints per node, which means we will need some discovery mechanism. Prometheus does look like it has one [4].
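Should the built-in Kubernetes discovery [4] prove awkward in our setup, Prometheus also supports file-based service discovery (file_sd_configs); below is a minimal sketch that generates such a target file from the node list (the API server address and the output file name are assumptions):

    import json
    import requests

    API = "http://localhost:8001"              # kubectl proxy default (assumption)
    APISERVER = "kubemaster.example.org:6443"  # hypothetical API server address

    names = [n["metadata"]["name"]
             for n in requests.get(f"{API}/api/v1/nodes").json()["items"]]

    # file_sd format: a JSON list of {"targets": [...], "labels": {...}} groups.
    # The __scheme__ and __metrics_path__ labels steer each target at one of
    # the two per-node proxy endpoints on the API server.
    groups = [{"targets": [APISERVER],
               "labels": {"__scheme__": "https",
                          "__metrics_path__": f"/api/v1/nodes/{name}/proxy/{path}",
                          "node": name}}
              for name in names
              for path in ("metrics", "metrics/cadvisor")]

    with open("kubernetes_nodes.json", "w") as f:  # file name is illustrative
        json.dump(groups, f, indent=2)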
API server and controller metrics
Since Kubernetes 1.0 (at least), the /metrics endpoint on the API server has exposed metrics about itself and the controller manager daemons. It's in Prometheus format and has been stable for quite a while.
We should have a Prometheus configuration that scrapes this endpoint. It's just one endpoint and should be easy to do.
This is not to be confused with kube-state-metrics (https://github.com/kubernetes/kube-state-metrics) or the /metrics endpoint of the kubelet.
Heapster
Project page at https://github.com/kubernetes/heapster
Heapster is effectively a collector. It runs in the cluster (or as a standalone daemon outside it), polls multiple sources (practically always just one, the Kubernetes API server, plus the kubelets' cAdvisor API) and then sends the data it gathered to a sink, as well as exposing it via a REST API [5], albeit only for a limited period of time as the data is kept in memory. Many different sink types are supported [6]. The example setup seems to be InfluxDB with a Grafana frontend.
The data exposed by heapster is well structured [7].
Overall, heapster serves two functions. First, it is a translator/collector from cAdvisor to one of the sink types. Second, it is an aggregating REST API that saves one from having to talk to the cAdvisor REST API of every kubelet. The API it exposes is/can be used by the Horizontal Pod Autoscaler and the scheduler.
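To get a feel for that REST API, here is a hedged sketch querying heapster's model API [5] through the API server's service proxy (it assumes heapster runs as a Service named heapster in kube-system, kubectl proxy is on localhost:8001, and the node name is hypothetical):

    import requests

    API = "http://localhost:8001"  # kubectl proxy default (assumption)
    # Reach heapster's in-cluster Service via the API server's service proxy.
    HEAPSTER = f"{API}/api/v1/namespaces/kube-system/services/heapster/proxy"
    NODE = "kubernetes1001.example.org"  # hypothetical node name

    # The model API serves a short in-memory window of aggregated data; the
    # metric name follows the model docs [5].
    url = f"{HEAPSTER}/api/v1/model/nodes/{NODE}/metrics/cpu/usage_rate"
    for point in requests.get(url).json().get("metrics", []):
        print(point["timestamp"], point["value"])  # CPU millicores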
metrics-server
Project page at https://github.com/kubernetes-incubator/metrics-server
The metrics-server is effectively an effort to standardize the second part of heapster's functionality, namely the grouped/aggregated REST API. It is an API exposing an in-memory datastore of the aggregated cAdvisor data from every kubelet. It is still in beta, still in the incubator in fact, and under development. The API it exposes is meant to be used by the Horizontal Pod Autoscaler and the scheduler in the future, and it is going to be built in and running by default (it already is in clusters brought up by kube-up.sh).
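Once the metrics-server is registered via API aggregation, per-node (and per-pod) usage should appear under the metrics.k8s.io API group. A minimal sketch (kubectl proxy assumed on localhost:8001; the API is beta, so group/version may change):

    import requests

    API = "http://localhost:8001"  # kubectl proxy default (assumption)

    # Resource Metrics API exposed through the aggregation layer (beta).
    nodes = requests.get(f"{API}/apis/metrics.k8s.io/v1beta1/nodes").json()
    for item in nodes.get("items", []):
        usage = item["usage"]
        print(item["metadata"]["name"], usage["cpu"], usage["memory"])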
kube-state-metrics
Project page at https://github.com/kubernetes/kube-state-metrics
This is a relatively new project (started in May 2016). It differs from everything described so far, as it is supposed to be a simple service that aggregates the API server's metrics and exposes state by object type. Object types are grouped [8]. It exposes a Prometheus-compatible /metrics endpoint (NOT TO BE CONFUSED with the same endpoint on the API server or the kubelet; they are disjoint).
Overall this looks like something we could use at some point, but it's not immediately required. It provides metrics for a high-level overview of the state of the clusters. It is designed to become a source for heapster/metrics-server at some point.
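As an example of the per-object-type grouping, a sketch that scrapes kube-state-metrics and keeps only the deployment-level families, e.g. kube_deployment_status_replicas (the service address is an assumption):

    import requests

    # kube-state-metrics address is an assumption; 8080 is its usual port.
    KSM = "http://kube-state-metrics.example.org:8080"

    for line in requests.get(f"{KSM}/metrics").text.splitlines():
        # Families are prefixed per object type: kube_pod_*, kube_node_*,
        # kube_deployment_*, and so on.
        if line.startswith("kube_deployment_"):
            print(line)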
Kubernetes Monitoring architecture
The overall architecture document [9] is, IMHO, convoluted and difficult to understand.
It describes, with not so great clarity, two monitoring metrics pipelines:
- core metrics pipeline: it's kubelets + metrics-server + API server. This is the core; it's all about being used by core system components like the scheduler and the Horizontal Pod Autoscaler. It should also be usable by simple monitoring tools.
- monitoring pipeline: it's about exposing metrics to end users (humans) as well as the HPA (Horizontal Pod Autoscaler) and Infrastore. There is no default implementation, and the entire document is very vague and hypothetical about it. The diagram is at https://raw.githubusercontent.com/kubernetes/community/master/contributors/design-proposals/instrumentation/monitoring_architecture.png. IMHO, until there is something ready for it, we should avoid meddling too much with implementing this.
It also divides metrics into system and service metrics. Service metrics are explicitly defined in application code and exported by it. System metrics are the generic ones available from every monitored entity (CPU, memory, IO, etc.). Practically, anything that is not a service metric is considered a system metric.
System metrics are further subdivided into core and non-core metrics, and this is where the document is ultra confusing, as non-core includes core. This needs a bit more reading.
One worrying part is the comment "Kubelet, providing per-node/pod/container usage information (the current cAdvisor that is part of Kubelet will be slimmed down to provide only core system metrics)"; not sure what it means yet. It seems this was implemented in 1.7.0 and later partly reverted (in 1.7.4).
Overall, the point of the document is to define the two pipelines and to say that Kubernetes will try to provide a good implementation of the first one, while leaving the second to interested parties. Looks like that's the part we will have to implement, using Prometheus.
[1] https://github.com/google/cadvisor/blob/master/docs/web.md
[2] https://github.com/google/cadvisor/blob/master/docs/api.md
[3] https://github.com/google/cadvisor/blob/master/docs/storage/README.md
[4] https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml
[5] https://github.com/kubernetes/heapster/blob/master/docs/model.md
[6] https://github.com/kubernetes/heapster/blob/master/docs/sink-owners.md
[7] https://github.com/kubernetes/heapster/blob/master/docs/storage-schema.md
[8] https://github.com/kubernetes/kube-state-metrics/blob/master/Documentation
[9] https://github.com/kubernetes/community/blob/master/contributors/design-proposals/instrumentation/monitoring_architecture.md