
Standardize on the "default" pod setup
Closed, Resolved · Public

Description

We would like to have good logging and monitoring/alerting from day 1 on our Kubernetes-based services. For this to happen we should figure out a way to do it without impeding the developer much. There are a number of approaches, from enforcing the use of specific preconfigured frameworks like service-runner, to creating sidecar containers that run alongside the "main" container (but in the same pod) and e.g. consume the main container's stdout for logging and poll it for statistics gathering. There is also the issue of encryption. In an ideal world, our pods would be able to terminate TLS as well as have all outgoing connections encrypted. That may or may not be possible directly in the main container, and may or may not make sense to implement via a sidecar container.

This task is about discussing our approaches, picking the best one and implementing it.

Event Timeline

As far as logging goes, we have basically two main options, plus a combination of the two:

  • Run a log collector as a sidecar. For this to work, all logs from the application must be sent to a specific port using e.g. the syslog format (which has some limitations itself). The log collector sidecar will have the ability to then forward the logs to ELK, and to write them to disk (that will need the container to have a bind mount of some directory under /var/log)
  • Leave most applications unchanged (that is, logging to stdout/stderr as with systemd), let Docker capture those lines to files (EWWW) and run a DaemonSet with the log collector watching /var/log/containers
  • Combine both of the above approaches, allowing a smooth transition for the applications.

I am inclined to favour the first option, which would give us more control over how to treat logs.
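To make option 1 concrete, here is a rough sketch of what such a pod could look like (the image names, the syslog port and the env variable are placeholders, not decisions):

```yaml
# Sketch of option 1: the application sends syslog-formatted lines to a
# local port; a log collector sidecar in the same pod receives them,
# forwards them to ELK and can also write to a bind-mounted /var/log path.
apiVersion: v1
kind: Pod
metadata:
  name: example-service
spec:
  containers:
    - name: app
      image: example-service:latest          # placeholder image
      env:
        - name: LOG_SYSLOG_ADDRESS           # hypothetical knob the app would read
          value: "localhost:10514"
    - name: log-collector
      image: fluentd:latest                  # placeholder image/tag
      ports:
        - containerPort: 10514
          protocol: UDP
      volumeMounts:
        - name: varlog
          mountPath: /var/log/example-service
  volumes:
    - name: varlog
      hostPath:
        path: /var/log/example-service
```

Since both containers share the pod's network namespace, the application reaches the sidecar simply via localhost.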

It seems like fluentd is the most common choice, and from what I see it's fairly feature complete. There is no official deb package of it though, which is a notable inconvenience.
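As a rough illustration of what the collector configuration could look like if we go with fluentd and syslog ingestion (the port, tag and relay hostname are made up for the example):

```yaml
# Hypothetical ConfigMap holding the fluentd configuration for the sidecar.
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-service-logging
data:
  fluent.conf: |
    # Receive syslog-formatted lines from the main container over localhost
    <source>
      @type syslog
      port 10514
      tag example-service
    </source>
    # Ship everything to the central logging pipeline (ELK)
    <match example-service.**>
      @type forward
      <server>
        host logging-relay.example.internal   # placeholder hostname
        port 24224
      </server>
    </match>
```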

Things still to figure out:

  • Which format to use for log ingestion? Syslog? What else?
  • Which parts of the collector configuration do we want to expose to the user for tuning? How to do that - env variables or allow plain configmaps?
  • Make the same setup work for staging and production
  • Which log collector daemon to pick? Fluentd seems the strongest candidate.

For metrics collection, my proposal (after a chat with @fgiunchedi) is to have another sidecar running prometheus-statsd-exporter, in the modified version we maintain.

This will allow us to:

  • Expose the pod's metrics of choice to prometheus, if an appropriate ConfigMap is provided (a default one will be provided otherwise)
  • Still relay the data to our main statsd instance

In the long run we may want to automate the creation of the exporter configuration, but for now I'd prefer to give developers broad flexibility over which metrics they collect here, as most of the metrics ops are interested in (response codes, endpoint timing) will be collected as telemetry by the TLS terminator (see next comment).
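A sketch of what the metrics sidecar could look like as an extra entry in the pod's container list (the image name and ConfigMap are placeholders; 9125 and 9102 are the exporter's conventional statsd and web ports, and the flag/mapping file is the one documented for recent statsd_exporter releases):

```yaml
# Hypothetical fragment of the pod spec's containers list: the application
# sends statsd packets to localhost:9125, Prometheus scrapes :9102, and the
# statsd -> prometheus translation rules come from a mounted ConfigMap.
# (Relaying to the main statsd instance is a feature of our modified image
# and is omitted here.)
- name: statsd-exporter
  image: prometheus-statsd-exporter:latest   # placeholder for our modified image
  args:
    - --statsd.mapping-config=/etc/statsd/mapping.yaml
  ports:
    - name: statsd
      containerPort: 9125
      protocol: UDP
    - name: metrics
      containerPort: 9102
  volumeMounts:
    - name: statsd-mapping                   # ConfigMap-backed volume defined elsewhere in the pod
      mountPath: /etc/statsd
```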

A containerized microservice environment should make developing and deploying applications as easy as possible.

Given this goal, it is a good idea to abstract out of the individual microservice the logical layers that can be implemented separately, namely:

  • Identity
  • Authn/authz
  • Encryption of communications
  • Discovery/request routing

There are several projects trying to solve parts of this problem, but the most promising is probably https://istio.io . This integrated solution addresses all of the above by providing an automatic PKI infrastructure, an authorization framework, a telemetry collector and mesh discovery / request routing. The framework has many more capabilities, but in practical terms it works by adding to each microservice pod a sidecar containing a modified version of Envoy, a fast and lightweight HTTP/1.1 and HTTP/2 proxy, plus a set of central services.

While I think this should be available as soon as possible, implementation of this additional sidecar might come in a later phase than the other sidecars described here.
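For later reference, the way istio typically attaches the Envoy sidecar is automatic injection for a whole namespace, toggled by a label (the exact mechanism depends on the istio version; older releases require manual injection via istioctl kube-inject):

```yaml
# Hypothetical: opting a namespace into istio's automatic Envoy sidecar injection.
apiVersion: v1
kind: Namespace
metadata:
  name: example-service
  labels:
    istio-injection: enabled
```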

mobrovac subscribed.

+1 on decoupling these concerns from the running services. This model would allow developers to concentrate solely on their service's functionality and would also decouple the configuration of the service itself from auxiliary facilities (where to send logs and metrics, how to handle auth(n|z), etc.).

Concerning Node.JS services and logging, service-runner-backed services emit JSON strings in [bunyan format](https://github.com/trentm/node-bunyan#log-record-fields), which can be output to stdout/stderr, syslog, a file, or over UDP/TCP. FluentD seems like a powerful alternative to rsyslog, and there is also a plugin [to make it understand bunyan's format](https://github.com/bodhi-space/fluent-plugin-jsonish#nodejs_bunyan). It also seems like a good fit because of the possibility of writing our own plugins in case some service uses a non-standard way of logging, thus easing the transition to k8s.
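As an illustration, the logging section of a service-runner config can already be pointed at whatever destination we settle on; something like the following (stream types as documented by service-runner, all values are placeholders):

```yaml
# Hypothetical service-runner logging config: emit bunyan JSON to stdout
# and, in parallel, over UDP (gelf) to a collector.
logging:
  name: example-service
  level: info
  streams:
    - type: stdout
    - type: gelf
      host: localhost
      port: 12201
```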

On the metrics side, we standardised on the StatsD format, but +1 on using Prometheus. We might want to use Prometheus directly from Node.JS (more involved) or simply use the StatsD exporter for Prometheus.

I don't have strong views on how to scale metrics and log collection. In any case, we have been doing this remotely for a while now (using standard formats like gelf for logs), so whether things are aggregated per pod or more centrally doesn't make a big difference to the services themselves.

I am a bit more concerned about performance and reliability implications of adding indirections in the data path itself. TLS is supported by all major platforms we use, so we should be able to avoid indirections for that. The main requirement to enable this is centralized certificate management, and exposing certs to services in a standardized manner, often via env vars.
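For reference, the standard Kubernetes primitive for this is a Secret, either mounted as files or exposed through env variables; a minimal sketch (all names are illustrative):

```yaml
# Hypothetical fragment: a centrally managed TLS secret mounted into the
# main container, with the paths advertised via env variables.
spec:
  containers:
    - name: app
      image: example-service:latest
      env:
        - name: TLS_CERT_PATH
          value: /etc/tls/tls.crt
        - name: TLS_KEY_PATH
          value: /etc/tls/tls.key
      volumeMounts:
        - name: tls
          mountPath: /etc/tls
          readOnly: true
  volumes:
    - name: tls
      secret:
        secretName: example-service-tls
```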

> On the metrics side, we standardised on the StatsD format, but +1 on using Prometheus. We might want to use Prometheus directly from Node.JS (more involved) or simply use the StatsD exporter for Prometheus.

+1 on statsd_exporter for existing services to ease the transition; note however that the statsd -> prometheus metrics translation can't be automatic, so a mapping between the two will need to be maintained by the service, as outlined here: https://github.com/prometheus/statsd_exporter#metric-mapping-and-configuration . Long term I think a prometheus client in the service will be more maintainable. Only slightly related to the discussion, but http://swaggerstats.io/ might come in handy too in certain cases.
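To make the maintenance cost concrete, a mapping entry in statsd_exporter's YAML format looks roughly like this (metric and label names are invented for the example):

```yaml
# Hypothetical mapping: turn a dotted statsd name emitted by the service
# into a labelled Prometheus metric.
mappings:
  - match: "example-service.*.request_duration"
    name: "example_service_request_duration"
    labels:
      endpoint: "$1"
```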

> I am a bit more concerned about performance and reliability implications of adding indirections in the data path itself. TLS is supported by all major platforms we use, so we should be able to avoid indirections for that. The main requirement to enable this is centralized certificate management, and exposing certs to services in a standardized manner, often via env vars.

Envoy (the component istio uses for proxying) has been built expressly with low latencies and reliability in mind. Advantages like automatic telemetry of every endpoint, auto-discovery of other services, automatic management/rotation of TLS credentials are more than a nice plus.

Add to this the fact that we'd have one single, modern TLS stack to care about and tune, instead of needing to check settings for several different platforms, and I think this is a net gain.

If you want you can read more about Envoy here https://lyft.github.io/envoy/ and about its use in istio here: https://istio.io/docs/concepts/traffic-management/overview.html

Even if we decide not to go with a full-blown adoption of istio, having a TLS terminator/telemetry collector like Envoy still has value by standardizing the environment across languages/platforms.

FTR +1. Let's take a good look at envoy

We've built the containers for logging and metrics reporting, and designed how to include them in the standard pod setup.

While I consider the part of this ticket relevant to the Ops Q1 goal solved, we still have parts that we need to work on, specifically:

  • A helm manifest template that includes these sidecar containers in a standard pod for production and staging, but not in development (a rough sketch below)
  • Envoy and/or istio, which are a large, separate topic.
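A rough sketch of how the helm template could gate the sidecar containers per environment (value keys and file layout are placeholders, not the final chart structure):

```yaml
# values.yaml fragment: the development values file would override this to false.
monitoring:
  enabled: true

# templates/deployment.yaml fragment: the sidecar is only rendered when enabled.
spec:
  template:
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.main_app.image }}"
        {{- if .Values.monitoring.enabled }}
        - name: statsd-exporter
          image: "{{ .Values.monitoring.image }}"
        {{- end }}
```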