Logging options for apache httpd in k8s
Open, High, Public

Description

Let's start with the requirements:

  • We need to be able to tail/grep/search these logs with ease, for debugging purposes. Logstash is ok only if it can be fast and reliable (any lag would kill our ability to debug things while they happen)
  • We need to be able to run mtail on those logs
  • We produce, daily, around 330 GB of API access logs and 190 GB of website access logs, over all of our traffic

Solution 1

We create a directory on the k8s node that works as a hostpath in all apache containers, and we make apache write its logs there, with a filename depending on the pod name.
Mtail can then run as a DaemonSet parsing those logs.

Open issues:

  • Creates i/o on the k8s hosts
  • Log rotation would be somewhat challenging

Solution 2
We make apache send its logs to logstash and to one or more central log servers by logging to a piped command.

In this case, logs would not be persisted on the individual server, but sent out to a central syslog server where they'd be stored for N days (as I said, we need ~500 GB/day for uncompressed logs, which become ~100 GB/day once compressed). Data could also be sent to logstash (at least sampled) for further and easier analysis. Mtail would have to run on the central log server.

Open issues:

  • might lose logs if the central server is not HA
  • needs piped logging, which is not always great for performance, and would probably require us to implement a better logger than apache's own.
  • log retention
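
To make the piped-logging idea concrete, here is a minimal sketch of a forwarder that apache could pipe its access log into (apache writes each log line to the stdin of the piped program). The server address, the tag, and the simplified syslog-style framing are assumptions for illustration, not something decided in this task, and a production logger would need buffering and reconnection logic that this sketch leaves out.

```python
import socket
import sys
from typing import Callable, Iterable

# Hypothetical central log server address; not taken from this task.
LOG_SERVER = ("centrallog.example", 514)


def forward(lines: Iterable[str], send: Callable[[bytes], None], tag: str = "apache") -> int:
    """Forward each non-empty log line as one message; return how many were sent."""
    sent = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # <134> is the syslog priority for local0.info (facility 16 * 8 + severity 6).
        send(f"<134>{tag}: {line}".encode("utf-8"))
        sent += 1
    return sent


def main() -> None:
    """Read log lines from stdin and send them over UDP to LOG_SERVER."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # UDP is fire-and-forget: messages are silently lost if the server is
    # unreachable, which is the HA concern listed in the open issues.
    forward(sys.stdin, lambda msg: sock.sendto(msg, LOG_SERVER))
```

To use it as a piped logger, the script would call main() at the end and be referenced from a CustomLog directive with a leading pipe. Keeping the send function injectable makes it easy to swap UDP for TCP or a local socket later.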

I see advantages to both approaches, but I'd like to hear from the observability folks what they think is the best solution.

Event Timeline

Joe triaged this task as High priority. Oct 19 2020, 8:50 AM
Joe created this task.
Joe added a project: observability.
Joe added a comment. Oct 19 2020, 9:07 AM

Additional data point that was required: we should be sending ~10-15k messages per second to the central log server, depending on traffic.
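
As a rough sanity check, the daily volumes from the description and this message rate can be combined into an average throughput and message size. The sketch below assumes decimal gigabytes and a flat traffic profile over the day, neither of which is stated in the task, so peak numbers will be higher.

```python
GB = 10**9  # assumption: decimal gigabytes; the task does not say GB vs GiB
SECONDS_PER_DAY = 86_400

# 330 GB of API access logs plus 190 GB of website access logs per day.
daily_bytes = (330 + 190) * GB

# Average sustained throughput into the central log server, in MB/s.
throughput_mb_s = daily_bytes / SECONDS_PER_DAY / 10**6


def avg_message_bytes(messages_per_second: int) -> float:
    """Average size of one log message at the given sustained message rate."""
    return daily_bytes / (messages_per_second * SECONDS_PER_DAY)


print(f"average throughput: {throughput_mb_s:.1f} MB/s")  # about 6.0 MB/s
for rate in (10_000, 15_000):
    # about 602 bytes at 10k msg/s and about 401 bytes at 15k msg/s
    print(f"{rate} msg/s -> {avg_message_bytes(rate):.0f} bytes per message")
```

An average of roughly 400 to 600 bytes per message is at least plausible for access log lines, so the two sets of figures look consistent with each other.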

lmata added a subscriber: lmata. Oct 19 2020, 3:09 PM

Couple of points

We create a directory on the k8s node that works as a hostpath in all apache containers, and we make apache write its logs there, with a filename depending on the pod name.

Another con is that unless we can rely on an env var, we might have to change apache's on-disk configuration per pod, which means every config will become a snowflake as far as hashing functions like md5 go. It would also effectively require mutable images (and we might want to avoid that).
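
The snowflake concern can be illustrated with a small sketch. It compares the md5 of a config that keeps a `${POD_NAME}` reference (apache httpd can expand environment variables written this way in its config) with the md5 of configs rewritten on disk per pod. The log path, the variable name, and the pod names are hypothetical.

```python
import hashlib

# Hypothetical config line; the path and variable name are assumptions.
TEMPLATE = 'CustomLog "/var/log/apache-pods/${POD_NAME}.access.log" combined\n'


def md5_hex(text: str) -> str:
    """Return the md5 hex digest of a config file's contents."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def render_on_disk(pod_name: str) -> str:
    """Simulate rewriting the config file inside each pod."""
    return TEMPLATE.replace("${POD_NAME}", pod_name)


# Hypothetical pod names.
pods = ["apache-7d9f-abcde", "apache-7d9f-fghij"]

# One digest per pod when the file is rewritten: every config is a snowflake.
rendered_digests = {md5_hex(render_on_disk(pod)) for pod in pods}

# A single digest when the file keeps the variable reference.
shared_digests = {md5_hex(TEMPLATE) for _ in pods}

print(len(rendered_digests), "distinct digests with per-pod files")
print(len(shared_digests), "distinct digest with the env var reference")
```

With the env var approach the image and its config stay identical across pods, and only the pod's environment differs.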

Mtail can then run as a DaemonSet parsing those logs.

Or just via puppet, at least in the beginning, which should make the migration easier.

Log rotation would be somewhat challenging

Yes, quite a bit given that the node will have to inform all the apaches in all the pods that it needs to logrotate their logs.

  • Let me add that we'll probably need to repartition the k8s nodes for Solution 1, as the bulk of the disk is given to containers and not to the host's / filesystem or any dedicated log filesystem.

Creates i/o on the k8s hosts

Indeed, but it might or might not be an issue. We'll need some numbers on that. Intuitively I don't think that it would become an issue, but it makes sense to keep an eye out for that.

The interesting question is how much of that will be a problem in Solution 2 as the centralization is going to just exacerbate that.

There is also a solution 3, a hybrid between solution 1 and solution 2: a node-level daemon that just waits for log messages on a unix socket, with that socket bind-mounted into the pods and apache writing to it. That component can sample if wanted and route where needed. Interestingly, we already have a component that can do routing and throttling with message dropping: fluent-bit.
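
To show the shape of such a node-level component, here is a minimal sketch of a receiver on a Unix datagram socket with simple 1-in-N sampling. It is not how fluent-bit works internally; the class and function names are made up for illustration, and the socket path and routing target would be whatever gets chosen for the deployment.

```python
import os
import socket
from typing import Callable


class Sampler:
    """Keep the first message and then one out of every `keep_one_in`."""

    def __init__(self, keep_one_in: int) -> None:
        if keep_one_in < 1:
            raise ValueError("keep_one_in must be at least 1")
        self.keep_one_in = keep_one_in
        self.seen = 0

    def keep(self) -> bool:
        keep = self.seen % self.keep_one_in == 0
        self.seen += 1
        return keep


def open_socket(path: str) -> socket.socket:
    """Bind a Unix datagram socket at `path`, replacing a stale socket file."""
    if os.path.exists(path):
        os.unlink(path)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    return sock


def handle(sock: socket.socket, sampler: Sampler,
           route: Callable[[bytes], None], count: int) -> int:
    """Receive `count` datagrams and route the sampled ones; return how many were routed."""
    routed = 0
    for _ in range(count):
        message = sock.recv(65536)
        if sampler.keep():
            route(message)
            routed += 1
    return routed
```

A real daemon would loop forever instead of handling a fixed count, and `route` would forward to the central log server or logstash. The socket file is what would be bind-mounted into each pod.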

lmata added a comment. Tue, Nov 3, 5:51 PM

Just dropping a quick update here, we should schedule some time to review options. Had a brief exchange with @akosiaris and we'll get the team together for a discussion on proposed paths and collaboration.