Switch k8s logs to their own kafka topics
Open, Needs Triage, Public

Description

Logs from all k8s clusters currently flow into the rsyslog-* kafka topics (split by severity). While this is simple and works under normal circumstances, even a single spammy producer causes lag that affects all other producers.

Similarly to what we do with prometheus, we should instead switch to a model where kafka-logging topics are split at least by k8s cluster, if not further (e.g. cluster + namespace).
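
To make the proposed split concrete, here is a minimal sketch of how a per-cluster (and optionally per-namespace) topic name could be derived. The `k8s-logs` prefix, the `-` separator, the function name and the example cluster name are all hypothetical and are not taken from the actual patch; the only hard facts relied on are Kafka's own topic naming rules (characters limited to `[a-zA-Z0-9._-]`, at most 249 characters).

```python
import re

# Kafka only allows these characters in topic names, up to 249 characters.
_ILLEGAL_TOPIC_CHARS = re.compile(r"[^a-zA-Z0-9._-]")
MAX_TOPIC_LEN = 249


def k8s_log_topic(cluster, namespace=None, prefix="k8s-logs"):
    """Build a topic name for one cluster, or for one cluster + namespace.

    The prefix and the '-' separator are assumptions made for illustration.
    """
    parts = [prefix, cluster]
    if namespace is not None:
        parts.append(namespace)
    # Replace anything Kafka would reject so a bad label cannot break producers.
    name = "-".join(_ILLEGAL_TOPIC_CHARS.sub("_", part) for part in parts)
    if len(name) > MAX_TOPIC_LEN:
        raise ValueError("topic name too long for Kafka: %d chars" % len(name))
    return name


# Hypothetical cluster and namespace names:
print(k8s_log_topic("cluster-a"))
print(k8s_log_topic("cluster-a", "kube-system"))
```

Splitting only by cluster keeps the number of topics small; adding the namespace isolates noisy workloads further at the cost of many more topics to create and consume.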

As a bonus side effect, this model also increases logstash ingestion capacity, since we will be able to consume from more topics concurrently rather than from a single funnel/topic. At the moment we normally have 6 partitions and 6 logstash consumers, so each consumer effectively reads a given topic with a single thread.
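
The capacity argument can be sketched with some back-of-the-envelope arithmetic. Within one Kafka consumer group a partition is read by at most one consumer, so the number of partitions caps the useful parallelism. The 6 partitions come from the description above; the number of per-cluster topics (3) and the assumption that each new topic also gets 6 partitions are illustrative guesses, not figures from this task.

```python
def max_parallel_readers(partitions_per_topic):
    """Upper bound on concurrently active consumer threads in one group.

    Each partition is assigned to at most one consumer of the group,
    so the bound is simply the total number of partitions.
    """
    return sum(partitions_per_topic.values())


# Today: one funnel topic with 6 partitions (from the description).
single_funnel = {"rsyslog-funnel": 6}

# Hypothetical: three per-cluster topics, assumed to have 6 partitions each.
per_cluster = {"cluster-a": 6, "cluster-b": 6, "cluster-c": 6}

print(max_parallel_readers(single_funnel))
print(max_parallel_readers(per_cluster))
```

Under these assumptions the ceiling grows linearly with the number of topics; whether logstash can actually use the extra parallelism depends on how many consumer threads it is configured to run.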

Event Timeline

Change #1040170 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] k8s: send logs to per-cluster kafka topics

https://gerrit.wikimedia.org/r/1040170

Change #1040170 merged by Filippo Giunchedi:

[operations/puppet@production] k8s: send logs to per-cluster kafka topics

https://gerrit.wikimedia.org/r/1040170

Change #1042917 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] logstash: add auto_offset_reset to kafka input

https://gerrit.wikimedia.org/r/1042917

Change #1042918 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] logstash: consume k8s logs topics

https://gerrit.wikimedia.org/r/1042918