
Investigate methods to rate-limit/discard excessive log messages closer to the producer
Open, Needs Triage, Public

Description

Today we have some rate limiting in place in logstash: messages exceeding a threshold are dropped. However, in cases of high load (e.g. a significant spike in error messages), the system can still become overwhelmed and experience ingest delays (kafka consumer lag).

To help reduce the impact of error spikes, let's look into the options available today for preventing excessive log messages from entering the kafka-logging queue in the first place.

One option worth evaluating is rate limiting at the log agent (currently rsyslog), discarding messages above a threshold before they are output to kafka-logging. As a starting point, something like rsyslog's queue-based discarding: https://rsyslog.readthedocs.io/en/latest/concepts/queues.html#discarding-messages
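
As a rough illustration of the queue-based discarding linked above, a minimal sketch of an rsyslog action with a discard mark might look like the following. The broker hostname, topic name, queue size, and thresholds are all hypothetical placeholders, not our current configuration, and would need tuning against real traffic.

```
# Sketch only: broker, topic, and all numeric values are assumed placeholders.
action(
    type="omkafka"
    broker=["kafka-logging.example.org:9092"]   # hypothetical broker
    topic="example-logging-topic"               # hypothetical topic
    # Discarding needs a real action queue; the default action queue is direct mode.
    queue.type="LinkedList"
    queue.size="10000"
    # Once 8000 messages are queued, start discarding lower-priority messages...
    queue.discardMark="8000"
    # ...where "lower priority" means severity 5 (notice) or numerically higher,
    # i.e. notice, info, and debug. Warnings and errors are still queued.
    queue.discardSeverity="5"
)
```

Note that this discards based on queue fill level rather than on a strict messages-per-second rate, so it only takes effect once the output to kafka-logging is already falling behind. It also discards by severity, so it would not help with a spike made up entirely of error-level messages unless the discard severity is lowered accordingly.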