Problem:
The storage drives on our centrallog instances are filling up rapidly, mainly because of the volume of debug-level logs generated by prometheus-blackbox-exporter, thanos-fe-query, and thanos-query.
This is expected, because debug is the log level currently set in the code for each of these components:
- prometheus-blackbox-exporter log level configuration
- thanos-query log level configuration
- thanos-query-frontend log level configuration
Proposed Solutions:
- Reduce the log level to error.
- Remove logs older than 30 days.
Additionally, I think the second option (removing old logs) could be implemented with two conditions in mind: the free space left on the instance and the age of the logs.
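To make the two-condition idea concrete, here is a rough Python sketch, not a finished implementation. It assumes the logs on centrallog are plain files matching `*.log` under a single directory, and it reads the proposal as "delete only when free space is low AND the file is older than the retention period"; both are assumptions on my part. The function names, the 20% free-space threshold, and the example path are hypothetical.

```python
import shutil
import time
from pathlib import Path
from typing import List

SECONDS_PER_DAY = 86_400


def is_expired(mtime: float, now: float, max_age_days: int = 30) -> bool:
    """Condition 1: the file was last modified more than max_age_days ago."""
    return now - mtime > max_age_days * SECONDS_PER_DAY


def is_low_on_space(free_bytes: int, total_bytes: int,
                    min_free_fraction: float = 0.2) -> bool:
    """Condition 2: the free fraction of the disk is below the threshold."""
    return free_bytes / total_bytes < min_free_fraction


def cleanup(log_dir: str, max_age_days: int = 30,
            min_free_fraction: float = 0.2, dry_run: bool = True) -> List[Path]:
    """Return the expired *.log files under log_dir, deleting them unless dry_run.

    Nothing is selected while the disk still has enough free space.
    """
    usage = shutil.disk_usage(log_dir)  # named tuple: total, used, free
    if not is_low_on_space(usage.free, usage.total, min_free_fraction):
        return []

    now = time.time()
    selected = []
    for path in Path(log_dir).rglob("*.log"):
        if path.is_file() and is_expired(path.stat().st_mtime, now, max_age_days):
            if not dry_run:
                path.unlink()
            selected.append(path)
    return selected


if __name__ == "__main__":
    # Hypothetical path; dry_run=True only lists what would be removed.
    for candidate in cleanup("/var/log/centrallog", dry_run=True):
        print(candidate)
```

Keeping `dry_run=True` as the default means a first run only reports candidates, which seems safer on a shared logging host. If we would rather always enforce the 30-day limit regardless of free space, the free-space check could simply be dropped.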
Your input on this matter would be greatly appreciated.