Problem statement
Currently the logs for Cumin and Spicerack on the cumin hosts don't go to logstash because they might contain sensitive information; for that to happen, T213902 needs to be implemented first. Even then, the retention on logstash is usually short.
I think it's important not to lose all the information that we gather when running cookbooks and cumin commands whenever one of those hosts is reimaged, refreshed, or suffers a fatal hardware failure.
Currently, at each reimage/refresh we've either lost the data or copied it over manually (when we remembered to do that), which is error prone and easy to forget.
The logs can be useful in various ways: later audits, debugging of what happened earlier, or even gathering some statistics (although some basic ones could be gathered from SAL).
How long to keep them is an open question. I think we need more than 3 months and potentially a few years, but maybe that's not needed.
Stats about the data
- Paths:
- /var/log/spicerack
- /var/log/cumin
- Hosts: 3. The hosts with cluster::management and cluster::unprivmanagement roles.
- Sizes: the sizes are very different between hosts because people tend to use the eqiad host more. Given the recent reimage of cumin1001 I can't give very precise estimates.
- /var/log/spicerack: cumin1001 -> 1.4GB/year ; cumin2002 -> ~200MB/year
- /var/log/cumin: <20MB/year/host with current settings (I'm thinking of always logging at debug level in a separate file, which might increase the sizes, but we're talking about small things anyway)
- Retention: ideally a few years, but at least longer than our standard 3 months.
- Peculiarities: because those are log files, they are append-only in nature. They are rotated by the application (not logrotate), and they are only rotated, never deleted. The rotation can make a backup approach inefficient, because what is foo.log.1 today will be foo.log.2 tomorrow, so a backup that tracks files by name may store the same content again after every rotation.
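
To illustrate the rotation issue, here is a minimal Python sketch of one way around it: archiving rotated files keyed by a hash of their content rather than by name, so that foo.log.1 becoming foo.log.2 does not cause a second copy. This is only an illustration and is not tied to bacula or any existing tool. The function names, the archive layout and the `foo.log.N` naming pattern are assumptions made for the example.

```python
import hashlib
import re
import shutil
from pathlib import Path
from typing import List

# Assumed naming of rotated files: foo.log.1, foo.log.2, ...
# The active foo.log is skipped because it is still being appended to.
ROTATED = re.compile(r".+\.log\.\d+$")


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def archive_rotated(log_dir: Path, archive_dir: Path) -> List[str]:
    """Copy rotated logs into archive_dir, named by content hash.

    Returns the names of the files copied in this run. A file whose
    content is already archived is skipped, whatever its current name.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(log_dir.iterdir()):
        if not path.is_file() or not ROTATED.match(path.name):
            continue
        target = archive_dir / file_digest(path)
        if not target.exists():
            shutil.copy2(path, target)
            copied.append(path.name)
    return copied
```

Run periodically, e.g. `archive_rotated(Path("/var/log/spicerack"), Path("/some/archive/dir"))` (the archive path is a placeholder), this would copy each rotated file once. The trade-off is that the original file names are lost in the archive, so a real solution would probably need to keep a small index mapping hashes to names and dates.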
Possible approaches
- Wait for T213902 and ask for a longer retention there. Is there a known ETA?
- Back up the data in bacula, either as a temporary or a permanent solution
- ... add other possible approaches