Context: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Kafka/Administration#Data_integrity
A potential fix is to investigate using RAID for the brokers' partition logs as well, and to configure it during the next cluster installation (for example, the Kafka 0.9 migration).
Short-term fixes might include sending SMART Icinga alerts to our IRC channel.
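As a rough illustration of the short-term option, here is a minimal sketch of an Icinga/Nagios-style plugin that wraps `smartctl -H` and maps the result to the standard plugin exit codes. This is not the check we currently run: the function names `classify` and `main` are hypothetical, and it assumes `smartctl` (smartmontools) is installed and that the check runs with enough privileges to query the disk. Routing the resulting alert to IRC would be done in the Icinga notification configuration, which is not shown here.

```python
#!/usr/bin/env python3
"""Sketch of an Icinga/Nagios-style check that reports SMART health."""
import subprocess
import sys

# Standard Nagios/Icinga plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def classify(smartctl_output):
    """Map the text printed by `smartctl -H` to (exit_code, message)."""
    for line in smartctl_output.splitlines():
        # ATA disks print "... test result: PASSED";
        # SCSI/SAS disks print "SMART Health Status: OK".
        if "test result:" in line or "SMART Health Status:" in line:
            status = line.split(":", 1)[1].strip()
            if status in ("PASSED", "OK"):
                return OK, "SMART OK - " + status
            return CRITICAL, "SMART CRITICAL - " + status
    return UNKNOWN, "SMART UNKNOWN - no health line in smartctl output"


def main(device):
    """Run smartctl against one device and print a one-line plugin result."""
    try:
        result = subprocess.run(
            ["smartctl", "-H", device],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        print("SMART UNKNOWN - smartctl not found")
        return UNKNOWN
    code, message = classify(result.stdout)
    print(f"{message} ({device})")
    return code


# The device path is a required argument, e.g. /dev/sda.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

It would be invoked once per broker data disk, for example `check_smart.py /dev/sda` (the script name is made up for this sketch). Because the partition logs are not on RAID, each disk needs its own check so that a single failing drive is flagged before it takes partitions down with it.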