
Perf test RAID vs JBOD with new hardware and kafka versions
Closed, Resolved | Public | 5 Estimated Story Points

Event Timeline

We have been running Kafka on single disks (JBOD), since Kafka knows where to place topic partitions. If a disk fails, the broker needs to be shut down. We want to measure the impact of using, say, RAID10. We will set up RAID on 3 nodes and compare them against 3 non-RAID nodes.

I had an interesting chat with the Ops team about this task, and I believe we don't need to spend a ton of time on it now:

  1. A Kafka broker works by appending data to the end of a file on disk, and consumers usually trigger sequential reads near the end of it. On paper, this should ensure that the disk cache (OS page cache) is heavily used and the disk is not hit that often (except when a flush is forced to sync new data to disk).
  1. The Kafka brokers do not seem to have any paging or swap activity (pidstat confirmed it). Disk usage is 5% on average, and IOPS are really low.
  1. Consumer activity is not concentrated in brief time windows (for example, consuming a ton of data once every hour).
  1. RAID10 would greatly ease the maintenance of our Jumbo brokers, and I don't see any big concern that outweighs its benefits (above all: the broker keeps working after a disk failure, and there are no more constrained data directories/partitions that fill up easily and cause alerts).
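
For anyone who wants to re-check the low disk activity noted above, here is a minimal sketch of how IOPS and busy time could be derived from two snapshots of `/proc/diskstats` on Linux. It assumes the 5% figure refers to device utilisation (the share of wall-clock time the device had I/O in flight). The function names and the sample numbers are made up for illustration; they are not measurements from the brokers, and the check described in this task was done with pidstat, not with this script.

```python
def parse_diskstats(text):
    """Parse /proc/diskstats content into {device: (reads, writes, io_ticks_ms)}.

    Column layout (Linux): major, minor, name, then the per-device counters.
    Index 3 = reads completed, index 7 = writes completed,
    index 12 = milliseconds spent doing I/O.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip blank or truncated lines
        stats[parts[2]] = (int(parts[3]), int(parts[7]), int(parts[12]))
    return stats


def disk_activity(before, after, interval_s):
    """Return {device: (iops, util_percent)} between two parsed snapshots."""
    result = {}
    for name, (reads1, writes1, ticks1) in after.items():
        if name not in before:
            continue  # device appeared between snapshots
        reads0, writes0, ticks0 = before[name]
        iops = ((reads1 - reads0) + (writes1 - writes0)) / interval_s
        util = 100.0 * (ticks1 - ticks0) / (interval_s * 1000.0)
        result[name] = (iops, util)
    return result


if __name__ == "__main__":
    # Synthetic snapshots taken 10 seconds apart (hypothetical values).
    sample_before = "8 0 sda 1000 0 8000 500 2000 0 16000 900 0 1200 1400"
    sample_after = "8 0 sda 1050 0 8400 520 2250 0 18000 1000 0 1700 1900"
    activity = disk_activity(
        parse_diskstats(sample_before), parse_diskstats(sample_after), 10
    )
    for device, (iops, util) in activity.items():
        print(f"{device}: {iops:.1f} IOPS, {util:.1f}% busy")
```

On a real host the two snapshots would come from reading `/proc/diskstats` twice with a sleep in between; running this on the RAID and non-RAID nodes would give a like-for-like comparison.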
elukey set the point value for this task to 8. Aug 1 2017, 3:36 PM
Nuria changed the point value for this task from 8 to 5.