Status | Assigned | Task
---|---|---
Declined | elukey | T166833 Produce webrequests from varnishkafka to Kafka with Kafka message timestamp set to configurable content field
Resolved | Ottomata | T152015 Provision new Kafka cluster(s) with security features
Resolved | elukey | T168538 Perf test RAID vs JBOD with new hardware and kafka versions
Resolved | elukey | T167992 rack/setup/install new kafka nodes kafka-jumbo100[1-6]
Resolved | Cmjohnson | T173837 kafka-jumbo1004 h/w problem most likely raid card
Resolved | RobH | T174457 kafka-jumbo.cfg partman recipe creation/troubleshooting
Event Timeline
We have been using single disks for Kafka, since Kafka knows where to put topic partitions. If a disk fails, the broker needs to be shut down. We want to measure the impact of using, say, RAID10. We will set up RAID on 3 nodes and compare them against 3 non-RAID nodes.
I had an interesting chat with the Ops team about this task, and I believe that we don't need to spend a ton of time working on this now:
- A Kafka broker works by appending data to the end of a file on disk, and consumers usually trigger sequential reads near the end of it. On paper, this should ensure that the disk cache is heavily used and the disk is not hit that often (except when a flush is forced to sync new data to disk).
- The Kafka brokers do not seem to have any paging or swap activity (pidstat confirmed it). Disk usage is 5% on average, and IOPS are really low.
- Consumer activity is not concentrated in brief time windows (for example, consuming a ton of data once every hour).
- RAID10 is a feature that will greatly ease the maintenance of our Jumbo brokers, and I don't see any big concern that outweighs its benefits (above all: the broker keeps working after a disk failure, and there are no more constrained data directories/partitions that can fill up easily and cause alerts).
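To illustrate the access pattern described in the first point, here is a minimal toy sketch in Python. It is not Kafka's actual storage format (Kafka uses segment files and indexes); the `AppendOnlyLog` class, the `demo` helper, and the `topic-0.log` file name are all hypothetical and exist only for illustration. It shows a writer that only ever appends to the end of a file, and a nearly caught-up consumer that reads sequentially from an offset close to the tail, which is the kind of workload that tends to be served from cache rather than from disk.

```python
import os
import tempfile


class AppendOnlyLog:
    """Toy single-file log: writers append, readers scan forward from a byte offset.

    Hypothetical illustration only; this is not how Kafka lays out its data.
    """

    def __init__(self, path):
        self.path = path
        # Create the file if it does not exist yet.
        open(path, "ab").close()

    def append(self, record: bytes) -> int:
        """Append one newline-terminated record and return its starting byte offset."""
        with open(self.path, "ab") as f:
            # Explicitly move to the end so the returned offset is the file size.
            offset = f.seek(0, os.SEEK_END)
            f.write(record + b"\n")
        return offset

    def read_from(self, offset: int):
        """Sequentially read every record from the given byte offset to the end."""
        with open(self.path, "rb") as f:
            f.seek(offset)
            return [line.rstrip(b"\n") for line in f]


def demo():
    """Append five records, then read only the tail like a caught-up consumer."""
    with tempfile.TemporaryDirectory() as d:
        log = AppendOnlyLog(os.path.join(d, "topic-0.log"))
        offsets = [log.append(f"msg-{i}".encode()) for i in range(5)]
        # The consumer resumes from the offset of the fourth record.
        return offsets, log.read_from(offsets[3])


if __name__ == "__main__":
    offsets, tail = demo()
    print(offsets)
    print(tail)
```

Because both the writes and the reads touch only the most recent part of the file, recently written data is likely to still be in memory when the consumer asks for it, which is consistent with the low disk usage observed on the brokers.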