
SSDs for main Kafka clusters
Closed, Declined · Public

Description

Alright! We've got a little extra hardware budget to use up before the end of the FY.

The main Kafka clusters in eqiad and codfw are both about to see more usage as part of T157088. We don't believe the increase is enough to warrant buying new nodes, but message consumers may change their behavior in ways that cause messages to be read from disk rather than from the page cache. In anticipation, we'd like to use the hardware budget to replace the HDDs in kafka[12]00[123].

These kafka nodes were procured in T145082 and T114191, and currently have 4 x 4TB HDDs. We'd like to replace these HDDs with large capacity SSDs. The larger we can get the better, but they obviously won't be 4TB.

This ticket is to get a quote for 24 large capacity SSDs for installation in the 6 main kafka brokers (3 in eqiad, 3 in codfw).
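As a quick sanity check on the numbers above, here is a minimal Python sketch, assuming the bracket groups in kafka[12]00[123] are shell-glob-style character classes (each group contributes exactly one character) and that every broker takes 4 drives, as the description states. The helper name expand_host_pattern is hypothetical and not part of any existing tooling.

```python
import itertools
import re


def expand_host_pattern(pattern):
    """Expand glob-style character classes such as kafka[12]00[123].

    Each [...] group is treated as a set of single characters (no ranges).
    """
    # re.split with a capturing group keeps the bracket groups in the result.
    parts = re.split(r"(\[[^\]]+\])", pattern)
    choices = []
    for part in parts:
        if part.startswith("[") and part.endswith("]"):
            choices.append(list(part[1:-1]))  # one option per character
        elif part:
            choices.append([part])  # literal text, a single option
    # Cartesian product of all options, joined back into hostnames.
    return ["".join(combo) for combo in itertools.product(*choices)]


DRIVES_PER_HOST = 4  # 4 x 4TB HDDs per broker, per the description

hosts = expand_host_pattern("kafka[12]00[123]")
print(hosts)
# ['kafka1001', 'kafka1002', 'kafka1003', 'kafka2001', 'kafka2002', 'kafka2003']
print(len(hosts) * DRIVES_PER_HOST)  # 24
```

The six expanded names match the "3 in eqiad, 3 in codfw" split, assuming the usual convention that kafka1xxx hosts live in eqiad and kafka2xxx hosts in codfw.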

We understand that there is an industry shortage of SSDs, and we have our fingers crossed that we can get these in time to use the remaining budget. That means we'd need to have these delivered by the end of June!

Event Timeline

Ottomata created this task. May 25 2017, 8:22 PM
Restricted Application added projects: Operations, Analytics. · May 25 2017, 8:22 PM
RobH added a comment. May 25 2017, 8:39 PM

Getting SSDs on this timeline isn't possible, not if we want them under warranty with the system vendor.

RobH assigned this task to Ottomata. May 25 2017, 8:40 PM

Is this something that you want done in next year's budget, or is it now invalid? Please advise.

Nuria moved this task from Incoming to Radar on the Analytics board. May 29 2017, 3:26 PM

Heh, we will need it next FY, but as @Ottomata points out in the description, there is some extra $$$ left from this year's hardware budget. Can these be paid for by the end of the FY and received afterwards, once available?

Apparently the hardware has to actually be received at the datacenter to count toward this year's budget.

Well, that's unfortunate. We definitely want them under warranty as it's an important production system. We still want them, so I guess we'll have to figure it out for next FY.

Do we still want to do this?

RobH added a comment (edited). Feb 14 2018, 6:20 PM

Can you guys provide me with the exact hostnames of the Kafka hosts you want upgraded? I see quite a few, and the hostnames of Kafka and analytics systems don't follow the same naming standards as the rest. I'd rather not assume!

@RobH I am talking with Faidon about this right now for next FY's budget. I think that instead of adding SSDs, we are going to get a couple more nodes and increase the RAM in these.

But anyway, the nodes are kafka[12]00[123].

For reference, the Kafka cluster nodes are defined in Puppet in hieradata/common.yaml.

RobH added a comment. Feb 14 2018, 6:31 PM

So it looks like kafka[12]00[123] are all misc systems with 4 LFF (3.5-inch) hot-swap bays holding 4TB drives. These cannot easily be converted to SFF (2.5-inch) drives, since Dell doesn't sell a drive tray that fits an SFF drive in an LFF hot-swap bay.

This means that upgrading these to SSDs isn't really feasible.

IRC update: It seems we'll likely quote out 2 more nodes per DC with SSDs rather than the HDDs used in previous Kafka systems.

I think this means this task can be declined?

Ottomata closed this task as Declined. Feb 14 2018, 6:32 PM

Ya, sounds good. (probably no SSDs for future nodes, FYI)