
eqiad/codfw: 1+1 Kafka broker in main clusters in eqiad and codfw
Closed, Resolved · Public

Description

Labs Project Tested: deployment-prep
Site/Location: EQIAD & CODFW
Number of systems: 2
Service: EventBus
Networking Requirements: internal IP, not in row A or B
Memory: 32 GB
Disks: 4 x 4TB

Other Requirements:

The main Kafka clusters in eqiad and codfw each consist of 2 brokers. These systems are becoming more critical, so we would like to increase our confidence in their reliability, as discussed in T144637. We'd like to grow each cluster to 3 brokers, which means we need to add 1 broker in each datacenter.

These nodes should exactly match the kafka1001 and kafka2001 nodes ordered in T114191, which I believe were taken from the misc pool refresh procured in T118260. I believe these are PowerEdge R430s with 4 (~)4TB drives, 16 (virtual?) cores, and 32 GB RAM, and that they are the nodes ordered in https://phabricator.wikimedia.org/F3031650. If there are still misc nodes available from this order, we'd gladly take them.

Event Timeline

Ottomata created this task. · Sep 8 2016, 2:49 PM
Restricted Application added a subscriber: Aklapper. · Sep 8 2016, 2:49 PM
RobH added subscribers: mark, RobH. · Sep 8 2016, 6:57 PM

First off, that is easily one of the best damned requests ever (in terms of populated system info.) If everyone provided the full purchase history for each request it would make things so easy on me!

Thanks for that =]

System allocations:

We have a spare system for this in eqiad (5 in fact):

WMF4723 : Dell PowerEdge R430 : Dual Intel® Xeon® Processor E5-2623 v3 : 32GB RAM : (4) 4TB SATA

Unfortunately, we do not have a similar system in codfw. I'll create a sub-task in procurement for quoting out this system. Once I have that quote in, I'll escalate both the quote and this allocation request to @mark for approvals.

RobH claimed this task. · Sep 8 2016, 6:57 PM
RobH moved this task from Backlog to In Discussion / Review on the hardware-requests board.

Perfect, thank you!

RobH mentioned this in Unknown Object (Task). · Sep 8 2016, 6:59 PM
RobH created subtask Unknown Object (Task).

These installs don't have to happen at the same time. Since there are eqiad spares, can we install one of these before we make the order for codfw?

Bump! I see that requests for quotes have gone out. Did we ever get them back?

RobH reassigned this task from RobH to mark. · Oct 19 2016, 10:01 PM

@mark: Is it ok for me to allocate spare system WMF4723 (Dell PowerEdge R430 : Dual Intel® Xeon® Processor E5-2623 v3 : 32GB RAM : (4) 4TB SATA) for this use?

We currently have 4 of these systems spare, along with 1 spare higher-powered system.

Please comment and assign back to me for implementation. Thanks!

mark added a comment. · Oct 20 2016, 4:31 PM

Instead of a one-off, should we get similar spares as eqiad in codfw?

@mark, this ticket is about that too. It just happens that there is already a spare that matches these specs in eqiad, so we can set that up while we wait for the codfw order to happen.

RobH added a comment. · Edited · Oct 20 2016, 4:36 PM

Instead of a one-off, should we get similar spares as eqiad in codfw?

I can expand the quotes on T145112 from 1 system (for Kafka) to multiple systems (for Kafka plus spares). Please note that codfw has 4 spares with 2.6GHz CPUs, dual 1TB disks, and 64GB of RAM, but no spares with 3GHz CPUs and quad 4TB disks, which is what this request needs.

So I'll get the price for the spares/kafka for codfw via the procurement task.

On this task, I still need approval to allocate the spare machine in eqiad for this use.

mark added a comment. · Oct 21 2016, 11:53 AM

Approved.

RobH changed the task status from Open to Stalled. · Oct 21 2016, 4:56 PM
RobH claimed this task.

Stealing this back for sub-task implementations (both orders and system allocations).

RobH added a comment. · Nov 1 2016, 9:55 PM

The eqiad spare has been allocated and is handed off for use.

The codfw host has been ordered on task T145112. It has an ETA of 2016-11-11.

RobH closed this task as Resolved. · Nov 14 2016, 6:21 PM

Allocation of the codfw box took place on T150340. Resolving this hw-request.

mark closed subtask Unknown Object (Task) as Resolved. · Nov 29 2016, 11:33 AM