Draft initial data storage platform and place budget hold for Q2
Closed, Resolved · Public

Description

We need to start designing this platform in order to acquire suitable hardware for an MVP in Q2.

The design that I have in mind is similar to that set out in Common Data Infrastructure (currently a restricted slide deck).

The key details are:

  • A five node Ceph cluster - located in rows E and F of eqiad
  • Each node has two 25 Gbps network connections to the ToR switches, one of which is dedicated to intra-cluster replication
  • Ceph monitor processes are co-located with the object storage daemons (OSDs)
  • Ceph RADOS Gateway daemons (RGWs) are also co-located with the OSDs
  • The five OSD nodes are 2U servers with 24 NVMe slots in the chassis
  • An optional hot tier of storage can be provisioned using some of the remaining NVMe slots
  • A cold tier of storage will be provisioned from HDDs running in directly connected JBOD
  • The Bluestore WAL and DB devices will also be located on NVMe drives

I will begin defining a hardware spec and estimating the cost.

Event Timeline

BTullis triaged this task as High priority.May 13 2022, 1:03 PM
BTullis created this task.
BTullis moved this task from Next Up to In Progress on the Data-Engineering-Kanban board.

Storage node spec

Head Node

The five servers would be a custom build since they do not closely match an existing entry from the standard HW spec.

  • Dell PowerEdge R740xd chassis - with 24 NVMe capable front bays
  • 2 x 25 Gbps network interface card option
  • Mid-tier CPU processing power such as: 2 x Intel® Xeon® Silver 4210R 2.4G, 10C/20T
  • 192 GB of RAM
  • O/S installed to two internal M.2 480 GB drives (BOSS - Boot Optimized Storage Subsystem)
  • Internal H330 Host Bus Adapter - instead of a RAID controller
  • External H840 Host Bus Adapter - for connection to JBOD tiers

JBOD Chassis

Each of the five servers requires:

  • 1 x PowerVault MD1400 enclosure - 12 x SAS connected 3.5" hot-swap drive bays

Up to 8 of these JBODs can be daisy-chained from each storage server, allowing for future scale-out of the cold-storage tier.
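As a back-of-the-envelope check, the scale-out ceiling works out as follows. This is a hypothetical sketch, not a committed expansion plan, and it assumes the same 18 TB drives as the example population below:

```python
# Hypothetical sketch: raw cold-tier ceiling if every storage server were
# expanded to its maximum of 8 daisy-chained MD1400 JBODs.
# Assumes 18 TB drives, matching the example population in this task.
NODES = 5
MAX_JBODS_PER_NODE = 8
BAYS_PER_JBOD = 12
DRIVE_TB = 18

max_raw_tb = NODES * MAX_JBODS_PER_NODE * BAYS_PER_JBOD * DRIVE_TB
print(f"Maximum raw cold-tier capacity: {max_raw_tb} TB ({max_raw_tb / 1000:.2f} PB)")
# Maximum raw cold-tier capacity: 8640 TB (8.64 PB)
```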

Storage Devices

The following is only an example storage population.

Each of the five servers requires:

  • 2 x 480 GB NVMe devices to act as Bluestore WAL/DB

A tier of cold storage:

  • 12 x 18 TB SAS 3.5" hard drives

An optional hot tier of NVMe drives, for example:

  • 10 x 3.8 TB U.2 2.5" NVMe solid-state drives

MVP Cluster Capacity

The configuration set out above would give a total capacity of:

  • Hot tier: 5 nodes x 10 NVMe drives x 3.8 TB = 190 TB raw
  • Cold tier: 5 nodes x 12 HDDs x 18 TB = 1.08 PB raw

Simplistically, we could store 3 copies of everything on the cold tier and 2 copies of everything on the hot tier, which would give:

  • 95 TB usable for the hot tier
  • 360 TB usable for the cold tier

In reality we can achieve better utilization than this by using erasure-coded pools instead of replicated pools, but the replicated-pool calculation above should be enough as a baseline estimate.
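The arithmetic above can be sketched as follows. The 4+2 erasure-coding profile is an illustrative assumption to show the efficiency difference, not a decided configuration:

```python
# Usable-capacity sketch for the MVP cluster (raw figures from this task).
HOT_RAW_TB = 5 * 10 * 3.8   # 5 nodes x 10 NVMe drives x 3.8 TB = 190 TB
COLD_RAW_TB = 5 * 12 * 18   # 5 nodes x 12 HDDs x 18 TB = 1080 TB

# Replicated pools, as in the baseline estimate.
hot_usable = HOT_RAW_TB / 2    # 2 copies -> 95 TB
cold_usable = COLD_RAW_TB / 3  # 3 copies -> 360 TB

# Erasure-coded pool, e.g. k=4 data chunks + m=2 coding chunks
# (hypothetical profile): the usable fraction is k / (k + m).
k, m = 4, 2
cold_usable_ec = COLD_RAW_TB * k / (k + m)  # 720 TB

print(f"Hot tier usable (2x replication): {hot_usable:.0f} TB")
print(f"Cold tier usable (3x replication): {cold_usable:.0f} TB")
print(f"Cold tier usable (EC {k}+{m}): {cold_usable_ec:.0f} TB")
```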

Without having sought a quote from Dell on this, all I can do is estimate the costs.

My estimate is:

  • Head node = $8,000 each
  • JBOD enclosure = $4,000 each
  • Cold tier (60 drives) = $24,000 total
  • Hot tier (50 drives) = $50,000 total

Total (with hot tier) = $134,000
Total (without hot tier) = $84,000
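The totals can be double-checked with a quick script; all figures are the ballpark estimates from this task, pending an actual Dell quote:

```python
# Cost estimate sketch using the per-item figures above (USD).
head_nodes = 5 * 8_000   # five head nodes at $8,000 each
jbods = 5 * 4_000        # five MD1400 enclosures at $4,000 each
cold_drives = 24_000     # 60 x 18 TB HDDs, total
hot_drives = 50_000      # 50 x 3.8 TB NVMe drives, total

total_with_hot = head_nodes + jbods + cold_drives + hot_drives
total_without_hot = head_nodes + jbods + cold_drives

print(f"Total (with hot tier): ${total_with_hot:,}")        # $134,000
print(f"Total (without hot tier): ${total_without_hot:,}")  # $84,000
```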

This MVP will hopefully be running in Q1, ready to be used in Q2.

Moving to in-review while we share the draft spec and invite comments.

I'm moving this to Ops on the board, because it didn't come from another team.
It is something that we have decided to do ourselves, and since it's about building infrastructure I think it belongs in Ops. Hope that's OK.

I have completed a draft of the design document for this MVP.

Design Document - Common Data Infrastructure MVP

I'd welcome feedback from anyone on the document. There are a couple of places where I am actively seeking input from the Infrastructure-Foundations team and others about how we should best configure the network.

Looks great! I added a comment asking for how this (in the future) could support Shared Data Platform Multi DC for prod.

Thanks for the feedback. Yes I'll add that section. It definitely makes sense.

RobH mentioned this in Unknown Object (Task).Jul 1 2022, 4:44 PM