
Data Infrastructure as a Service MVP
Closed, Duplicate (Public)

Description

We have identified a need for a new storage platform in order to meet the needs of teams focused on building data-centric products and features.

The platform should offer both block storage and object storage capabilities and should be suitable for both analytics and production-oriented workloads.

As such, we are planning to build an MVP of such a platform in Q2 of the 2022/2023 financial year, with the Data-Engineering team taking primary responsibility for its design and implementation. Consultation and close collaboration with the Data Persistence and Infrastructure Foundations teams will be essential to ensure that the design meets the requirements and that the expected traffic profile is compatible with our network topology.

The end goal is to build a scalable platform that can facilitate self-service data infrastructure provision across many teams and for a wide variety of requirements. The MVP should be designed in such a way that once its value has been proven it can be promoted to a production class service without a full rebuild.

Some key use cases include:

  • the ability to support Persistent Volume Claims in Kubernetes, such that we can begin to deploy stateful services on k8s
  • the ability to provide block storage to virtual machines, for enhanced flexibility in designing data processing systems on VMs
  • the ability to provide S3 and/or Swift compatible object storage as a back-end for analytics and similar workloads
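
As a rough illustration of the first use case, the sketch below builds a Kubernetes PersistentVolumeClaim manifest as a plain Python dict and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The helper name `build_pvc_manifest`, the claim name `example-data`, and the storage class name `ceph-rbd` are assumptions made up for this example; the task does not specify how storage classes will be named or provisioned on the platform.

```python
import json


def build_pvc_manifest(name, size_gib, storage_class="ceph-rbd"):
    """Return a PersistentVolumeClaim manifest as a plain dict.

    `storage_class` defaults to "ceph-rbd", a hypothetical name used only
    for illustration; substitute whatever class the cluster really offers.
    """
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteOnce: the volume is mounted read-write by a single node,
            # the usual mode for block-backed volumes.
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gib}Gi"}},
        },
    }


if __name__ == "__main__":
    # Print a manifest requesting 10 GiB for a hypothetical stateful service.
    print(json.dumps(build_pvc_manifest("example-data", 10), indent=2))
```

A stateful service would then reference the claim by name in its pod spec; the manifest itself says nothing about which storage back-end fulfils the request, which is what keeps the service definition independent of the platform underneath.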

In this phase we are only looking at building this platform in eqiad, although we should always consider how it would scale to a multi-DC and/or cross-DC design.

Related Objects

Status  Subtype  Assigned  Task
Duplicate  None
Duplicate  None
Resolved  BTullis
Resolved  BTullis
Resolved  EChetty
Resolved  BTullis
Resolved  BTullis
Resolved  BTullis
Resolved  EChetty
Resolved  BTullis
Resolved  EChetty
Resolved  BTullis
Resolved  BTullis
Resolved  BTullis
Resolved  BTullis
Resolved  BTullis
Resolved  BTullis
Resolved  BTullis
Resolved  BTullis
Resolved  BTullis
Resolved  EChetty
Resolved  Cmjohnson

Event Timeline

BTullis renamed this task from Data Infrastructure as a Service platform MVP to Data Infrastructure as a Service MVP. (May 13 2022, 3:05 PM)

Adding to the Shared-Data-Infrastructure project to aid with planning.

The spec of the servers is coming along quite nicely in the procurement ticket: {T311869}

Each of the 5 Ceph servers is looking like it's going to be configured as follows:

[Figure: Ceph Storage Layout.drawio.png (531×882 px, 33 KB)]