Data Catalog MVP
Closed, Resolved (Public)

Description

Goal:
Validate whether having a Data Catalog will improve the discoverability and usability of our data.

Task One Liner:
Deploy a single data catalog solution hosting the Hive Metastore & Event Platform to create an active inventory of the data assets.

Functional Requirements:
[Primary] Searching and filtering options to allow users to quickly find relevant sets of data for analytics or data engineering requirements.

Technical Requirements:
[Primary] Have the complete Hive Metastore imported into the Data Catalog

Product One Pager:
Here

Related Objects

Status Subtype Assigned Task
Resolved BTullis
Resolved razzi
Resolved razzi
Resolved BTullis
Resolved BTullis
Declined None
Resolved BTullis
Resolved BTullis
Resolved JMeybohm
Resolved BTullis
Resolved BTullis
Resolved BTullis
Resolved BTullis
Resolved razzi
Resolved akosiaris
Resolved razzi
Resolved BTullis
Resolved Stevemunene
Resolved BTullis
Resolved BTullis
Resolved BTullis
Open None
Resolved BTullis
Resolved Milimetric
Duplicate None
Declined None

Event Timeline

BTullis moved this task from Backlog to In Progress on the Data-Catalog board.

Here is the design document for this MVP deployment.

BTullis triaged this task as High priority. Mar 2 2022, 10:39 AM

Change 784726 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/deployment-charts@master] Disable analytics for datahub

https://gerrit.wikimedia.org/r/784726

Change 784726 merged by jenkins-bot:

[operations/deployment-charts@master] Disable analytics for datahub

https://gerrit.wikimedia.org/r/784726

BTullis moved this task from Epics to Done on the Data-Platform-SRE board.

I'm going to be bold and resolve this epic.
We have completed all of the tasks that were originally defined to get DataHub to an MVP stage.
There are improvements still to be made to bring it up to production quality, but these can be handled under a different ticket.