Spike: Evaluate datahub schema versioning support
Closed, Resolved | Public | 2 Estimated Story Points

Description

Spike Goal
Determine what the user experience is for versioning schemas as the data changes (test on Hive first, then on the event platform).

Key Questions:

  • What happens with breaking Schema changes?

DataHub tracks schema versions in accordance with SemVer, automatically creating change events. These are available from the Timeline API.
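As a rough illustration of how those change events might be requested, the sketch below builds a Timeline API URL for a dataset. The `/openapi/timeline/v1/` path, the `categories`, `start` and `end` query parameters, and the `localhost:8080` host are assumptions based on the DataHub Timeline developer guide and should be checked against the deployed version; `timeline_url` is a hypothetical helper, not part of DataHub.

```python
from urllib.parse import quote, urlencode


def timeline_url(base_url, dataset_urn, categories=("TECHNICAL_SCHEMA",),
                 start_ms=None, end_ms=None):
    """Build a Timeline API URL for one dataset (hypothetical helper).

    Assumes the endpoint lives at /openapi/timeline/v1/<url-encoded urn>
    and accepts 'categories', 'start' and 'end' (epoch milliseconds).
    """
    # The URN contains ':', '(', ')' and ',' so it must be fully percent-encoded.
    encoded_urn = quote(dataset_urn, safe="")
    params = {"categories": ",".join(categories)}
    if start_ms is not None:
        params["start"] = start_ms
    if end_ms is not None:
        params["end"] = end_ms
    return f"{base_url.rstrip('/')}/openapi/timeline/v1/{encoded_urn}?{urlencode(params)}"


# Example with the test dataset used in this task; the host is an assumption.
url = timeline_url(
    "http://localhost:8080",
    "urn:li:dataset:(urn:li:dataPlatform:hive,milimetric.test_ingestion,PROD)",
)
print(url)
```

The resulting URL could then be fetched with any HTTP client (for example `urllib.request.urlopen`) from a host that can reach the DataHub metadata service.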

  • Can we highlight Backwards compatible changes?

DataHub automatically does this with minor SemVer version increments.

  • Can we highlight Backwards incompatible changes?

DataHub automatically does this with major SemVer version increments.
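To make the minor/major distinction concrete, here is a minimal sketch of how two consecutive schema versions could be compared to decide whether a change was backwards compatible. `parse_semver` and `classify_bump` are hypothetical helpers, and the handling of a suffix such as `-computed` is an assumption about how version strings may appear in timeline output; this is not DataHub's own implementation.

```python
def parse_semver(version):
    """Split 'MAJOR.MINOR.PATCH[-suffix]' into a tuple of three ints."""
    # Drop any suffix after the first '-' (assumed form, e.g. '1.0.0-computed').
    core = version.split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def classify_bump(old_version, new_version):
    """Classify the change between two versions under SemVer rules."""
    old = parse_semver(old_version)
    new = parse_semver(new_version)
    if new[0] > old[0]:
        # Major increment: backwards-incompatible schema change.
        return "backwards-incompatible"
    if new[0] == old[0] and new[1] > old[1]:
        # Minor increment: backwards-compatible schema change.
        return "backwards-compatible"
    if new[:2] == old[:2] and new[2] > old[2]:
        return "patch"
    # Same version, or the new version is not newer than the old one.
    return "none"


print(classify_bump("1.0.0-computed", "2.0.0-computed"))
print(classify_bump("1.0.0", "1.1.0"))
```

A consumer of the change events could use a check like this to flag major increments for review while letting minor increments pass.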

  • What happens when we Upgrade datahub?

After upgrading from 0.8.28 to 0.8.32 and subsequently to 0.8.34, we observed no breakage in the schema history; all versioning was kept.

Event Timeline

EChetty set the point value for this task to 2.

This looks to be available behind the scenes but just not surfaced in the UI yet? https://datahubproject.io/docs/dev-guides/timeline/

EChetty moved this task from Next Up to MVP on the Data-Catalog board.
Milimetric triaged this task as High priority.

I will work on this first, using my Hive database, milimetric, and report findings here.

BTullis subscribed.

We need to upgrade to DataHub 0.8.34 before this will work. I've added that ticket as a parent of this one.

I have upgraded DataHub to version 0.8.34 and now the Blame View is accessible, which is the first way to visualize schema changes over time.

image.png (752×1 px, 75 KB)

https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,milimetric.test_ingestion,PROD)/Schema?is_lineage_mode=false

EChetty moved this task from In Review to Done on the Data-Catalog board.