Data Catalog Technical Evaluation
Open, Needs Triage, Public

Description

This task is part of the Daas OKR1, and includes evaluating OSS catalog solutions against the requirements for the WMF data catalog.

We initially considered any option with an OSI-approved license and documented these in the rubric linked below. We then narrowed the list down to four candidates that could work for us. We installed each of these and attempted to connect it to at least our Hive metastore. Details of main candidate evaluation:

Rubric for this evaluation available at: https://wikitech.wikimedia.org/wiki/Data_Catalog_Application_Evaluation_Rubric

Event Timeline

@BTullis, perhaps you already saw this, but @Milimetric considered CKAN on the rubric and ultimately disqualified it:

CKAN is meant to work at a very large scale, such as governments with multiple branches collaborating on data hubs. As such, most of the integrations are meant to be done manually, with only minimal automation support. Details can be found in their docs, but it doesn't seem to meet our requirements. It may be something to consider for something bigger, like an Open Knowledge Data Portal shared with our other open knowledge partners.