Orchestration
Open, Needs Triage · Public · 5 Estimated Story Points

Description

  • Use Airflow, with a DAG:
  • allow for both 1. Run WDTK → create CSVs → make pre-DB aggregations → load data into DB → make post-DB aggregations
  • allow for parallelized transformations (replace the SQL permutations in a loop)
  • see if templated tasks would work
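
A rough sketch of the task graph described above, to check the ordering and see which steps could run in parallel. This is not Airflow code: it uses Python's standard-library `graphlib` as a stand-in so the shape of the DAG can be tested without a scheduler. The task names and the `AGGREGATION_DIMENSIONS` list are hypothetical placeholders for the SQL permutations currently produced in a loop.

```python
from graphlib import TopologicalSorter

# Hypothetical placeholders for the SQL permutations currently built in a loop.
AGGREGATION_DIMENSIONS = ["dim_a", "dim_b", "dim_c"]


def build_pipeline(dimensions):
    """Return a mapping of task name -> set of upstream task names."""
    graph = {
        "run_wdtk": set(),
        "create_csvs": {"run_wdtk"},
        "aggregate_pre_db": {"create_csvs"},
        "load_db": {"aggregate_pre_db"},
    }
    # One templated task per dimension. They share one upstream task and do
    # not depend on each other, so a scheduler is free to run them in parallel.
    for dim in dimensions:
        graph[f"aggregate_post_db_{dim}"] = {"load_db"}
    return graph


def execution_batches(graph):
    """Group tasks into ordered batches whose members can run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for batch in execution_batches(build_pipeline(AGGREGATION_DIMENSIONS)):
        print(batch)
```

If this shape holds up, each key in the mapping would presumably become one Airflow task, with the loop over dimensions being the part that templated tasks would need to cover.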

[x] See if Wikipedia cloud has a Spark cluster already

Event Timeline

notconfusing created this task.
notconfusing set the point value for this task to 5.
notconfusing moved this task from In Progress to beta backlog (mvp) on the Humaniki board.
notconfusing updated the task description. Sep 28 2020, 6:20 PM