Make edit data lake data available as a snapshot on dump hosts
Closed, Duplicate (Public)

Description

(not presto: json/sql dump)

Make edit data lake data available as a snapshot on dump hosts that can be sourced by Presto

To make the edit data in the Data Lake available to cloud users, we need to be able to load that data into Presto (SQL data store, backed by HDFS). We do not want to open holes between the Presto cluster and prod (where the data is generated), so the data will be rsynced from prod to the dumps endpoint. Note that there is no privacy issue here: the edit data in the Data Lake is public, and we are only using our Hadoop cluster in prod to denormalize it and make it useful for cloud users.

Event Timeline

Nuria triaged this task as High priority. Jan 17 2019, 2:30 PM
Nuria created this task.
Milimetric moved this task from Next Up to Paused on the Analytics-Kanban board.
fdans renamed this task from "Make edit data lake data available as a snapshot on dump hosts that can be sourced by Presto" to "Make edit data lake data available as a snapshot on dump hosts". Mar 7 2019, 5:47 PM
fdans updated the task description.
fdans moved this task from Incoming to Smart Tools for Better Data on the Analytics board.
mforns lowered the priority of this task from High to Normal. Mar 11 2019, 3:39 PM