(not presto: json/sql dump)
Make Data Lake edit data available as a snapshot on the dump hosts, where it can be sourced by Presto
To make the edit data on the Data Lake available to cloud users, we need to be able to load that data into Presto (a SQL query engine, backed by HDFS). We do not want to open holes between the Presto cluster and prod (where the data is generated), so the data will be rsync-ed from prod to the dumps endpoint. There is no privacy issue here: the edit data on the Data Lake is public, and we are only using our Hadoop cluster in prod to denormalize it and make it useful for cloud users.
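A minimal sketch of the push from prod to the dumps endpoint, to make the direction of the transfer concrete. Everything named here is an assumption for illustration: the source path, the `dumps.example.org::edit-data` rsync module, the monthly `YYYY-MM` snapshot naming, and the `build_rsync_command` helper do not come from this task. The sketch only builds the command; it does not run it.

```python
import shlex
from datetime import date

# Hypothetical locations: this task does not name real hosts or paths.
PROD_SNAPSHOT_ROOT = "/srv/edit-data/snapshots"
DUMPS_TARGET = "dumps.example.org::edit-data"


def build_rsync_command(snapshot_day, source_root=PROD_SNAPSHOT_ROOT, target=DUMPS_TARGET):
    """Return the rsync argv that pushes one snapshot from prod to the dumps endpoint.

    The transfer is initiated from prod, so nothing on the Presto side
    needs to connect back into prod.
    """
    # Assumed naming scheme: one directory per monthly snapshot.
    snapshot = snapshot_day.strftime("%Y-%m")
    source = f"{source_root}/{snapshot}/"
    destination = f"{target}/{snapshot}/"
    # --archive preserves the tree; --delete drops files removed at the source.
    return ["rsync", "--archive", "--delete", source, destination]


if __name__ == "__main__":
    # Print the command instead of running it.
    print(shlex.join(build_rsync_command(date(2020, 1, 15))))
```

Returning the argument list rather than executing it keeps the sketch safe to run anywhere and easy to review before the command is wired into a scheduled job.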