We also download the 1:128 sampled webrequest data to the Hadoop cluster, but we have not created a table for it. It could be worth doing.
This should be simple enough: create a Hive table, and add an Airflow DAG, similar to webrequest refine, that loads a Parquet-backed version of the Hive table.
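A rough sketch of the two HiveQL statements such a DAG might run, built as plain Python strings so they can be reviewed without a cluster. Everything named here is an assumption for illustration: the table names, the HDFS location, the column list and the hourly year/month/day/hour partitioning are hypothetical, and the raw table is assumed to be the Hive table created in the first step.

```python
from datetime import datetime

# Hypothetical names; none of these exist yet.
RAW_TABLE = "wmf_raw.webrequest_sampled_128"
REFINED_TABLE = "wmf.webrequest_sampled_128"
REFINED_LOCATION = "/wmf/data/wmf/webrequest_sampled_128"

# Placeholder columns; the real schema of the sampled data must be checked.
COLUMNS = [
    ("hostname", "string"),
    ("dt", "string"),
    ("uri_host", "string"),
    ("http_status", "string"),
]


def create_table_sql() -> str:
    """Build the DDL for the Parquet-backed, hourly-partitioned table."""
    cols = ",\n  ".join(f"`{name}` {typ}" for name, typ in COLUMNS)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {REFINED_TABLE} (\n  {cols}\n)\n"
        "PARTITIONED BY (`year` int, `month` int, `day` int, `hour` int)\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{REFINED_LOCATION}'"
    )


def load_hour_sql(hour: datetime) -> str:
    """Build the statement that copies one hour from the raw table."""
    select_cols = ", ".join(f"`{name}`" for name, _ in COLUMNS)
    part = f"year={hour.year}, month={hour.month}, day={hour.day}, hour={hour.hour}"
    where = part.replace(", ", " AND ")
    return (
        f"INSERT OVERWRITE TABLE {REFINED_TABLE}\n"
        f"PARTITION ({part})\n"
        f"SELECT {select_cols}\n"
        f"FROM {RAW_TABLE}\n"
        f"WHERE {where}"
    )


if __name__ == "__main__":
    print(create_table_sql())
    print(load_hour_sql(datetime(2024, 1, 2, 5)))
```

An hourly DAG would run the load statement once per hour, passing in the hour being processed; how the statements are submitted (which operator, which engine) is left open here.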