Create a wmf.webrequest_sampled_128 Hive table
Open, Needs Triage · Public

Description

We also download the 1:128 sampled webrequest data to the Hadoop cluster, but we have not created a Hive table for it. It could be worth creating one.

This should be simple enough: create a Hive table, and add an Airflow DAG, similar to webrequest refine, that loads a Parquet-backed version of the Hive table.
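
As a rough illustration of the first step only (the table creation, not the Airflow DAG), here is a minimal Python sketch that assembles a HiveQL `CREATE EXTERNAL TABLE` statement for a Parquet-backed table. Only the table name `wmf.webrequest_sampled_128` comes from this task. The helper `build_create_table_ddl`, the column list, the partition columns, and the HDFS location are all hypothetical placeholders, not the real webrequest schema or paths.

```python
from typing import Sequence, Tuple

# A (name, hive_type) pair describing one column.
Column = Tuple[str, str]


def build_create_table_ddl(
    table: str,
    columns: Sequence[Column],
    partitions: Sequence[Column],
    location: str,
) -> str:
    """Return a HiveQL statement creating an external, Parquet-backed table.

    Hypothetical helper for illustration; it only formats a string and does
    not talk to Hive.
    """
    cols = ",\n".join(f"    `{name}` {hive_type}" for name, hive_type in columns)
    parts = ",\n".join(f"    `{name}` {hive_type}" for name, hive_type in partitions)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n{cols}\n)\n"
        f"PARTITIONED BY (\n{parts}\n)\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    # All columns, partitions, and the location below are assumptions made
    # for the example; the real schema would have to be taken from the
    # sampled data itself.
    ddl = build_create_table_ddl(
        table="wmf.webrequest_sampled_128",
        columns=[("hostname", "string"), ("dt", "string"), ("uri_path", "string")],
        partitions=[("year", "int"), ("month", "int"), ("day", "int"), ("hour", "int")],
        location="/hypothetical/path/webrequest_sampled_128",
    )
    print(ddl)
```

Generating the DDL from a column list keeps the schema in one place, so the same list could later be reused by whatever job writes the Parquet files.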

Event Timeline

Or wait for "webrequest 2.0" on Iceberg? @mforns?