Generate test data for one day with these dimensions:
- sub-cube: project, day/hour, agent type, pseudo-k anonymized with k = 100
- hourly data, diminishing resolution: project, dialect, article
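A minimal sketch of the k = 100 threshold, assuming "pseudo-k anonymized" means that any dimension combination with a count below k is dropped from the output. That reading is an assumption, not something the note confirms. The project name, hour format, and agent-type values are hypothetical test values.

```python
from collections import Counter

K = 100  # threshold taken from the note; its exact semantics are assumed here


def suppress_small_groups(counts, k=K):
    """Keep only the (key, count) pairs whose count is at least k."""
    return {key: n for key, n in counts.items() if n >= k}


# Hypothetical sub-cube cells: (project, hour, agent type) -> count
counts = Counter({
    ("projA", "2015-01-01T00", "user"): 250,
    ("projA", "2015-01-01T00", "spider"): 40,
})

kept = suppress_small_groups(counts)
```

With these values only the "user" cell survives, because the "spider" cell is below k.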
Output is TSV on HDFS
- header could look like: dim1, dim2, dim3, ... , count
- a row could look like: A, B, null, ... , 120 (null in dim3 means the count covers all dim3 values)
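The row layout above could be produced roughly as follows. This is a sketch under assumptions: the helper names `rollup` and `to_tsv` are hypothetical, the input rows are made-up test values, and the result is built as an in-memory string. Writing it to HDFS is left out.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def rollup(rows, n_dims):
    """Sum counts for every subset of dimensions; None marks 'all values'."""
    totals = Counter()
    for *dims, count in rows:
        for r in range(n_dims + 1):
            for keep in combinations(range(n_dims), r):
                # Dimensions not in `keep` are aggregated away and set to None.
                key = tuple(d if i in keep else None for i, d in enumerate(dims))
                totals[key] += count
    return totals


def to_tsv(totals, header):
    """Render the rolled-up counts as tab-separated text, None written as 'null'."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    # Sort with None treated as an empty string so keys stay comparable.
    ordered = sorted(totals, key=lambda k: tuple("" if v is None else v for v in k))
    for key in ordered:
        writer.writerow(["null" if v is None else v for v in key] + [totals[key]])
    return buf.getvalue()


# Hypothetical fine-grained rows: dim1, dim2, dim3, count
rows = [("A", "B", "x", 70), ("A", "B", "y", 50)]
totals = rollup(rows, 3)
tsv = to_tsv(totals, ["dim1", "dim2", "dim3", "count"])
```

Here the two dim3 values "x" and "y" fold into a single `A, B, null` row whose count is their sum, which is the "all dim3 values" case from the example row.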