Once we have defined an implementation plan, we can go ahead and implement the prototype.
This would include:
- Sqooping the source data (if necessary)
- Finalizing the generated table schemas and creating the tables (see implementation plan)
- Computing the (denormalized?) generated datasets (Spark SQL? spark-submit?)
- Finalizing tested example queries against the generated data, to extract Commons metrics (see implementation plan)
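
To make the last two steps more concrete, here is a minimal sketch of a denormalized dataset and one example metric query. It uses Python's built-in `sqlite3` as a local stand-in for Spark SQL, so it only illustrates the shape of the work, not the actual job. The table names (`media_file`, `file_usage`, `file_usage_denorm`), their columns, the sample rows, and the "wikis per file" metric are all hypothetical; the real schemas and metrics are the ones in the implementation plan.

```python
import sqlite3

# Hypothetical source tables standing in for Sqooped data. Names, columns,
# and rows are illustrative assumptions, not the planned schemas.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE media_file (file_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE file_usage (file_id INTEGER, wiki TEXT, page_title TEXT);

    INSERT INTO media_file VALUES (1, 'Cat.jpg'), (2, 'Dog.jpg'), (3, 'Unused.png');
    INSERT INTO file_usage VALUES
        (1, 'enwiki', 'Cat'),
        (1, 'dewiki', 'Katze'),
        (2, 'enwiki', 'Dog');

    -- Denormalized generated dataset: one row per (file, usage) pair,
    -- keeping files that have no usage at all.
    CREATE TABLE file_usage_denorm AS
    SELECT f.file_id, f.title, u.wiki, u.page_title
    FROM media_file f
    LEFT JOIN file_usage u ON u.file_id = f.file_id;
""")


def wikis_per_file(connection):
    """Return (title, number of distinct wikis using the file), most used first."""
    return connection.execute("""
        SELECT title, COUNT(DISTINCT wiki) AS wiki_count
        FROM file_usage_denorm
        GROUP BY file_id, title
        ORDER BY wiki_count DESC, title
    """).fetchall()


metrics = wikis_per_file(conn)
print(metrics)
```

Because the join is materialized once into `file_usage_denorm`, the metric query itself needs no joins, which is the usual motivation for denormalizing generated datasets. Whether that trade-off is right for the prototype is still the open question noted in the list above.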