The queries already exist in the Oozie coordinator definitions: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/cassandra/coord_editors_bycountry_monthly.properties#L117
For each coordinator defined in https://github.com/wikimedia/analytics-refinery/blob/master/oozie/cassandra, we want an HQL file, organized by loading-time granularity:
hql/cassandra/GRANULARITY/load_cassandra_DATASET_GRANULARITY.hql
Some minor changes need to happen to the queries:
- Remove the end-of-line backslashes
- Check whether the query contains backslashes: they will have been escaped and need to be unescaped :)
- Add the INSERT INTO ${destination_table} clause, and the /*+ COALESCE(${}) */ hint in the last SELECT
- Parameterize values for:
  - source table
  - destination table
  - dates
  - coalesce_partitions
- Add minimal parameters :)
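
The first two mechanical steps (dropping end-of-line backslashes and unescaping the remaining ones) could be scripted. Below is a minimal Python sketch, not an existing refinery tool: the function name `properties_query_to_hql` is hypothetical, and it assumes the query was copied as raw text from the .properties file, with `\` at end of line used as a continuation and `\\` standing for one literal backslash. Adding the INSERT INTO clause, the hint and the parameters still has to be done by hand.

```python
def properties_query_to_hql(raw: str) -> str:
    """Turn a multi-line query value copied from a .properties file into plain query text.

    Hypothetical helper: it only handles line continuations and doubled
    backslashes, not other .properties escape sequences.
    """
    cleaned = []
    for line in raw.splitlines():
        stripped = line.rstrip()
        # An odd number of trailing backslashes means the last one is a
        # line continuation (any others are escaped pairs): drop it.
        trailing = len(stripped) - len(stripped.rstrip("\\"))
        if trailing % 2 == 1:
            stripped = stripped[:-1].rstrip()
        cleaned.append(stripped)
    text = "\n".join(cleaned)
    # Unescape: each doubled backslash becomes a single literal backslash.
    return text.replace("\\\\", "\\")


if __name__ == "__main__":
    # Made-up query fragment, as it would appear in a .properties file.
    sample = "SELECT a, \\\n  regexp_replace(b, '\\\\d+', '') AS c \\\nFROM t"
    print(properties_query_to_hql(sample))
```

The output keeps the original line breaks, so the resulting HQL file stays readable; any regular expression in the query should still be checked by eye after conversion.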