Page MenuHomePhabricator

Create cassandra loading HQL files from their oozie definition
Closed, ResolvedPublic5 Estimated Story Points

Description

The queries are already existing in oozie coordinator definitions: https://github.com/wikimedia/analytics-refinery/blob/master/oozie/cassandra/coord_editors_bycountry_monthly.properties#L117

We wish, for each coordinator defined in https://github.com/wikimedia/analytics-refinery/blob/master/oozie/cassandra, to have an HQL file, organized by loading time granularity:
hql/cassandra/GRANULARITY/load_cassandra_DATASET_GRANULARITY.hql

Some minor changes need to happen to the queries:

  • Remove the end-of-line backslash
  • Check if the query contains backslashes, as they would have been escaped and needs to be de-escaped :)
  • Add the INSERT INTO ${destination_table} clause and the /*+ COALESCE(${}) */ hint at the latest select
  • Parameterize values for
    • source table
    • destination table
    • dates
    • coalesce_partitions
  • Add minimal parameters :)

Event Timeline

JAllemandou created this task.
EChetty set the point value for this task to 3.Jun 30 2022, 4:43 PM

Change 828518 had a related patch set uploaded (by Joal; author: Joal):

[analytics/refinery@master] Update cassandra hql loading file

https://gerrit.wikimedia.org/r/828518

Change 828518 merged by Joal:

[analytics/refinery@master] Update cassandra hql loading file

https://gerrit.wikimedia.org/r/828518

EChetty changed the point value for this task from 3 to 5.