Change load job to fail/continue based on webrequest_sequence_stats_hourly: percamus most recent change week over week {hawk}run offset files.
Difficulty: reading the Hive query output and writing it to a file. Empty output could mean success, and any data means "fail".
1. Create a subworkflow that takes a Hive query as a parameter. The subworkflow runs the Hive query in a shell action so that it can interpret the result. The subworkflow succeeds if the query returns no rows, and fails otherwise.
2. If the subworkflow succeeds, Oozie writes a 'success' file for this hourly partition: _GOOD (terrible name, we'll figure out a better one).
3. Make the refine job depend on the _GOOD file instead of the _PARTITIONED file.
The job that reads the camus history files and computes offsets will touch the _PARTITIONED file to signal that the partition is (normally) fully imported.
The current Oozie load job needs to be changed to use this file as its start trigger, and to use a new file (_CHECKED?) for statistics checking.
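The "empty output means success" check from steps 1 and 2 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual shell action: the helper names `query_passed` and `run_check` are hypothetical, the partition is treated as a local directory (the real one lives in HDFS), and the command passed in stands for whatever invocation runs the Hive query and prints its rows to stdout.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def query_passed(output: str) -> bool:
    """Return True when the query printed nothing (whitespace is ignored)."""
    return output.strip() == ""


def run_check(command: Sequence[str], partition_dir: Path, marker: str = "_GOOD") -> bool:
    """Run `command` and touch the marker file only if it printed no rows.

    `command` is an assumption: any program that runs the Hive query and
    writes the resulting rows to stdout. A non-zero exit status raises
    CalledProcessError, so a crashed query is never mistaken for success.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    if not query_passed(result.stdout):
        # Some data came back: treat this partition as failed, write nothing.
        return False
    # Assumption: a local directory stands in for the hourly HDFS partition.
    (partition_dir / marker).touch()
    return True
```

A wrapper script would call `run_check` with the query command and the partition directory, then exit with status 0 or 1 so that the workflow can follow its ok or error transition. Passing `marker="_CHECKED"` would cover the alternative file name mentioned above.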