|Resolved||Ottomata||T186130 Hive EventLogging tables not updating since January 26|
|Resolved||Ottomata||T186602 Monitor and alert if no new data from JsonRefine jobs|
The current idea is to use Spark accumulators to collect stats about jobs as they run, write them to a Hive table, and then generate an alert email when there are problems.
For each JsonRefine job, the stats table should record:
- database name
- table info
- source path
- destination path
- record count
- partition info (as fields or as single string?)
Perhaps if we partition the stats table by database/table/<all_table_partitions>, we can update or replace rows when we re-run jobs.
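To make the row layout and the replace-on-re-run idea concrete, here is a minimal sketch in plain Python (no Spark involved). Everything in it is an assumption for illustration: the `RefineJobStats` class, the `upsert` helper, and the example paths are hypothetical, and the partition is stored as ordered fields that can also be rendered as a single string, so either option from the list above stays open.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical key type: (database, table, partition string).
StatsKey = Tuple[str, str, str]


@dataclass(frozen=True)
class RefineJobStats:
    """One stats row per refined partition (hypothetical layout)."""
    database: str
    table: str
    source_path: str
    destination_path: str
    record_count: int
    # Ordered (field, value) pairs, e.g. (("year", "2018"), ("month", "1")).
    partition: Tuple[Tuple[str, str], ...]

    def partition_string(self) -> str:
        # Render the partition fields as a single "k=v/k=v" string.
        return "/".join(f"{k}={v}" for k, v in self.partition)

    def key(self) -> StatsKey:
        return (self.database, self.table, self.partition_string())


def upsert(report: Dict[StatsKey, RefineJobStats], stats: RefineJobStats) -> None:
    # A re-run of the same partition replaces the earlier row.
    report[stats.key()] = stats


report: Dict[StatsKey, RefineJobStats] = {}
part = (("year", "2018"), ("month", "1"), ("day", "26"))
# Paths below are placeholders, not real locations.
first = RefineJobStats("event", "example_table", "/example/raw", "/example/refined", 100, part)
rerun = RefineJobStats("event", "example_table", "/example/raw", "/example/refined", 120, part)
upsert(report, first)
upsert(report, rerun)
```

After both calls the report holds a single row for that partition, carrying the record count from the re-run.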
After more thought, it looks like the current need is only for a cron job that checks that new data flows in regularly and sends an email if it does not.
Accumulators and execution reports might come in a second round (after moving to Spark 2 and better understanding some of its benefits).
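The simpler cron-driven check could look roughly like the sketch below. It assumes that something else (not shown) has already looked up the timestamp of the newest data for each table. The function names, the table names, the 6-hour threshold, and the email addresses are all hypothetical. Actually sending the message (for example via `smtplib`) is left out.

```python
from datetime import datetime, timedelta
from email.message import EmailMessage
from typing import Dict, List, Optional


def find_stale_tables(
    latest: Dict[str, datetime], now: datetime, max_age: timedelta
) -> List[str]:
    """Return the tables whose newest data is older than max_age."""
    return sorted(t for t, ts in latest.items() if now - ts > max_age)


def build_alert(
    stale: List[str], sender: str, recipient: str
) -> Optional[EmailMessage]:
    """Build an alert email, or return None when nothing is stale."""
    if not stale:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"JsonRefine: no new data for {len(stale)} table(s)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("No recent data found for:\n" + "\n".join(stale))
    return msg


# Example with made-up tables and timestamps.
now = datetime(2018, 2, 6, 12, 0)
latest = {
    "event.table_a": datetime(2018, 2, 6, 11, 0),   # fresh
    "event.table_b": datetime(2018, 1, 26, 0, 0),   # stale
}
stale = find_stale_tables(latest, now, max_age=timedelta(hours=6))
alert = build_alert(stale, "cron@example.org", "alerts@example.org")
```

Keeping the staleness test separate from the email construction means the same check could later feed a different alerting channel without changes.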