
Monitor and alert if no new data from JsonRefine jobs
Closed, Resolved | Public | 5 Story Points

Event Timeline

Ottomata triaged this task as High priority. Feb 6 2018, 2:59 PM
Ottomata created this task.

The current idea is to use Spark Accumulators to collect stats about jobs as they run, write them to a Hive table, and then generate an alert email when there are problems.

For each JsonRefine job, we should write a row to a table that has:

  • database name
  • table info
  • source path
  • destination path
  • record count
  • partition info (as fields or as a single string?)

maybe also:

  • start_dt
  • success_dt
  • failure_dt

Perhaps if we partition by database/table/<all_table_partitions>, we can update/replace rows if we re-run jobs.
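
To make the proposed record concrete, here is a minimal sketch in plain Python (not Spark or Hive code, and not the implementation that was eventually merged). The class name `RefineJobStats`, the helper `record_stats`, the in-memory `store`, and the example paths are all hypothetical. It only illustrates how keying rows by database/table/partition lets a re-run replace the earlier row.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional, Tuple

StatsKey = Tuple[str, str, str]


@dataclass
class RefineJobStats:
    """One row of per-job stats, using the fields listed in the task (hypothetical type)."""
    database: str
    table: str
    source_path: str
    destination_path: str
    record_count: int
    partition: str  # assumption: partition info kept as a single string
    start_dt: Optional[datetime] = None
    success_dt: Optional[datetime] = None
    failure_dt: Optional[datetime] = None

    def key(self) -> StatsKey:
        # Mirrors the idea of partitioning by database/table/<all_table_partitions>.
        return (self.database, self.table, self.partition)


def record_stats(store: Dict[StatsKey, RefineJobStats], stats: RefineJobStats) -> None:
    """Insert the row, or replace the existing row for the same key on a re-run."""
    store[stats.key()] = stats


# Usage: the second run of the same partition overwrites the first.
store: Dict[StatsKey, RefineJobStats] = {}
first_run = RefineJobStats(
    database="event", table="example_table",  # hypothetical names
    source_path="/example/raw/hourly/2018/02/06/14",  # hypothetical path
    destination_path="/example/refined/year=2018/month=2/day=6/hour=14",
    record_count=100, partition="year=2018/month=2/day=6/hour=14",
)
rerun = RefineJobStats(
    database="event", table="example_table",
    source_path=first_run.source_path,
    destination_path=first_run.destination_path,
    record_count=120, partition=first_run.partition,
)
record_stats(store, first_run)
record_stats(store, rerun)
```

Storing the partition as one string keeps the key simple; splitting it into separate fields would make per-field queries easier at the cost of a table-specific schema.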

CC @JAllemandou

On further thought, the current need is only a cron check that new data flows in regularly, with an email if it does not.
Accumulators and execution reports might come in a second round (after moving to Spark 2 and better understanding some of its benefits).
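
The simpler check described above could look roughly like the following sketch. It is not the RefineMonitor job from the patches below. The function names, the table names, and the four-hour threshold are assumptions made for illustration. It also assumes something else has already worked out the timestamp of the newest refined partition per table.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def find_stale_tables(
    latest_partition_dt: Dict[str, datetime],
    now: datetime,
    max_age: timedelta,
) -> List[str]:
    """Return the tables whose newest partition is older than max_age."""
    return sorted(
        table
        for table, latest in latest_partition_dt.items()
        if now - latest > max_age
    )


def build_alert_body(stale_tables: List[str], max_age: timedelta) -> str:
    """Build the email text; an empty string means there is nothing to send."""
    if not stale_tables:
        return ""
    lines = [f"No new refined data within {max_age} for:"]
    lines.extend(f"  {table}" for table in stale_tables)
    return "\n".join(lines)


# Usage with made-up table names and timestamps.
now = datetime(2018, 2, 6, 15, 0)
latest = {
    "event.table_a": datetime(2018, 2, 6, 14, 0),  # fresh
    "event.table_b": datetime(2018, 2, 5, 10, 0),  # stale
}
stale = find_stale_tables(latest, now, timedelta(hours=4))
body = build_alert_body(stale, timedelta(hours=4))
```

A cron entry would run a script like this on a schedule and pass a non-empty `body` to whatever mail mechanism is available.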

Change 413633 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/refinery/source@master] [WIP] Add RefineMonitor job

https://gerrit.wikimedia.org/r/413633

Change 413633 merged by jenkins-bot:
[analytics/refinery/source@master] Add RefineMonitor job

https://gerrit.wikimedia.org/r/413633

Change 417287 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] [WIP] Apply geocode, deduplicate and monitoring for refine jobs

https://gerrit.wikimedia.org/r/417287

Change 417287 merged by Ottomata:
[operations/puppet@production] Apply geocode, deduplicate and monitoring for refine jobs

https://gerrit.wikimedia.org/r/417287

Nuria closed this task as Resolved. Mar 28 2018, 3:57 PM