
Investigate using camus offset files to start hive load job {hawk} [5 pts]
Closed, Resolved; Public

Description

Camus outputs two pieces of data per topic and partition after each run:

  • offset - the kafka offset it reached
  • timestamp - the timestamp of the offset

We could analyze the timestamps from the latest camus run to decide when to start the hive load job: for a given topic, once the max timestamp is higher than a given hour + X mins, for instance, start the job.
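
A minimal sketch of that check, in Python for illustration only. It assumes the offset files have already been parsed into a mapping of partition to timestamp in epoch milliseconds; the function name `hour_is_ready`, the `hour_boundary` parameter and the default margin are hypothetical, and reading the offset files themselves is left out. Which boundary "a given hour" refers to (start or end of the hour being loaded) is left to the caller.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime.fromtimestamp(0, tz=timezone.utc)


def hour_is_ready(partition_timestamps_ms, hour_boundary, margin_minutes=5):
    """Return True once a topic's max timestamp passes hour_boundary + margin.

    partition_timestamps_ms: dict of partition id -> timestamp (epoch ms),
        assumed to come from the latest camus run for one topic.
    hour_boundary: timezone-aware datetime for the hour to check against.
    margin_minutes: the "X mins" safety margin (hypothetical default).
    """
    if not partition_timestamps_ms:
        # No data from the latest run: do not start the load job.
        return False
    threshold = hour_boundary + timedelta(minutes=margin_minutes)
    max_ms = max(partition_timestamps_ms.values())
    latest = EPOCH + timedelta(milliseconds=max_ms)
    return latest > threshold
```

This follows the max-timestamp rule as described. Taking the minimum across partitions instead would be a stricter variant, since it would wait for every partition to pass the threshold.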

Event Timeline

kevinator assigned this task to Ottomata.
kevinator raised the priority of this task from to Normal.
kevinator updated the task description.
kevinator moved this task from Incoming to Tasked on the Analytics-Backlog board.
JAllemandou renamed this task from Add percent_change to webrequest_sequence_stats_hourly {hawk} [5 pts] to Investigate using camus offset files to start hive load job {hawk} [5 pts]. Sep 22 2015, 4:16 PM
JAllemandou claimed this task.
JAllemandou updated the task description.
JAllemandou set Security to None.
JAllemandou edited projects, added Analytics-Kanban; removed Analytics-Backlog.
JAllemandou moved this task from Next Up to Paused on the Analytics-Kanban board.
JAllemandou moved this task from Paused to In Progress on the Analytics-Kanban board.
kevinator closed this task as Resolved. Sep 29 2015, 5:43 PM

Change 240868 had a related patch set uploaded (by Joal):
Add refinery-camus module

https://gerrit.wikimedia.org/r/240868

Change 240868 merged by Ottomata:
Add camus helper functions and job

https://gerrit.wikimedia.org/r/240868