
Modify Camus file commits to use OVERWRITE option when renaming into final destination
Closed, Resolved, Public, 5 Estimated Story Points

Description

If Camus's offset files get wonky, Camus may have to rerun a job and reimport data from Kafka. In that case, it may need to overwrite files in the final destination. Currently, however, if a file already exists in the final destination, the rename() fails, and as a result the whole Camus job fails, does not update its history offsets, and gets stuck.

We should modify the Camus commit to allow it to overwrite files. It looks like switching the [[ https://github.com/wikimedia/analytics-camus/blob/wmf/camus-etl-kafka/src/main/java/com/linkedin/camus/etl/kafka/mapred/EtlMultiOutputCommitter.java#L153 | rename() call in EtlMultiOutputCommitter.commitFile ]] from [[ https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/fs/FileSystem.html#rename(org.apache.hadoop.fs.Path, org.apache.hadoop.fs.Path) | FileSystem ]] to [[ http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileContext.html#rename(org.apache.hadoop.fs.Path, org.apache.hadoop.fs.Path, org.apache.hadoop.fs.Options.Rename...) | FileContext ]] and using the OVERWRITE option should do it.
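A minimal sketch of the proposed switch, to make the idea concrete. This is not the actual Camus patch: the class name `OverwriteCommitSketch`, the simplified `commitFile(FileContext, Path, Path)` signature, and the local-filesystem demo in `main` are assumptions made for illustration. Only `FileContext.rename(Path, Path, Options.Rename...)` and `Options.Rename.OVERWRITE` come from the Hadoop API linked above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Options;
import org.apache.hadoop.fs.Path;

public class OverwriteCommitSketch {

    // Hypothetical helper (not the real EtlMultiOutputCommitter method):
    // move a finished file to its final destination, replacing any file
    // that a previous run may have left there.
    static void commitFile(FileContext fc, Path source, Path target) throws IOException {
        // Unlike FileSystem.rename(src, dst), which returns a boolean,
        // this call returns nothing and reports failure by throwing.
        fc.rename(source, target, Options.Rename.OVERWRITE);
    }

    public static void main(String[] args) throws IOException {
        // Demo on the local filesystem; a real job would obtain the
        // FileContext from its Hadoop Configuration instead.
        FileContext fc = FileContext.getLocalFSFileContext();

        java.nio.file.Path dir = Files.createTempDirectory("camus-sketch");
        java.nio.file.Path src = dir.resolve("tmp-part");
        java.nio.file.Path dst = dir.resolve("final-part");
        Files.write(src, "new data".getBytes(StandardCharsets.UTF_8));
        Files.write(dst, "stale data".getBytes(StandardCharsets.UTF_8));

        // The destination already exists, so a rename without OVERWRITE
        // would be rejected; with OVERWRITE the stale file is replaced.
        commitFile(fc, new Path(src.toUri()), new Path(dst.toUri()));

        System.out.println(new String(Files.readAllBytes(dst), StandardCharsets.UTF_8));
    }
}
```

Per the Hadoop documentation, OVERWRITE replaces the destination only if it is a file or an empty directory, and whether the rename is atomic depends on the underlying filesystem implementation.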

Event Timeline

JAllemandou edited projects, added Analytics-Kanban; removed Analytics.

Change 274736 had a related patch set uploaded (by Joal):
Add destination overwriting capacity to camus

https://gerrit.wikimedia.org/r/274736

Milimetric set the point value for this task to 5. Mar 3 2016, 5:12 PM

Change 274736 merged by Ottomata:
Add destination overwriting capacity to camus

https://gerrit.wikimedia.org/r/274736

Change 275047 had a related patch set uploaded (by Ottomata):
Add destination overwriting capacity to camus

https://gerrit.wikimedia.org/r/275047

Change 275047 merged by Ottomata:
Add destination overwriting capacity to camus

https://gerrit.wikimedia.org/r/275047

Change 276165 had a related patch set uploaded (by Joal):
Make camus overwrite exisiting files

https://gerrit.wikimedia.org/r/276165

Change 276165 merged by Ottomata:
Make camus overwrite exisiting files

https://gerrit.wikimedia.org/r/276165