While setting up the first Oozie job for the new analytics/wmf-product/jobs repo (T261953), @JAllemandou pointed out that we should deploy each new version of the job files to a new timestamped folder in HDFS (as Analytics already does for their jobs). This matters because overwriting the job files in place could break any in-progress jobs.
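To make the idea concrete, here is a minimal sketch of what a timestamped deployment could look like. It is not the script that was actually written for this task: the function names, the timestamp format, and the example paths are all assumptions for illustration. It only builds the `hdfs dfs` commands and does not run them.

```python
from datetime import datetime, timezone
from typing import List
import posixpath


def versioned_deploy_dir(base_dir: str, now: datetime) -> str:
    """Return a new timestamped HDFS folder under base_dir (hypothetical layout)."""
    # Assumed format: compact UTC timestamp, which sorts lexically and avoids colons.
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return posixpath.join(base_dir, stamp)


def deploy_commands(local_dir: str, base_dir: str, now: datetime) -> List[List[str]]:
    """Build, but do not run, the commands that upload one version of the job files."""
    target = versioned_deploy_dir(base_dir, now)
    return [
        # Make sure the parent folder exists; -p makes this a no-op if it already does.
        ["hdfs", "dfs", "-mkdir", "-p", base_dir],
        # Upload into the new versioned folder. The -f (overwrite) flag is
        # deliberately left out so existing files are never replaced.
        ["hdfs", "dfs", "-put", local_dir, target],
    ]


if __name__ == "__main__":
    # Hypothetical local folder and HDFS base path.
    for cmd in deploy_commands("oozie", "/user/example/jobs", datetime.now(timezone.utc)):
        print(" ".join(cmd))
```

Under this scheme, each job would be started against the versioned folder it was deployed to, so a later deployment never touches files that a running job is still reading.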
Related Objects
- Mentioned In
  - T277781: Hive table neilpquinn.toledo_pageviews missing almost all data
  - rAWPJbe17fafe52e8: Improve documentation on deploying Oozie jobs
  - rAWPJ450e5dc2249d: Set up and document deployment strategy for jobs
  - T271420: Set up a system for team-managed command-line jobs
  - T271326: Update the external automatic translation Oozie job
- Mentioned Here
  - T261953: Set up an Oozie job to count Wikipedia Preview requests and clickthroughs in the webrequest logs
Event Timeline
Change 651794 had a related patch set uploaded (by Neil P. Quinn-WMF; owner: Neil P. Quinn-WMF):
[analytics/wmf-product/jobs@master] Set up and document deployment strategy for jobs
Across Product Analytics, we've identified a fairly significant need for Oozie jobs, so work like this to streamline their use is now more valuable. (On the other hand, it looks like Analytics Engineering has a definite plan to switch to Airflow at some point, but we don't have much information on that yet.)
Change 651794 merged by Neil P. Quinn-WMF:
[analytics/wmf-product/jobs@master] Set up and document deployment strategy for jobs
Change 658578 had a related patch set uploaded (by Neil P. Quinn-WMF; owner: Neil P. Quinn-WMF):
[analytics/wmf-product/jobs@master] Improve documentation on deploying Oozie jobs
Change 658578 merged by Neil P. Quinn-WMF:
[analytics/wmf-product/jobs@master] Improve documentation on deploying Oozie jobs