
Fix oozie banner_impression monthly job
Closed, Resolved · Public · 1 Estimated Story Points

Description

The Oozie banner-impression monthly coordinator has failed for its last 10 instances.
The failures appear to be timeouts caused by a missing _SUCCESS flag. Each job needs one month of daily _SUCCESS flags (so the data is present), plus one more flag 90 days after the end of that month (the postponing flag, which prevents sanitizing data too early). The problem is that daily folders are deleted after 90 days, and their _SUCCESS flags are deleted with them, so the jobs can never run. We need a new strategy here. Losing the dropped data is not an issue for Druid, because the job re-indexes from already-indexed data (segments).
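A small timeline check shows why the coordinator times out instead of eventually running. The sketch below assumes that a daily folder is purged exactly 90 days after its date and that the postponing flag appears exactly 90 days after the last day of the month. Both numbers come from the description above, but the day-counting conventions and all function names are illustrative assumptions, not the coordinator's real configuration.

```python
from datetime import date, timedelta
import calendar

# Assumed values from the task description; exact day-counting is illustrative.
RETENTION_DAYS = 90   # daily folders (and their _SUCCESS flags) are purged after this
POSTPONE_DAYS = 90    # the postponing flag is expected this long after month end


def month_days(year: int, month: int) -> list[date]:
    """All calendar days of the given month."""
    last = calendar.monthrange(year, month)[1]
    return [date(year, month, d) for d in range(1, last + 1)]


def purge_date(day: date) -> date:
    """Day on which a daily folder is assumed to be deleted."""
    return day + timedelta(days=RETENTION_DAYS)


def postponing_flag_date(year: int, month: int) -> date:
    """Day on which the postponing flag is assumed to become available."""
    return month_days(year, month)[-1] + timedelta(days=POSTPONE_DAYS)


def daily_flags_left(year: int, month: int) -> list[date]:
    """Daily _SUCCESS flags still on disk when the postponing flag appears."""
    ready = postponing_flag_date(year, month)
    # A folder purged on or before the ready day is already gone.
    return [d for d in month_days(year, month) if purge_date(d) > ready]


if __name__ == "__main__":
    for y, m in [(2019, 1), (2019, 2)]:
        print(y, m, postponing_flag_date(y, m), len(daily_flags_left(y, m)))
```

Under these assumptions no daily flag ever survives: every day of the month is at most 90 days before the postponing date, so its folder is purged no later than the day that flag appears. The coordinator's inputs are therefore never all present at the same time, which matches the timeouts described above.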

Event Timeline

A manual fix has been applied to the 2018 jobs.

Please also restart the job as user analytics, not hdfs. The related patch is https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/508283/. If it is not yet deployed when you restart, use the -D user=analytics override and start the job via sudo -u analytics oozie, etc.
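As a rough illustration of the restart instructions, the sketch below only builds the command line (it does not run it), adding the -D user=analytics override when the patch is not yet deployed. The properties file path is a hypothetical placeholder, the function name is illustrative, and it assumes the Oozie server URL is supplied through the OOZIE_URL environment variable rather than an -oozie argument.

```python
import shlex

# Hypothetical placeholder: the real coordinator properties path is not given in the task.
PROPERTIES_FILE = "/path/to/coordinator.properties"


def restart_command(properties_file: str, patch_deployed: bool) -> list[str]:
    """Build (but do not run) the argv to resubmit the coordinator as 'analytics'."""
    argv = ["sudo", "-u", "analytics", "oozie", "job"]
    if not patch_deployed:
        # Until change 508283 is deployed, override the user property explicitly.
        argv += ["-D", "user=analytics"]
    # Assumes OOZIE_URL is set in the environment for the Oozie server address.
    argv += ["-config", properties_file, "-run"]
    return argv


if __name__ == "__main__":
    print(shlex.join(restart_command(PROPERTIES_FILE, patch_deployed=False)))
```

Returning an argv list rather than a single string keeps the arguments safe to pass to subprocess.run without shell quoting issues; shlex.join is used only to display it.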

Milimetric moved this task from Incoming to Operational Excellence on the Analytics board.

Ping @DStrine: do you have a timeline for moving data collection for this schema to the EventLogging pipeline? We are fixing this job this time, but it duplicates our standard pipeline and we don't want to maintain it going forward. cc @AndyRussG

JAllemandou set the point value for this task to 1.
JAllemandou moved this task from Next Up to In Code Review on the Analytics-Kanban board.

Change 508358 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Fix oozie banner monthly job

https://gerrit.wikimedia.org/r/508358

We will get back to event logging for CentralNotice soon but I don't have a timeline. We might have more information in a month or two.

Change 508358 merged by Joal:
[analytics/refinery@master] Fix oozie banner monthly job

https://gerrit.wikimedia.org/r/508358

Change 510773 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Fix XML namespace for SLA in banner_monthly job

https://gerrit.wikimedia.org/r/510773

Change 510773 merged by Joal:
[analytics/refinery@master] Fix XML namespace for SLA in banner_monthly job

https://gerrit.wikimedia.org/r/510773