
When moving oozie webrequest-load to airflow/spark avoid the error-check corner case
Closed, Resolved · Public · 3 Estimated Story Points

Description

When all varnishkafka instances are restarted during the same calendar hour, the webrequest_sequence_stats_hourly table is empty, because we filter out rows whose sequence_min equals 0.
This makes the data-loss checks (both ERROR and WARNING) fail, because Hive generates no output file when the query has no input data.
I suggest mitigating this by checking that the file exists before checking its size here and here.
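A minimal sketch of the suggested mitigation follows. It assumes the check flags data loss when its output file is non-empty, and that a missing file means the query had no input rows, so there is nothing to flag. The function name `has_dataloss_rows` is hypothetical, and the sketch uses the local filesystem instead of HDFS; the point is only the order of the two checks (existence first, then size).

```python
import os


def has_dataloss_rows(path: str) -> bool:
    """Return True if the data-loss check output at `path` contains rows.

    Hypothetical helper. Assumption: the check writes one row per hour
    that exceeds its threshold, so a non-empty file signals data loss.
    """
    # A missing file means the query had no input (for example, every row
    # was filtered out because sequence_min == 0), so nothing is flagged
    # instead of the size check failing on a non-existent path.
    if not os.path.exists(path):
        return False
    # The file exists: flag data loss only if it actually holds rows.
    return os.path.getsize(path) > 0
```

With this ordering, an hour in which every varnishkafka instance restarted produces "no data loss detected" instead of a failed check. Whether that is the desired outcome, or whether such hours should be reported separately, is a design choice this sketch does not settle.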

When we move the job to Airflow and Spark, we want to avoid that corner case.

Event Timeline

JAllemandou renamed this task from "Fix oozie webrequest-load error-check corner case" to "When moving oozie webrequest-load to airflow/spark avoid the error-check corner case". Dec 8 2022, 6:09 PM
JAllemandou updated the task description.
lbowmaker claimed this task.
lbowmaker subscribed.

Resolving: the job has now been migrated to Airflow.