Goal:
Do the required Java prep work to migrate the webrequest load jobs to Airflow
Job Details:
| Input | Processing | Output |
| Raw JSON | Hive | Hive + Table Tests |
Success Criteria:
- Have both jobs migrated (SLA: 5 hours)
Gotchas:
- This job archives its results. We may need to adapt the existing custom Airflow ArchiveOperator to match this job's format.
- The job needs to be rewritten; how is still TBD.
Gerrit organisation:
- 1 merge request replacing the Guava cache with Caffeine and extracting Guava
- 1 merge request making the existing UDF code thread-compatible (remove singletons + function serialization)
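
As a rough illustration of the second merge request, the sketch below shows one way a static singleton could be turned into per-instance, serializable state. It uses only the JDK. The class name HostNormalizer, its normalize logic, and the roundTrip helper are hypothetical and do not come from the actual UDF code; the real change may look quite different.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper used by a UDF. Instead of a static singleton
// (e.g. a getInstance() method), each function instance owns its state.
class HostNormalizer implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: the cache is not shipped with the serialized function;
    // it is rebuilt lazily on whichever JVM deserializes the instance.
    private transient volatile Map<String, String> cache;

    // Lazy, thread-safe initialization (double-checked locking on a volatile field).
    private Map<String, String> cache() {
        Map<String, String> local = cache;
        if (local == null) {
            synchronized (this) {
                local = cache;
                if (local == null) {
                    local = new ConcurrentHashMap<>();
                    cache = local;
                }
            }
        }
        return local;
    }

    // Placeholder logic: trims and lower-cases a host name, memoizing the result.
    String normalize(String host) {
        return cache().computeIfAbsent(host, h -> h.trim().toLowerCase(Locale.ROOT));
    }

    // Serializes and deserializes an instance, mimicking what happens when
    // a function object is sent to another JVM.
    static HostNormalizer roundTrip(HostNormalizer original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (HostNormalizer) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this shape, a UDF would hold a HostNormalizer field rather than calling a static accessor, so the function object can be serialized and each deserialized copy rebuilds its own cache on first use.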