
Migrate Hive queries to Spark
Open, Medium, Public

Description

In an effort to reduce the number of execution engines we use, we are pushing toward using Spark instead of Hive on the cluster.
Could you please move your Airflow-run Hive queries to Spark at some point?
As far as I have seen using this GitLab query, you mostly use Hive to create/update tables, and in one DAG (query_clicks) you run 2 queries. I think the migration to Spark should not be difficult :)
Many thanks :)

Event Timeline

Gehel triaged this task as Medium priority. Jun 8 2023, 2:36 PM
Gehel moved this task from needs triage to ML & Data Pipeline on the Discovery-Search board.