
Modify pipelines to leverage Spark 3.3 Shuffler
Closed, Resolved | Public | 2 Estimated Story Points

Description

In T344910, we deployed additional Spark Shufflers to our cluster so that we can support both the Spark 3.3 and Spark 3.4 release lines.

We currently use Spark 3.3 on the Dumps 2.0 pipelines.

In this task we should update the jobs so that they leverage the Spark 3.3 shuffler.

  • All Dumps 2.0 pipelines run with the Spark 3.3 Shuffler.
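
As a rough illustration of what pointing a job at a specific shuffler can involve, the sketch below builds the Spark properties that select a named external shuffle service on YARN. The property keys (`spark.shuffle.service.enabled`, `spark.shuffle.service.name`, `spark.shuffle.service.port`) are standard Spark settings. The helper `spark_shuffler_conf`, the service name `spark_shuffle_3_3`, and the port `7338` are hypothetical placeholders, not values taken from this task or from the linked merge request.

```python
def spark_shuffler_conf(service_name: str, port: int) -> dict:
    """Return Spark properties that point a job at a named external shuffle service.

    The name must match the auxiliary service registered with the YARN
    NodeManagers, and the port must match the one that service listens on.
    """
    return {
        "spark.shuffle.service.enabled": "true",
        "spark.shuffle.service.name": service_name,
        "spark.shuffle.service.port": str(port),
    }


# Hypothetical values for illustration only; use the name and port that the
# cluster's Spark 3.3 shuffler is actually registered under.
conf = spark_shuffler_conf("spark_shuffle_3_3", 7338)

# Flatten the properties into spark-submit style arguments.
spark_submit_args = [
    arg
    for key, value in sorted(conf.items())
    for arg in ("--conf", f"{key}={value}")
]
```

The same dictionary could instead be passed to whatever configuration argument the job's operator or launcher accepts; how the Dumps 2.0 DAGs actually wire this in is defined in the merge request below, not here.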

Details

Title | Reference | Author | Source Branch | Dest Branch
Use Spark 3.3 Shuffler for Dumps 2.0 pipelines. | repos/data-engineering/airflow-dags!556 | xcollazo | T352890-use-spark-33-shuffler | main

Event Timeline

xcollazo set the point value for this task to 2. (Dec 6 2023, 4:54 PM)

Mentioned in SAL (#wikimedia-analytics) [2023-12-07T21:45:30Z] <xcollazo> Deployed latest changes to Airflow Analytics instance to pickup T352890

Deployed to production, working as expected.