Airflow training pipeline
Closed, Resolved · Public · 21 Estimated Story Points

Description

The goal is to develop building blocks for the Airflow training pipeline. This task tracks progress and updates related to this goal.

The Research team has already contributed to this goal, including:

  • The revert risk training workflow, implemented in this research-datasets branch.
  • The revert risk training dag.
  • An example notebook for training.

Note that the training code uses the PySpark integration for XGBoost, which is incompatible with AMD GPUs. As a result, the Hadoop GPU is currently unused.

My planned contributions include:

  • adding a function to the evaluation step that compares the metrics of the newly trained model with those of the production model on Lift Wing.
  • adding train_bert_model to the training step so that it can use the Hadoop GPU (now available in YARN's gpus queue; see this patch).
  • adding a publish_model component to the revert risk DAG to publish the new model.
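The evaluation-step comparison in the first item could act as a gate in front of publish_model. Below is a minimal sketch of such a gate, not the actual implementation. It assumes that the candidate and production metrics are already available as plain dictionaries (how the production model's metrics are fetched from Lift Wing is left open), that every compared metric is "higher is better", and that a small tolerance for regressions is acceptable. The names `compare_to_production` and `MetricComparison`, the metric names, and the tolerance are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class MetricComparison:
    """One metric's value for the candidate and the production model (hypothetical helper)."""
    metric: str
    candidate: float
    production: float

    @property
    def delta(self) -> float:
        # Positive delta means the candidate scored higher than production.
        return self.candidate - self.production


def compare_to_production(
    candidate: Dict[str, float],
    production: Dict[str, float],
    metrics: Iterable[str],
    tolerance: float = 0.0,
) -> Tuple[bool, List[MetricComparison]]:
    """Return (should_publish, per-metric comparisons).

    Assumes every metric is "higher is better". The candidate passes only if,
    for each listed metric, it is at most `tolerance` worse than production.
    """
    comparisons: List[MetricComparison] = []
    for name in metrics:
        if name not in candidate or name not in production:
            raise KeyError(f"metric {name!r} missing from candidate or production results")
        comparisons.append(MetricComparison(name, candidate[name], production[name]))
    should_publish = all(c.delta >= -tolerance for c in comparisons)
    return should_publish, comparisons


if __name__ == "__main__":
    # Illustrative numbers only, not real evaluation results.
    production_metrics = {"accuracy": 0.80, "f1": 0.75}
    candidate_metrics = {"accuracy": 0.82, "f1": 0.74}
    # Passes: f1 drops by about 0.01, which is within the 0.02 tolerance.
    ok, rows = compare_to_production(
        candidate_metrics, production_metrics, ["accuracy", "f1"], tolerance=0.02
    )
    for row in rows:
        print(f"{row.metric}: {row.production:.3f} -> {row.candidate:.3f} ({row.delta:+.3f})")
    print("publish" if ok else "keep production model")
```

Keeping the gate as a plain function makes it easy to unit test outside Airflow. In a DAG, one option is to call it from Airflow's ShortCircuitOperator, which skips downstream tasks such as publish_model when the callable returns a falsy value. This wiring is an assumption, not something the existing DAG is known to do.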

Event Timeline

achou renamed this task from "Sprint: Airflow training pipeline" to "Airflow training pipeline". (May 14 2024, 1:53 PM)
achou updated the task description.
isarantopoulos set the point value for this task to 21. (Jul 9 2024, 2:56 PM)
isarantopoulos moved this task from Ready To Go to Blocked on the Machine-Learning-Team board.
isarantopoulos claimed this task.
isarantopoulos subscribed.

This task was addressed as part of the retraining pipeline for the Tone Check model in T398970: Q1 FY2025-26 Goal: Airflow training pipeline for Tone check model.