
[Airflow] Implement a NotebookOperator
Open, High | Public | 1 Estimated Story Points

Description

Productionize the P.O.C. done in T322534.

Implement a NotebookOperator that runs Jupyter notebooks in Airflow using Skein and Papermill.
It should accept as parameters:

  • The path to the notebook file itself in HDFS (this can be treated as an Airflow artifact)
  • The path to a packaged conda environment in HDFS (.tar.gz file - also can be treated as an Airflow artifact)
  • Any parameters to pass directly to the notebook via Papermill. These should support dynamic (templated) values; for example, you might want to pass {{ execution_date.year }}.
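
To make the three inputs above concrete, here is a minimal sketch of the shell script such an operator might hand to Skein to run inside a YARN container. It is not the P.O.C. code from T322534: the names `NotebookRun` and `build_papermill_script`, the local file names, and the example HDFS paths are all hypothetical. It assumes the conda environment was packaged with conda-pack (so it can be unpacked and activated with `source env/bin/activate`) and that parameter values have already been rendered, which in a real operator would typically be done by listing the parameter dict in the operator's `template_fields`.

```python
import shlex
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NotebookRun:
    """Hypothetical container for the three inputs listed in the task."""

    notebook_hdfs_path: str   # notebook file in HDFS
    conda_env_hdfs_path: str  # packed conda environment (.tar.gz) in HDFS
    # Assumed to be already rendered (e.g. "{{ execution_date.year }}" -> "2023").
    notebook_params: Dict[str, str] = field(default_factory=dict)


def build_papermill_script(run: NotebookRun, output_name: str = "output.ipynb") -> str:
    """Return a bash script that fetches the inputs and runs the notebook."""
    lines: List[str] = [
        "set -e",
        # Copy the notebook and the packed environment from HDFS to the working dir.
        f"hdfs dfs -get {shlex.quote(run.notebook_hdfs_path)} notebook.ipynb",
        f"hdfs dfs -get {shlex.quote(run.conda_env_hdfs_path)} env.tar.gz",
        # Unpack and activate the environment (conda-pack layout assumed).
        "mkdir -p env && tar -xzf env.tar.gz -C env",
        "source env/bin/activate",
    ]
    # Papermill's CLI takes each notebook parameter as "-p name value".
    command = ["papermill", "notebook.ipynb", output_name]
    for name, value in sorted(run.notebook_params.items()):
        command += ["-p", name, str(value)]
    lines.append(" ".join(shlex.quote(part) for part in command))
    return "\n".join(lines)


if __name__ == "__main__":
    # Example paths are made up for illustration.
    example = NotebookRun(
        notebook_hdfs_path="hdfs:///tmp/example/notebook.ipynb",
        conda_env_hdfs_path="hdfs:///tmp/example/env.tar.gz",
        notebook_params={"year": "2023"},
    )
    print(build_papermill_script(example))
```

Building the script as a plain string keeps this part testable without a cluster; the operator itself would still need to submit it through Skein and decide where the executed output notebook should be stored.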

Event Timeline

EChetty set the point value for this task to 1. Jan 16 2023, 4:39 PM