==== Problem Statement
Right now, only analysts who are comfortable with data engineering practices have even a limited ability to schedule jobs. We need a system that automates job scheduling and is appropriately accessible to analysts.
==== Spike Outcomes:
[] How can we schedule notebooks in Airflow?
[x] Write a simple NotebookOperator
[x] Build a one-off Conda env that runs Jupyter notebooks and papermill
[x] Write a test DAG that runs a notebook in Airflow
[] Verify that the test DAG runs end to end
[x] Investigate Product Analytics' jobs/notebooks that are intended to be scheduled
[x] What data will the notebooks need to access?
[x] Which engines (Hive, Spark, R, others?) will the notebooks need to run?
[x] What types of outputs do the notebooks produce (Hive, reports, dashboards?)
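To make the NotebookOperator and test-DAG items above concrete, here is a minimal sketch of what they might look like, assuming papermill for notebook execution and Airflow-style operators. Class, task, and path names are illustrative, not a finished implementation; the `try/except` fallback only exists so the sketch can be read and exercised outside an Airflow install.

```python
try:
    from airflow.models import BaseOperator
except ImportError:
    # Fallback so this sketch is importable without Airflow installed.
    BaseOperator = object


class NotebookOperator(BaseOperator):
    """Execute a Jupyter notebook via papermill, saving an executed copy."""

    # Let Airflow template paths and parameters (e.g. with {{ ds }}).
    template_fields = ("input_nb", "output_nb", "parameters")

    def __init__(self, input_nb, output_nb, parameters=None, **kwargs):
        if BaseOperator is not object:
            super().__init__(**kwargs)
        self.input_nb = input_nb
        self.output_nb = output_nb
        self.parameters = parameters or {}

    def execute(self, context):
        # Imported here so the DAG file still parses when papermill lives
        # only in the notebooks Conda env.
        import papermill as pm

        pm.execute_notebook(
            self.input_nb,
            self.output_nb,
            parameters=self.parameters,
        )


# Test DAG sketch (hypothetical paths/schedule), shown as comments:
#
# with DAG("notebook_test", schedule_interval="@daily", ...) as dag:
#     run_nb = NotebookOperator(
#         task_id="run_report",
#         input_nb="/srv/notebooks/report.ipynb",
#         output_nb="/srv/notebooks/out/report-{{ ds }}.ipynb",
#         parameters={"run_date": "{{ ds }}"},
#     )
```

The executed output notebook doubles as a run log, which is one of the main reasons papermill is attractive for analyst-facing scheduling.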
==== Maybe part of this spike? Or maybe it should be a separate task.
[] Ownership Map -> What we will own vs What others will own
[] Show the idea (a NotebookOperator in an Airflow DAG, using a single Conda env automatically packaged by CI) to Product Analytics and ask whether they would like it.
[] Discuss who would take care of writing DAGs, testing DAGs, reviewing DAG code, merging, deploying Airflow, receiving alerts, troubleshooting failed DAGs, updating the notebooks conda env with new libraries, etc.
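For reference, the single Conda env mentioned above (Jupyter plus papermill) could be specified roughly as follows; the env name, channel, and version pins are illustrative placeholders, not the actual spike env:

```yaml
# environment.yml -- sketch of the notebooks Conda env (names/pins illustrative)
name: notebooks
channels:
  - conda-forge
dependencies:
  - python=3.7
  - jupyter
  - papermill
  - pip
```

Updating this file (and re-running the CI packaging step) would be the expected path for adding new libraries, which is one of the ownership questions listed above.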