During the work on T362788: Migrate Airflow to the dse-k8s cluster, we discovered that a number of tasks currently using the BashOperator attempt to run scripts that are part of analytics/refinery.
These tasks do not work in the Kubernetes environment, because refinery is only deployed to certain target hosts (1, 2, 3) and to HDFS.
One approach we may wish to take is to create a refinery job artifact, using the conda-based packaging approach with WMF Workflow Utils.
However, another approach available to us is to create a refinery container image. This image would contain all of the required scripts and libraries, as well as the underlying CLI utilities such as hdfs, hive, yarn, and mysql.
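As a rough illustration only, such an image might be assembled along these lines. The base image, package names, and file paths below are all assumptions for the sketch, not the actual production configuration:

```dockerfile
# Hypothetical sketch: base image, paths, and package names are assumed.
FROM docker-registry.wikimedia.org/bullseye:latest

# CLI utilities that refinery scripts shell out to (package names assumed)
RUN apt-get update && apt-get install -y \
    openjdk-11-jre-headless \
    python3 \
    default-mysql-client \
 && rm -rf /var/lib/apt/lists/*

# The hdfs, hive, and yarn client binaries would come from the
# WMF-packaged Hadoop/Hive distributions; shown here as placeholder COPYs.
COPY hadoop-client/ /usr/lib/hadoop/
COPY refinery/ /srv/deployment/analytics/refinery/

ENV PATH="/usr/lib/hadoop/bin:${PATH}"
```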
We would then be able to launch this image using the KubernetesPodOperator from within a DAG.
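A minimal sketch of what such a task could look like, assuming a hypothetical image name, namespace, and script invocation (the `refinery-drop-older-than` arguments are illustrative, not a tested command line; the operator import path varies by cncf.kubernetes provider version):

```python
from datetime import datetime

from airflow import DAG
# In older provider releases this lives at
# airflow.providers.cncf.kubernetes.operators.kubernetes_pod
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(
    dag_id="refinery_container_example",   # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    drop_old_partitions = KubernetesPodOperator(
        task_id="drop_old_partitions",
        name="drop-old-partitions",
        namespace="airflow",  # assumed namespace
        # Assumed image name; the real registry path would be decided later
        image="docker-registry.wikimedia.org/repos/data-engineering/refinery:latest",
        cmds=["/srv/deployment/analytics/refinery/bin/refinery-drop-older-than"],
        arguments=["--database", "event", "--older-than", "90"],
        get_logs=True,
    )
```

Because the image bundles the refinery scripts together with the hdfs/hive/yarn/mysql clients they depend on, the task no longer relies on refinery being deployed to the host running the worker.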