This task is based on: T396495: Build model training pipeline for tone check using WMF ML Airflow instance.
Take the code from the exploratory-notebook and adapt it so that it can become part of the ml-pipelines repo.
For the current phase, the decision is to keep the retraining code simple, without classes or abstractions, mirroring the logic of the notebook.
The code needs to be adjusted to comply with the repo's rules/pipelines and to work with both gitlab-ci and kokkuri.
The goal is to finalise the tone-check retraining code and develop it so that it can be containerised via kokkuri.
The image needs to be slim and decoupled from the data and base_model.
The container will accept external volumes that hold the data and the base_model.
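As a rough illustration of the decoupling described above, the retraining script could locate the mounted data and base_model at start-up instead of bundling them in the image. This is only a minimal sketch in the notebook's function-only style: the environment variable names, the default mount points, and the `resolve_mounts` helper are all hypothetical and are not taken from the ml-pipelines repo or the kokkuri configuration.

```python
import os
from pathlib import Path

# Hypothetical variable names; the real pipeline may use CLI arguments
# or different names entirely.
DATA_DIR_ENV = "TONE_CHECK_DATA_DIR"
BASE_MODEL_DIR_ENV = "TONE_CHECK_BASE_MODEL_DIR"


def resolve_mounts(env=None):
    """Return (data_dir, base_model_dir) as read from the environment.

    Raises FileNotFoundError if either directory is missing, which usually
    means the external volume was not mounted into the container.
    """
    env = os.environ if env is None else env
    # The default mount points below are assumptions made for illustration.
    data_dir = Path(env.get(DATA_DIR_ENV, "/mnt/data"))
    base_model_dir = Path(env.get(BASE_MODEL_DIR_ENV, "/mnt/base_model"))
    for label, path in (("data", data_dir), ("base_model", base_model_dir)):
        if not path.is_dir():
            raise FileNotFoundError(f"{label} volume not found at {path}")
    return data_dir, base_model_dir
```

Failing early when a volume is missing keeps the image free of data and model files while making a misconfigured mount obvious before any training starts.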
The current status of the Tone-Check retraining pipeline is summarised in these comments: T396495#10970710 & T396495#11025158
This ticket continues from the status above: it reuses the logic from the exploratory-notebook and simulates the actual operation by decoupling the data/base_model from the retraining container.