This task is done when the scraper CI produces a standalone binary package that runs on generic Debian bullseye and on the Analytics cluster.
- Investigate Conda packaging
- Conclusion: Conda is the wrong tool for Elixir. Our workflow-utils conda-dist (https://gitlab.wikimedia.org/repos/data-engineering/workflow_utils#artifact-config-files) is specific to Python project packaging.
- Abandoned: https://gitlab.com/wmde/technical-wishes/scrape-wiki-html-dump/-/merge_requests/165
- Prepare scraper to run under Airflow
- Wire mix release with MIX_ENV=prod
- Add a release CI build step to the scraper's .gitlab-ci.yml to produce the standalone archive.
- Verify the package runs on stat1010 (assumed to be similar enough to the Airflow runners).
- Link the packaged file as an Airflow artifact and wire it into a SimpleSkeinOperator task.
See this similar release process for Python projects: https://wikitech.wikimedia.org/wiki/Data_Platform/Systems/Airflow/Developer_guide/Python_Job_Repos#Deploying_your_conda_env_artifact_for_use_by_Airflow
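The "verify the package runs" step could be preceded by a cheap structural check in CI. The sketch below is only an illustration and is not taken from the scraper repo. It assumes the archive is a tarball produced by `mix release` with the `:tar` step, and that a standalone release contains a `bin/<app>` launcher plus a bundled `erts-<version>/` directory. The app name `scraper` and the archive path in the usage note are hypothetical. A bundled ERTS still depends on the system libraries of the build host, so this check does not replace the run on stat1010.

```python
import re
import tarfile
from pathlib import PurePosixPath


def missing_release_parts(member_names, app_name):
    """Return the expected parts of a standalone release that are absent.

    member_names: archive member paths, e.g. from TarFile.getnames().
    app_name: release name (hypothetical here; use the real one).
    """
    has_launcher = False
    has_erts = False
    for name in member_names:
        # PurePosixPath collapses a leading "./", so "./bin/x" == "bin/x".
        parts = PurePosixPath(name).parts
        if parts[:2] == ("bin", app_name):
            has_launcher = True
        # A bundled Erlang runtime lives in a top-level erts-<version> dir.
        if parts and re.fullmatch(r"erts-[\d.]+", parts[0]):
            has_erts = True

    missing = []
    if not has_launcher:
        missing.append(f"bin/{app_name}")
    if not has_erts:
        missing.append("erts-<version>/")
    return missing


def check_archive(path, app_name):
    """Open a (possibly compressed) tarball and list missing parts."""
    with tarfile.open(path, "r:*") as archive:
        return missing_release_parts(archive.getnames(), app_name)
```

A CI job could call `check_archive("_build/prod/scraper-0.1.0.tar.gz", "scraper")` (path and version are placeholders) and fail the build if the returned list is not empty.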