Our use case is to work intensively on a single snapshot (e.g. 20230401, https://dumps.wikimedia.org/other/enterprise_html/runs/20230401/ ) and process it completely. Even if a new snapshot appears in the meantime, we don't want to switch to it, because mixing snapshots would produce inconsistent results. When we run this tool again in the future, we'll use a new snapshot date.
This is a good fit for application configuration: introduce a new key in config/prod.exs. The value should be read in pipeline.ex, where filenames are constructed, and in dumps_mirror.ex, where it overrides the latest-snapshot discovery logic.
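A minimal sketch of what the configuration could look like, not the actual change in the merge request. The application name `:scrape_wiki_html_dump`, the key `:snapshot_date`, and the `SnapshotConfig` helper module are all assumptions made for illustration.

```elixir
# config/prod.exs
import Config

# Hypothetical app name and key: pin the snapshot that should be processed.
config :scrape_wiki_html_dump, snapshot_date: "20230401"
```

```elixir
# Hypothetical helper that pipeline.ex and dumps_mirror.ex could both call.
defmodule SnapshotConfig do
  @app :scrape_wiki_html_dump

  # Returns the pinned snapshot date, or nil when the key is not configured.
  def pinned_snapshot, do: Application.get_env(@app, :snapshot_date)

  # Prefers the pinned date and only falls back to discovery (passed in as a
  # zero-arity function) when nothing is configured.
  def snapshot(discover_latest) when is_function(discover_latest, 0) do
    pinned_snapshot() || discover_latest.()
  end
end
```

Keeping the fallback in one place would mean the filename construction and the mirror lookup cannot disagree about which snapshot is in use.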
Code to review:
https://gitlab.com/wmde/technical-wishes/scrape-wiki-html-dump/-/merge_requests/40