Write a scraper interface which takes as input a wiki database name like "hawiki" and scrapes the dump listing pages under https://dumps.wikimedia.org/other/enterprise_html/ to find the most recent NS0 dump for that wiki. Output should be a URL to the dump tarball. A sketch is given below.
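
A minimal sketch of the scraper, assuming the dumps are listed under `.../enterprise_html/runs/<YYYYMMDD>/` and that each run directory contains files named like `<wiki>-NS0-<run>-ENTERPRISE-HTML.json.tar.gz`. The URL layout, file naming, and the use of `requests` plus a regex instead of a full HTML parser are assumptions for illustration, not the confirmed implementation:

```python
import re
import requests

# Assumed index of run directories; not confirmed against the live site.
BASE_URL = "https://dumps.wikimedia.org/other/enterprise_html/runs/"


def latest_ns0_dump_url(wiki: str) -> str:
    """Return the URL of the most recent NS0 dump tarball for `wiki` (e.g. "hawiki")."""
    index = requests.get(BASE_URL, timeout=30)
    index.raise_for_status()
    # Run directories appear as links like <a href="20240201/">; newest date first.
    runs = sorted(set(re.findall(r'href="(\d{8})/"', index.text)), reverse=True)
    for run in runs:
        listing = requests.get(f"{BASE_URL}{run}/", timeout=30)
        listing.raise_for_status()
        # Assumed file naming: hawiki-NS0-20240201-ENTERPRISE-HTML.json.tar.gz
        pattern = rf'href="({re.escape(wiki)}-NS0-{run}-ENTERPRISE-HTML\.json\.tar\.gz)"'
        match = re.search(pattern, listing.text)
        if match:
            return f"{BASE_URL}{run}/{match.group(1)}"
    raise ValueError(f"No NS0 dump found for {wiki}")


if __name__ == "__main__":
    print(latest_ns0_dump_url("hawiki"))
```

Falling back to earlier runs when the newest one lacks the wiki covers the window during which a new run is still being published.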
Write a script which simply downloads that tarball to the local drive. Include a flag for streaming only a sample (e.g. the first 10k lines) without downloading the entire file. A sketch follows.
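
A minimal sketch of the downloader, assuming the tarball contains newline-delimited JSON files (as the Enterprise HTML dumps do). The `--sample` flag and the 10k-line example come from the task description; the argument names, chunk size, and output filename are illustrative choices:

```python
import argparse
import os
import tarfile

import requests


def download(url: str, dest: str) -> None:
    """Stream the full tarball to `dest` on the local drive."""
    with requests.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as out:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                out.write(chunk)


def sample(url: str, max_lines: int) -> None:
    """Stream-decompress the tarball and print only the first `max_lines` lines."""
    with requests.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        # "r|gz" reads the gzipped tar sequentially from the socket, so only the
        # bytes needed for the sample are fetched.
        with tarfile.open(fileobj=resp.raw, mode="r|gz") as tar:
            seen = 0
            for member in tar:
                fh = tar.extractfile(member)
                if fh is None:
                    continue
                for line in fh:
                    print(line.decode("utf-8"), end="")
                    seen += 1
                    if seen >= max_lines:
                        return


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Download a Wikimedia Enterprise HTML dump tarball."
    )
    parser.add_argument("url", help="URL of the dump tarball")
    parser.add_argument("--sample", type=int, metavar="N", default=None,
                        help="stream only the first N lines instead of downloading the file")
    args = parser.parse_args()
    if args.sample:
        sample(args.url, args.sample)
    else:
        download(args.url, os.path.basename(args.url))
```

Example usage: `python download_dump.py <tarball-url> --sample 10000` streams the first 10k lines and then closes the connection, which is what makes the sampling cheap compared to a full download.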
Code to review: https://gitlab.com/wmde/technical-wishes/scrape-wiki-html-dump/-/merge_requests/5