I've needed to run in this mode several times now, and each time I've done it with a one-off patch. It's not entirely trivial because the automatic wiki discovery code needs to be either bypassed or retargeted to look for existing page summary files.
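As a rough illustration of the "retarget discovery" option, here is a minimal Python sketch that scans an output directory for summaries already on disk instead of discovering wikis from the dump. The file suffix, the function name, and the assumption that the wiki name is the filename prefix are all hypothetical and not taken from the project.

```python
from pathlib import Path

# Hypothetical naming convention; the real summary filenames may differ.
SUMMARY_SUFFIX = "-page-summary.ndjson"


def discover_existing_summaries(output_dir):
    """Map wiki name -> path for every page summary file already on disk."""
    summaries = {}
    for path in sorted(Path(output_dir).rglob("*" + SUMMARY_SUFFIX)):
        if path.is_file():
            # Assumes the wiki name is whatever precedes the suffix.
            wiki = path.name[: -len(SUMMARY_SUFFIX)]
            summaries[wiki] = path
    return summaries
```

The returned mapping could then feed the same per-wiki processing loop that discovery normally drives, so only the entry point would need to change.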
Example use cases:
- Implemented deduplication, want to reprocess files with this in place.
- Added a new aggregate column which can be calculated from the old page summaries.
Schema changes would also be a concern here, but maybe aggregations can be made robust to missing input columns.
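One possible way to make aggregations tolerate missing input columns, sketched in Python under assumptions: each page summary is a dict, the aggregation is a simple sum, and the column names are placeholders rather than the project's real schema.

```python
def aggregate(rows, columns):
    """Sum each requested column across rows, skipping rows that lack it.

    A column that never appears in any row yields None rather than 0,
    so "not present in the old schema" stays distinguishable from a real zero.
    """
    totals = {column: 0 for column in columns}
    seen = {column: False for column in columns}
    for row in rows:
        for column in columns:
            value = row.get(column)
            if value is None:
                continue  # column missing from this (possibly older) summary
            totals[column] += value
            seen[column] = True
    return {column: totals[column] if seen[column] else None for column in columns}


# Usage with hypothetical column names: old rows lack "new_col" entirely.
rows = [{"ref_count": 2}, {"ref_count": 3, "new_col": 1}, {}]
result = aggregate(rows, ["ref_count", "new_col", "absent"])
```

Returning None for a never-seen column is a design choice: downstream code can decide whether to omit the aggregate or report it as unknown.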
Code to review: https://gitlab.com/wmde/technical-wishes/scrape-wiki-html-dump/-/merge_requests/93