Write a simple aggregator that takes the summarized output from parse_wiki.exs and produces some basic statistics. We will refine these later; the point here is just to set up an initial framework that we can build on.
Suggested aggregations:
- Average ref_count per page.
- Average transclusion_count per page.
- Average ref_by_transclusion per page.
- Union of all unique potential_ref_transclusions.
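
The suggested aggregations above can be sketched roughly as follows. This is an illustration in Python only, not the implementation from the merge request. It assumes each page is already parsed into a dict keyed by the field names listed above, with numeric counts and `potential_ref_transclusions` as a list of names; the actual layout of the parse_wiki.exs output may differ. The function name `aggregate` and the extra `page_count` field are hypothetical.

```python
def aggregate(pages):
    """Compute summary statistics for one wiki from per-page rows.

    Assumption: each row is a dict with numeric ref_count,
    transclusion_count and ref_by_transclusion, plus a list of
    strings under potential_ref_transclusions.
    """
    n = len(pages)

    def average(field):
        # Mean over all pages; 0.0 for empty input to avoid dividing by zero.
        return sum(p[field] for p in pages) / n if n else 0.0

    # Union of every transclusion name seen on any page.
    unique = set()
    for p in pages:
        unique.update(p["potential_ref_transclusions"])

    return {
        "page_count": n,  # hypothetical extra field, not in the list above
        "avg_ref_count": average("ref_count"),
        "avg_transclusion_count": average("transclusion_count"),
        "avg_ref_by_transclusion": average("ref_by_transclusion"),
        # Sorted so the output is stable between runs.
        "potential_ref_transclusions": sorted(unique),
    }
```

Keeping `page_count` next to the averages is one possible design choice: per-wiki averages cannot be merged correctly across wikis later unless each one can be weighted by the number of pages it covers.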
Output format could be a new CSV file or a formatted report. Decision: write one JSON line per wiki so that the per-wiki results can be combined later.
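
To illustrate the "one JSON line per wiki" idea, here is a minimal Python sketch. It is not the code from the merge request. The helper names `to_json_line` and `read_json_lines` and the `wiki` key are assumptions made for this example.

```python
import json


def to_json_line(wiki, stats):
    """Serialize the statistics of one wiki as a single JSON line."""
    # The "wiki" key is an assumed way to label which wiki a line belongs to.
    record = {"wiki": wiki, **stats}
    # json.dumps emits no raw newlines by default, so one record is one line.
    return json.dumps(record, sort_keys=True) + "\n"


def read_json_lines(text):
    """Parse concatenated JSON lines back into a list of dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Because every line is a complete JSON document, the per-wiki outputs can be combined by plain concatenation and then loaded with `read_json_lines` for further processing.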
Review: https://gitlab.com/wmde/technical-wishes/scrape-wiki-html-dump/-/merge_requests/7