This metric has been split out into its own task to speed up information gathering ahead of reporting.
The top priority is to add the following metric, test it locally, and then run the scraper on dewiki:
- total number of sub-references across a wiki
Implementation
- Add a test fixture including a sub-ref: https://gitlab.com/wmde/technical-wishes/scrape-wiki-html-dump/-/merge_requests/136
- Add the code to recognize sub-refs and count them: https://gitlab.com/wmde/technical-wishes/scrape-wiki-html-dump/-/merge_requests/137
- Finally, T410251: Run scraper on dewiki, to count number of sub-references
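Once sub-refs are recognized on each page, the wiki-level metric is an aggregation of per-page counts. Below is a minimal Python sketch of that step. It assumes the scraper already yields a sub-ref count per page. The mapping input and `summarize_subrefs` are hypothetical and are not the scraper's actual code, which lives in the merge requests above. The output keys mirror the field names used in the dewiki summary.

```python
from collections.abc import Mapping


def summarize_subrefs(subref_counts: Mapping[str, int]) -> dict[str, int]:
    """Aggregate per-page sub-ref counts into wiki-level metrics.

    `subref_counts` maps a page title to the number of sub-references found
    on that page. This input shape is a hypothetical assumption; the
    scraper's real per-page output may differ.
    """
    # A page only counts towards pages_with_subrefs_count if it has at least one sub-ref.
    pages_with_subrefs = sum(1 for n in subref_counts.values() if n > 0)
    # subrefs_sum is the total number of sub-refs across all pages.
    total = sum(subref_counts.values())
    return {
        "pages_with_subrefs_count": pages_with_subrefs,
        "subrefs_sum": total,
    }


if __name__ == "__main__":
    # Toy input for illustration, not real dewiki data.
    example = {"Page A": 3, "Page B": 0, "Page C": 5}
    print(summarize_subrefs(example))
    # → {'pages_with_subrefs_count': 2, 'subrefs_sum': 8}
```

Reporting the page count and the total separately means a page that is missed entirely shows up in both numbers.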
Current status
We've finished a dewiki scrape, but the results still need to be verified. The scraper detected 1,245 pages with sub-refs, but at the time of updating this task there are 5,023 pages in the category. Either the analysis is missing pages, or the number of pages using sub-refs has grown abruptly since the dump was taken.
Nov 28 run outputs: https://analytics.wikimedia.org/published/datasets/one-off/html-dump-scraper-refs/2025-11-28/
dewiki details: https://analytics.wikimedia.org/published/datasets/one-off/html-dump-scraper-refs/2025-11-28/dewiki-summary.json
All-wikis summary (CSV): https://analytics.wikimedia.org/published/datasets/one-off/html-dump-scraper-refs/2025-11-28/all-wikis-summary.csv
Number of pages on dewiki with sub-refs (pages_with_subrefs_count): 1,245
Number of sub-refs on dewiki (subrefs_sum): 14,548
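A quick sanity check on these figures compares the scraped page count with the 5,023 pages in the category and computes the average number of sub-refs per detected page. The helper `check_coverage` below is a hypothetical Python sketch, and the inputs are the dewiki values reported above. A low coverage ratio alone does not tell us whether the scraper is missing pages or the category has grown since the dump.

```python
# Pages in the category at the time of updating this task.
CATEGORY_PAGE_COUNT = 5_023


def check_coverage(summary: dict, category_pages: int) -> dict[str, float]:
    """Hypothetical helper: compare scraper output with the category size."""
    pages = summary["pages_with_subrefs_count"]
    subrefs = summary["subrefs_sum"]
    return {
        # Share of category pages that the scraper also detected.
        "coverage": pages / category_pages,
        # Average number of sub-refs on each detected page (0.0 if none found).
        "subrefs_per_page": subrefs / pages if pages else 0.0,
    }


if __name__ == "__main__":
    dewiki = {"pages_with_subrefs_count": 1_245, "subrefs_sum": 14_548}
    result = check_coverage(dewiki, CATEGORY_PAGE_COUNT)
    print(f"coverage: {result['coverage']:.1%}")  # coverage: 24.8%
    print(f"sub-refs per page: {result['subrefs_per_page']:.1f}")  # sub-refs per page: 11.7
```

Roughly a quarter of the category's pages were detected. The next step is to spot-check category pages that are absent from the scraper output.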