The current section topics logic joins blue links with wmf.wikidata_item_page_link on the page title to look up Wikidata QIDs. This boils down to an exact string match on lowercased titles.
The approach is not robust, does not respect the expected wikilink markup, and leads to:
- null topics for category links, part of T323523: [L] Sanitize blue links that yield null topics
- multiple topics for one link due to Wikidata homonyms, e.g., Q821067 and Q2386274 for Virtual Console
- unhandled redirects, see T318431: [M] Handle redirects
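To make the failure modes above concrete, here is a minimal Python sketch of a lowercased exact title match. It is not the production join: the rows, the page titles attached to the two QIDs, and the helper names `build_title_index` and `topics_for_link` are assumptions made up for illustration, and the category link is simply assumed to find no matching title.

```python
from collections import defaultdict

# Hypothetical rows standing in for wmf.wikidata_item_page_link: (item_id, page_title).
# The titles attached to the two Virtual Console QIDs are illustrative only.
ITEM_PAGE_LINKS = [
    ("Q821067", "Virtual Console"),
    ("Q2386274", "Virtual console"),
    ("Q42", "Douglas Adams"),
]


def build_title_index(rows):
    """Index QIDs by lowercased title, mimicking the exact-match join."""
    index = defaultdict(set)
    for qid, title in rows:
        index[title.lower()].add(qid)
    return index


def topics_for_link(index, link_title):
    """Return the sorted QIDs matching a blue link title, or an empty list."""
    return sorted(index.get(link_title.lower(), set()))


index = build_title_index(ITEM_PAGE_LINKS)

# Homonyms collapse onto the same lowercased key: two topics for one link.
print(topics_for_link(index, "Virtual Console"))

# A link whose title has no exact match yields no topic at all.
print(topics_for_link(index, "Category:Video games"))
```

Lowercasing is what merges the two titles in this toy data; any pair of items whose titles differ only by case would collide the same way.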
Use wmf_raw.mediawiki_page_props to look up the wikibase_item value of a given blue link page instead.
The coverage of that page property looks reasonable:
- P42719 contains the coverage for every Wikipedia
- 220 of 305 Wikipedias have a coverage >= 0.98
- 32 of 305 Wikipedias have a coverage < 0.9, see P42720.
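For reference, the thresholds above could be tallied along these lines. This is a sketch under assumptions: coverage is taken to mean the share of a wiki's pages that carry a wikibase_item property, which this task does not spell out, and the per-wiki counts and the `summarize` helper are invented for illustration rather than taken from P42719 or P42720.

```python
# Hypothetical per-wiki counts: (pages with a wikibase_item property, total pages).
# These numbers are made up and do not come from the pastes.
COUNTS = {
    "enwiki": (990, 1000),
    "frwiki": (950, 1000),
    "xxwiki": (850, 1000),
}


def coverage(with_item, total):
    """Share of pages that have a wikibase_item; 0.0 for an empty wiki."""
    return with_item / total if total else 0.0


def summarize(counts, high=0.98, low=0.9):
    """Return per-wiki coverage plus the number of wikis at or above `high` and below `low`."""
    per_wiki = {wiki: coverage(*pair) for wiki, pair in counts.items()}
    n_high = sum(1 for value in per_wiki.values() if value >= high)
    n_low = sum(1 for value in per_wiki.values() if value < low)
    return per_wiki, n_high, n_low


per_wiki, n_high, n_low = summarize(COUNTS)
print(per_wiki, n_high, n_low)
```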
I ran a test implementation:
- it requires a radical change in the code base
- it has an impact on the overall design, such as relying on a monthly snapshot instead of a weekly one
- it doesn’t give the expected benefits: neither revision IDs nor page titles are available in wmf_raw.mediawiki_page_props, so we need an additional join, either to look up page titles or to look up page IDs for the extracted blue links
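The extra hop mentioned in the last point can be sketched in plain Python as a two-step lookup: title to page ID, then page ID to wikibase_item. This is not the test implementation itself. The column layout (`wiki_db`, `pp_page`, `pp_propname`, `pp_value`) is assumed to follow the MediaWiki page_props schema, and the page rows and the helpers `build_title_to_id`, `build_id_to_qid` and `qid_for_link` are hypothetical.

```python
# Hypothetical rows standing in for a page table: (wiki_db, page_id, page_title).
PAGES = [
    ("enwiki", 1, "Douglas_Adams"),
    ("enwiki", 2, "Page_without_item"),
]

# Hypothetical rows standing in for wmf_raw.mediawiki_page_props:
# (wiki_db, pp_page, pp_propname, pp_value). Column names are an assumption.
PAGE_PROPS = [
    ("enwiki", 1, "wikibase_item", "Q42"),
    ("enwiki", 1, "defaultsort", "Adams, Douglas"),
]


def build_title_to_id(pages):
    """First hop: page_props has no titles, so map (wiki, title) to a page ID."""
    return {(wiki, title): page_id for wiki, page_id, title in pages}


def build_id_to_qid(props):
    """Second hop: keep only wikibase_item properties, keyed by (wiki, page ID)."""
    return {
        (wiki, page_id): value
        for wiki, page_id, name, value in props
        if name == "wikibase_item"
    }


def qid_for_link(title_to_id, id_to_qid, wiki_db, link_title):
    """Resolve a blue link title to a QID, or None when either hop fails."""
    page_id = title_to_id.get((wiki_db, link_title))
    if page_id is None:
        return None
    return id_to_qid.get((wiki_db, page_id))


title_to_id = build_title_to_id(PAGES)
id_to_qid = build_id_to_qid(PAGE_PROPS)
print(qid_for_link(title_to_id, id_to_qid, "enwiki", "Douglas_Adams"))
print(qid_for_link(title_to_id, id_to_qid, "enwiki", "Page_without_item"))
```

The first hop is still a title match, so the page_props route does not by itself remove the dependency on titles.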
Closing as invalid.