
Use Wikipedia page properties instead of Wikidata page links
Closed, Invalid · Public

Description

The current section topics logic joins blue links with wmf.wikidata_item_page_link on the page title to look up Wikidata QIDs. This boils down to an exact string match on lowercased titles.
The approach is not robust, infringes the expected wikilink markup, and leads to:

Use wmf_raw.mediawiki_page_props to look up the wikibase_item value of a given blue link page instead.
The coverage of that page property looks reasonable:

  • P42719 contains the coverage for every Wikipedia
  • 220 of 305 Wikipedias have a coverage >= 0.98
  • 32 of 305 Wikipedias have a coverage < 0.9, see P42720.
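
To make the proposal concrete, here is a minimal Python sketch of a page-property lookup and of a coverage figure. It assumes rows shaped like the MediaWiki page_props table (pp_page, pp_propname, pp_value) and assumes that coverage means the share of pages carrying a wikibase_item property. The sample rows and the helper names build_qid_index and coverage are hypothetical, and this is not the code behind P42719 or P42720.

```python
# Hypothetical sample rows shaped like MediaWiki page_props:
# (pp_page, pp_propname, pp_value). Real data lives in
# wmf_raw.mediawiki_page_props; these values are made up.
PAGE_PROPS = [
    (1, "wikibase_item", "Q42"),
    (2, "wikibase_item", "Q64"),
    (2, "page_image_free", "Berlin.jpg"),
    (3, "defaultsort", "Adams, Douglas"),
]


def build_qid_index(page_props):
    """Map page ID -> QID, keeping only wikibase_item properties."""
    return {
        page_id: value
        for page_id, prop_name, value in page_props
        if prop_name == "wikibase_item"
    }


def coverage(page_ids, qid_index):
    """Share of the given pages that have a wikibase_item value (assumed definition)."""
    if not page_ids:
        return 0.0
    covered = sum(1 for page_id in page_ids if page_id in qid_index)
    return covered / len(page_ids)


qid_index = build_qid_index(PAGE_PROPS)
print(qid_index.get(1))                       # QID of page 1, if any
print(coverage([1, 2, 3, 4], qid_index))      # share of pages with a QID
```

Note that the lookup key is a page ID, not a title, which matters for how blue links can be matched.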

Update

I ran a test implementation:

  • it requires a radical change in the code base
  • it has an impact on the overall design, such as relying on a monthly snapshot instead of weekly
  • it doesn’t give the expected benefits: neither revision IDs nor page titles are available in wmf_raw.mediawiki_page_props, so we need an additional join, either to look up page titles or to look up page IDs from the extracted blue links
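
The additional join from the last point can be sketched as follows. This is an illustration in plain Python, not the test implementation itself: the sample rows, the helper names normalize_title and resolve_links, and the simplified title normalization (spaces to underscores, first letter uppercased) are all assumptions. The page rows mimic the MediaWiki page table (page_id, page_namespace, page_title).

```python
# Hypothetical rows shaped like the MediaWiki page table:
# (page_id, page_namespace, page_title).
PAGES = [
    (1, 0, "Douglas_Adams"),
    (2, 0, "Berlin"),
]

# Hypothetical rows shaped like page_props:
# (pp_page, pp_propname, pp_value). No titles, no revision IDs.
PAGE_PROPS = [
    (1, "wikibase_item", "Q42"),
    (2, "wikibase_item", "Q64"),
]


def normalize_title(link_target):
    """Simplified normalization (assumption): underscores, first letter uppercased."""
    title = link_target.strip().replace(" ", "_")
    return title[:1].upper() + title[1:]


def resolve_links(link_targets, pages, page_props):
    """Resolve blue link targets to QIDs in two steps: title -> page ID -> QID."""
    # Extra join: page_props has no titles, so titles must first
    # be mapped to page IDs through the page table.
    title_to_id = {title: page_id for page_id, ns, title in pages if ns == 0}
    id_to_qid = {
        page_id: value
        for page_id, prop_name, value in page_props
        if prop_name == "wikibase_item"
    }
    resolved = {}
    for target in link_targets:
        page_id = title_to_id.get(normalize_title(target))
        resolved[target] = id_to_qid.get(page_id) if page_id is not None else None
    return resolved


result = resolve_links(["Douglas Adams", "berlin", "Missing page"], PAGES, PAGE_PROPS)
print(result)
```

The title-to-ID step is still a title match, so the lookup through page properties does not remove the string matching that motivated the change.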

Closing as invalid.