Dump interwiki table
Closed, Invalid | Public | Feature

Description

Feature summary: Write the interwiki table into wiki dumps, for example as enwiki-20240501-interwiki.sql.gz.

Use case(s): The current SQL database dumps already contain the iwlinks table, which is great. But to interpret iwlinks, one needs to know (and resolve) the prefixes, which are specific to each wiki. These prefixes are kept in the interwiki table (see the documentation). However, as of May 2024, the interwiki table does not seem to be written to the dumps.

Benefits: Allow reconstruction of the link graph between wikis. For example, in the QRank project we’d like to eventually run the PageRank algorithm on the link graph, including inter-wiki links. Now, it’s already possible to get the contents of the interwiki table by sending an API call to the live sites. But it would be nicer to get this data from dumps, so that the pipeline doesn’t have to hit the production wikis.
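
To illustrate the kind of prefix resolution meant here, below is a minimal sketch (not QRank's actual code). It assumes the interwiki map is fetched through the MediaWiki siteinfo API (`action=query&meta=siteinfo&siprop=interwikimap`), and that rows of the iwlinks dump have already been parsed into (iwl_prefix, iwl_title) pairs. The function names and the sample response are hypothetical and for illustration only; no network call is made.

```python
from urllib.parse import quote, urlencode


def siteinfo_interwikimap_url(api_endpoint):
    """Build the siteinfo query URL that returns a wiki's interwiki map."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "interwikimap",
        "format": "json",
    }
    return f"{api_endpoint}?{urlencode(params)}"


def build_prefix_map(siteinfo_response):
    """Map each interwiki prefix to its URL template (which contains '$1')."""
    entries = siteinfo_response.get("query", {}).get("interwikimap", [])
    return {entry["prefix"].lower(): entry["url"] for entry in entries}


def resolve_iwlink(prefix_map, iwl_prefix, iwl_title):
    """Resolve one iwlinks row to a target URL, or None for an unknown prefix."""
    template = prefix_map.get(iwl_prefix.lower())
    if template is None:
        return None
    # Percent-encode the title, leaving '/' and ':' readable.
    return template.replace("$1", quote(iwl_title, safe="/:"))


# Hypothetical, hand-written sample shaped like a siteinfo response.
SAMPLE_RESPONSE = {
    "query": {
        "interwikimap": [
            {"prefix": "wikt", "url": "https://en.wiktionary.org/wiki/$1"},
            {"prefix": "commons", "url": "https://commons.wikimedia.org/wiki/$1"},
        ]
    }
}

if __name__ == "__main__":
    prefix_map = build_prefix_map(SAMPLE_RESPONSE)
    print(siteinfo_interwikimap_url("https://en.wikipedia.org/w/api.php"))
    print(resolve_iwlink(prefix_map, "wikt", "dump_truck"))
```

If the interwiki table were included in the dumps, `build_prefix_map` could be fed from the dump file instead of an API response, and the rest of the pipeline would stay the same.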

Event Timeline

Pppery subscribed.

Wikimedia production does not use the interwiki table. It uses a static interwiki cache at https://noc.wikimedia.org/conf/interwiki.php.txt per https://www.mediawiki.org/wiki/Manual:Interwiki_cache