
RFC: Overhaul Interwiki map, unify with Sites and WikiMap
Open, High, Public


Proposal updated 2016-05-11; see below for the original RFC.


Next Steps

  • split CDB from SQL implementation
  • implement array-based InterwikiLookup (loads from multiple JSON or PHP files)
    • indexes should be generated on the fly, if not present in the loaded data
    • proposed structure: P3044
    • that InterwikiLookup implementation should also implement SiteLookup. Alternatively, only implement SiteLookup, and provide an adapter (SiteLookupInterwikiLookup) that implements InterwikiLookup on top of a SiteLookup.
  • implement maintenance script that can convert between different interwiki representations.
    • use InterwikiLookup for (multiple) input sources (db/files), InterwikiStore for output
    • we want an InterwikiStore that can write the new array structure (as JSON or PHP)
    • we want an InterwikiStore that can write the old CDB structure (as CDB or PHP)
  • Provide a config variable for specifying which files to read interwiki info from. If not set, use old settings and old interwiki storage.
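The array-based lookup in the steps above can be sketched as follows. This is a minimal sketch in Python rather than PHP, written for brevity. The data layout is an assumption, because the structure proposed in P3044 is not reproduced here. Each loaded file is assumed to map a canonical global ID to an entry with optional `local_ids` (interwiki prefixes) and `groups`. An entry in a later file replaces an entry with the same global ID in an earlier file. Indexes are built on the fly unless the loaded data already contains them. `ArrayInterwikiLookup`, `merge_site_data` and all key names are hypothetical.

```python
import json
from typing import Any, Optional


def merge_site_data(documents: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge loaded files in order; later documents replace earlier entries by global ID."""
    sites: dict[str, Any] = {}
    for doc in documents:
        sites.update(doc.get("sites", {}))
    return {"sites": sites}


class ArrayInterwikiLookup:
    """Hypothetical array-backed lookup (not MediaWiki's actual class)."""

    def __init__(self, data: dict[str, Any]) -> None:
        self.sites: dict[str, dict[str, Any]] = data.get("sites", {})
        # Reuse precomputed indexes if the data ships them, otherwise build them now.
        self.prefix_index: dict[str, str] = data.get("prefix_index") or self._build_prefix_index()
        self.group_index: dict[str, list[str]] = data.get("group_index") or self._build_group_index()

    @classmethod
    def from_json_files(cls, paths: list[str]) -> "ArrayInterwikiLookup":
        documents = []
        for path in paths:
            with open(path, encoding="utf-8") as f:
                documents.append(json.load(f))
        return cls(merge_site_data(documents))

    def _build_prefix_index(self) -> dict[str, str]:
        # Map each interwiki prefix (local ID) to the entry's canonical global ID.
        return {
            prefix: global_id
            for global_id, entry in self.sites.items()
            for prefix in entry.get("local_ids", [])
        }

    def _build_group_index(self) -> dict[str, list[str]]:
        # Map each group name to the global IDs of its members, in load order.
        index: dict[str, list[str]] = {}
        for global_id, entry in self.sites.items():
            for group in entry.get("groups", []):
                index.setdefault(group, []).append(global_id)
        return index

    def is_valid_interwiki(self, prefix: str) -> bool:
        return prefix in self.prefix_index

    def fetch(self, prefix: str) -> Optional[dict[str, Any]]:
        global_id = self.prefix_index.get(prefix)
        return self.sites[global_id] if global_id is not None else None

    def get_all_prefixes(self) -> list[str]:
        return sorted(self.prefix_index)
```

This sketch merges whole entries. Whether a per-wiki file should instead merge individual fields is a separate choice.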



  • decide on how wikis on the WMF cluster should load their interwiki config
    • proposal: three files: family (shared by e.g. all Wikipedias), language (shared by e.g. all English wikis), and local.
  • create a script that generates the family, language, and local files for all the wikis (as JSON or PHP) based on config. Should work like dumpInterwiki.
    • sanity check: generating a CDB from the family, language, and local files for a given wiki should produce the same CDB that dumpInterwiki produces for that wiki.
  • create a deployment process that generates PHP files from the checked-in JSON files, for faster loading.
  • action=siteinfo&siprop=interwikimap could be ported to Sites and expose more information. Distinction from SiteMatrix is becoming somewhat unclear then.
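Reading three files per wiki implies a merge step; the meeting notes below call it a "deep merge". The following is a minimal sketch, assuming each file loads as a nested dict and the files are applied in the order family, language, local. A more specific file wins on conflicting values. Nested dicts are merged recursively and lists are replaced rather than concatenated. That list behaviour is one possible policy, not something the RFC specifies. The function names are hypothetical.

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a merged copy; nested dicts merge recursively, other override values win.

    Neither input is modified.
    """
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


def merge_wiki_config(
    family: dict[str, Any], language: dict[str, Any], local: dict[str, Any]
) -> dict[str, Any]:
    # Most general first, most specific last: family, then language, then local overrides.
    return deep_merge(deep_merge(family, language), local)
```

With a merge like this, the CDB sanity check above reduces to comparing the merged result for one wiki against what dumpInterwiki generates for it.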

Original RFC

We currently have three systems in core that provide information about other sites: Interwiki, WikiMap, and SiteStore. The information they provide is frequently inconsistent (between each other as well as between wikis), and none of them provides a good interface for maintaining the information. This RFC proposes a path to fix this.

Historically, Interwiki was used for linking to other wikis from wikitext, while WikiMap helps with linking to other wikis programmatically. Sites/SiteStore/SiteLookup was introduced to allow access to other wikis' APIs, and was intended to replace the old interwiki system.

This proposal builds on the idea that information about other sites is configuration, not content. There is no need to have it in the database at all, or in any way mutable by the application. This proposal assumes that reading (and caching) local files is faster than loading from a database server (or memcached).

Goals:

  • allow us to use the more flexible Sites system instead of the crusty Interwiki system
  • allow us to use Sites and Interwiki side by side, based on the same data
  • allow interwiki mappings / site definitions to be maintained in files, not in the database. This is easier to maintain via git and puppet (or vim).
  • Preserve the legacy interface for interwiki links (static methods in Interwiki)
  • make Sites at least as performant as the current multi-cache hodge-podge implementation of Interwiki.
  • Make WikiMap consistent with Interwiki and SiteLookup




  • Create a new interface InterwikiLookup with all the public methods from Interwiki (see I7d7424345)
  • Create ClassicInterwikiLookup (better name needed), implementing InterwikiLookup using the code currently in Interwiki. "Classic" because it's basically the old code, and supports the old storage backends (SQL, CDB, ...). (see I7d7424345)
  • Make the public static methods in Interwiki delegate to a singleton instance of InterwikiLookup, remove everything else from the class. (see I7d7424345)
  • Add missing Interwiki concepts to the Site class, e.g. the "local" flag ("local" could be implemented as a group)
    • Allow sites to be a member of multiple groups (e.g. "wikipedia" and "english" for enwiki).
  • re-implement DBSiteStore without dependency on ORMTable. (done in I7e7ca257)
  • Reduce the complexity of Sites & co: remove SiteObject and SiteSQLStore; Consider dropping SiteList in favor of a more powerful SiteLookup interface.
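The first three refactoring steps follow a familiar pattern: an interface, the legacy code moved behind it, and a static facade that delegates to a swappable singleton. Below is a minimal sketch of that pattern in Python. The snake_case method names are loosely modelled on old static methods such as isValidInterwiki() and fetch(). For simplicity, `fetch` returns only a URL pattern here, not an Interwiki object. `InMemoryInterwikiLookup` is a hypothetical stand-in for ClassicInterwikiLookup, which would read from SQL or CDB instead.

```python
from abc import ABC, abstractmethod
from typing import Optional


class InterwikiLookup(ABC):
    """Interface for the public lookup methods of the old Interwiki class."""

    @abstractmethod
    def is_valid_interwiki(self, prefix: str) -> bool: ...

    @abstractmethod
    def fetch(self, prefix: str) -> Optional[str]: ...

    @abstractmethod
    def get_all_prefixes(self, local: Optional[bool] = None) -> list[str]: ...


class InMemoryInterwikiLookup(InterwikiLookup):
    """Hypothetical stand-in for ClassicInterwikiLookup; the real one reads SQL or CDB."""

    def __init__(self, table: dict[str, tuple[str, bool]]) -> None:
        self._table = table  # prefix -> (URL pattern, is_local)

    def is_valid_interwiki(self, prefix: str) -> bool:
        return prefix in self._table

    def fetch(self, prefix: str) -> Optional[str]:
        entry = self._table.get(prefix)
        return entry[0] if entry is not None else None

    def get_all_prefixes(self, local: Optional[bool] = None) -> list[str]:
        return sorted(
            prefix
            for prefix, (_, is_local) in self._table.items()
            if local is None or is_local == local
        )


class Interwiki:
    """Legacy static facade: old call sites keep working, the backend is swappable."""

    _lookup: Optional[InterwikiLookup] = None

    @classmethod
    def set_lookup(cls, lookup: InterwikiLookup) -> None:
        cls._lookup = lookup

    @classmethod
    def _instance(cls) -> InterwikiLookup:
        if cls._lookup is None:
            raise RuntimeError("no InterwikiLookup configured")
        return cls._lookup

    @classmethod
    def is_valid_interwiki(cls, prefix: str) -> bool:
        return cls._instance().is_valid_interwiki(prefix)

    @classmethod
    def fetch(cls, prefix: str) -> Optional[str]:
        return cls._instance().fetch(prefix)

    @classmethod
    def get_all_prefixes(cls, local: Optional[bool] = None) -> list[str]:
        return cls._instance().get_all_prefixes(local)
```

Because callers only ever go through the facade, the later switch to SiteLookupInterwikiLookup becomes a single `set_lookup` call driven by configuration.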

Migrate to Sites:

  • Create an adapter, SiteLookupInterwikiLookup, implementing InterwikiLookup based on a SiteLookup.
  • Migrate usages of SiteStore to SiteLookup.
  • Provide a script for importing information from an InterwikiLookup into the sites table. Can be used for migrating from interwiki in the database or CDB (as generated by dumpInterwiki.php) to sites in the database.
  • Switch the singleton used by the static methods in Interwiki to use SiteLookupInterwikiLookup instead of ClassicInterwikiLookup (should be configurable)
  • Make WikiMap look up wiki info in a SiteLookup before (after?) checking in $wgConf (optional?). (done in I8186140ae)
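The SiteLookupInterwikiLookup adapter mentioned above can be sketched as follows. This is a minimal Python sketch in which `Site` and `SiteLookup` are simplified, hypothetical stand-ins for the real classes. A site is assumed to carry its interwiki prefixes, a link path with `$1` for the page title, and a list of groups. Following the suggestion earlier in this RFC, the "local" flag is modelled as membership in a group named "local".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Site:
    global_id: str
    link_path: str  # URL pattern, $1 stands for the page title
    interwiki_ids: list[str] = field(default_factory=list)
    groups: list[str] = field(default_factory=list)


class SiteLookup:
    """Minimal in-memory stand-in for a SiteLookup."""

    def __init__(self, sites: list[Site]) -> None:
        self._sites = {site.global_id: site for site in sites}

    def get_site(self, global_id: str) -> Optional[Site]:
        return self._sites.get(global_id)

    def get_sites(self) -> list[Site]:
        return list(self._sites.values())


class SiteLookupInterwikiLookup:
    """Answers interwiki questions using only data from a SiteLookup."""

    def __init__(self, site_lookup: SiteLookup) -> None:
        self._site_lookup = site_lookup
        self._by_prefix: Optional[dict[str, Site]] = None

    def _prefix_map(self) -> dict[str, Site]:
        # Built lazily on first use, then reused.
        if self._by_prefix is None:
            self._by_prefix = {
                prefix: site
                for site in self._site_lookup.get_sites()
                for prefix in site.interwiki_ids
            }
        return self._by_prefix

    def is_valid_interwiki(self, prefix: str) -> bool:
        return prefix in self._prefix_map()

    def fetch(self, prefix: str) -> Optional[str]:
        site = self._prefix_map().get(prefix)
        return site.link_path if site is not None else None

    def get_all_prefixes(self, local: Optional[bool] = None) -> list[str]:
        # "local" is modelled as a group rather than a separate flag.
        return sorted(
            prefix
            for prefix, site in self._prefix_map().items()
            if local is None or ("local" in site.groups) == local
        )
```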

File-based backend:

  • FileSiteLookup: a SiteLookup implementation that simply loads site definitions from a list of local files
    • support at least JSON (easy to maintain) and PHP (code, not serialized data; fast with accelerator cache). Choose the format by file extension.
  • Make an export script that can generate JSON site definition files:
    • Export from a SiteLookup or InterwikiLookup (needs an adapter that implements a SiteLookup based on an InterwikiLookup)
    • Export all, or a list of groups
    • Export only the ones that differ from the ones defined in a list of given files. This can be used to generate files that contain only the local overrides / additions to a common list of site definitions.
  • Provide a script for writing information from an InterwikiLookup to a JSON file. This can be used to port the output of dumpInterwiki.php to JSON.
  • Make a maintenance script that generates a PHP file with site definitions from a list of JSON (and PHP) files.
    • the generated PHP file would contain indexes for all IDs and groups as well as the main data structure.
  • Switch the default implementation of SiteLookup from DBSiteStore to FileSiteLookup.
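Two pieces of the file-based backend above are easy to sketch: choosing a loader by file extension, and exporting only the entries that differ from a set of base definitions. The following minimal Python sketch assumes site definitions are dicts keyed by global ID. Loading PHP files would require PHP itself, so this sketch rejects them rather than pretending to parse them. All function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def load_site_file(path: str) -> dict[str, Any]:
    """Pick a loader by file extension; only JSON is handled in this sketch."""
    suffix = Path(path).suffix.lower()
    if suffix == ".json":
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    # The proposal also allows PHP files; reading those needs PHP, so reject here.
    raise ValueError(f"unsupported site definition file: {path}")


def export_overrides(current: dict[str, Any], base: dict[str, Any]) -> dict[str, Any]:
    """Keep only entries that are new, or that differ from the base definitions."""
    return {gid: entry for gid, entry in current.items() if base.get(gid) != entry}


def to_json(sites: dict[str, Any]) -> str:
    # Sorted keys and fixed indentation keep diffs stable under git.
    return json.dumps({"sites": sites}, indent=2, sort_keys=True)
```

A diff like this cannot express removing an entry that the base files define. If local files need to do that, the format would need something extra, such as a tombstone marker.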


  • Add methods for fetching Site objects for a group to SiteLookup
  • Make CachingSiteStore (or CachingSiteLookup, respectively) cache individual groups. Use a "siblings" group for sister projects of the same language.
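Per-group caching can be sketched as a thin wrapper that caches the result of each group query separately, with an expiry time. The following is a minimal Python sketch. The loader callback, the TTL and the class name are assumptions, not the real CachingSiteLookup API. An injectable clock makes expiry testable.

```python
import time
from typing import Callable


class CachingGroupLookup:
    """Hypothetical wrapper caching each group's member list separately, with a TTL."""

    def __init__(
        self,
        load_group: Callable[[str], list[str]],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load_group = load_group
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: dict[str, tuple[float, list[str]]] = {}  # group -> (stored at, ids)

    def get_sites_in_group(self, group: str) -> list[str]:
        now = self._clock()
        hit = self._cache.get(group)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        # Miss or expired: ask the underlying lookup for just this group.
        sites = self._load_group(group)
        self._cache[group] = (now, sites)
        return sites
```

Caching per group means a page that only needs the "siblings" group never has to load or deserialize the full site list.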

Configuration:

  • InterwikiLookup implementation to use. Default: SiteLookupInterwikiLookup
    • For ClassicInterwikiLookup, use the old interwiki settings controlling CDB usage etc.
  • SiteLookup implementation to use. Default: FileSiteLookup
    • by default, read DefaultInterwiki.json (maintained in git) and LocalInterwiki.json (shipped empty).
    • on the WMF cluster, each wiki uses three JSON files: a common file, one per family, and one for local overrides per wiki.
    • on the WMF cluster, use PHP files for speed. The PHP file could be generated per wiki, combining the common, family, and local JSON files. This essentially replaces the functionality of dumpInterwiki.php.
  • Caching:
    • which cache (possibly none for PHP files)
    • duration
    • groups to cache separately (all?)
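The configuration options above can be illustrated with a small resolver. The RFC names the options but not the actual setting names, so every key, file path and function below is hypothetical. The defaults follow the list: SiteLookupInterwikiLookup, FileSiteLookup, and the two default JSON files. On the WMF cluster, each wiki reads a common file, a family file and a per-wiki file, or else a single pre-generated PHP file.

```python
from typing import Any, Optional

# Hypothetical setting names; the RFC lists these options, not their spelling.
DEFAULTS: dict[str, Any] = {
    "interwiki_lookup": "SiteLookupInterwikiLookup",
    "site_lookup": "FileSiteLookup",
    "site_files": ["DefaultInterwiki.json", "LocalInterwiki.json"],
    "cache_ttl": 3600,
}
INTERWIKI_IMPLS = {"SiteLookupInterwikiLookup", "ClassicInterwikiLookup"}


def resolve_config(overrides: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Apply site-specific overrides on top of the defaults and validate the result."""
    config = {**DEFAULTS, **(overrides or {})}
    if config["interwiki_lookup"] not in INTERWIKI_IMPLS:
        raise ValueError(f"unknown InterwikiLookup: {config['interwiki_lookup']}")
    return config


def wmf_site_files(family: str, wiki: str, generated_php: bool = False) -> list[str]:
    """Common, then family, then per-wiki overrides; or one pre-merged PHP file."""
    if generated_php:
        return [f"interwiki/generated/{wiki}.php"]
    return ["interwiki/common.json", f"interwiki/{family}.json", f"interwiki/{wiki}.json"]
```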

Open Questions

  • should Site objects always be fully loaded/instantiated? Or would it be better to be able to ask for individual "aspects" of a site, e.g. paths, dbname, ids, etc?
  • should Site objects relay information from wikifarm configuration (wgConf)? Or should Sites be kept entirely separate from configuration? WikiMap already combines information from these two sources. But the old interwiki map is completely separate from wgConf.
  • Should SiteMatrix continue to work based on wgConf, or should it be ported to use Sites? Or combine both? Currently it has problems with Wikimedia-specific configurations, e.g. for special language codes.
  • should the JSON structure for describing sites have a narrow specification, or be flexible towards additions?
  • action=siteinfo&siprop=interwikimap could be ported to Sites and expose more information. Distinction from SiteMatrix is becoming somewhat unclear then.


Event Timeline

Nemo_bis updated the task description. May 9 2016, 7:05 AM
daniel added a comment. May 9 2016, 9:06 PM

Does this mean that the interwiki map at Meta will be disbanded?

No. The public interfaces will stay around. Perhaps they will be extended or slightly modified, but I at least have no plan to get rid of them.

One major question we have to answer at some point, though, is how meta-information about other wikis on the same cluster (their domain, API path, interwiki prefix, etc.) relates to the configuration of those wikis. As far as I know, the SiteMatrix is currently built by looking at the configuration of all the wikis on the Wikimedia cluster, not from information in the interwiki system. I'm not yet clear on whether and how the two should be integrated with each other.

But however that plays out, a site matrix will remain available on Meta.

daniel added a comment. May 9 2016, 9:14 PM

As a beginner who wants to understand this better, I went looking for examples of each. I found these links. Please add whatever is accurate/useful, to the task description! Thanks.

It's more about the code than Special pages. The first thing to be refactored would be

SiteMatrix also provides an API that is relevant:

(I'm not sure about these 2...)

Yes, that thing, but more relevantly, its DB-based implementation:

No, this here:

daniel updated the task description. May 11 2016, 5:32 PM
RobLa-WMF lowered the priority of this task from Medium to Low. May 11 2016, 11:04 PM
RobLa-WMF moved this task from Request IRC meeting to Under discussion on the TechCom-RFC board.
RobLa-WMF raised the priority of this task from Low to Medium. May 11 2016, 11:33 PM

We discussed this in E171 today. Full notes are E171#2016

The summary:

  • question discussed: which backends should InterwikiLookup support? (robla, 21:10:54)
  • I imagine every wiki would read three files actually (and perform a deep merge): one with info shared across the family, one with info shared across the language, and one with local overrides for the specific wiki (DanielK_WMDE, 21:22:54)
  • aude: also can interwiki ids be renamed? daniel: you can add prefixes. (DanielK_WMDE, 21:23:42)
  • an entry can have multiple global ids. they act as aliases. only one of them would be used as a key in the file, making it the *canonical* global id. (DanielK_WMDE, 21:24:05)
  • <aude> another thing we should have is configuration for sorting order of interwiki ids (maintained in a sane place) (DanielK_WMDE, 21:33:00)
  • LINK: (aude, 21:33:23)
  • LINK: (aude, 21:33:26)
  • <TimStarling> anyway, yes, the JSON format you propose looks very extensible and will presumably meet our needs (DanielK_WMDE, 21:33:39)
  • LINK: (DanielK_WMDE, 21:34:21)
  • <TimStarling> I don't want to have m:Interwiki_map anymore (DanielK_WMDE, 21:36:27)
  • Tim is not convinced that interwiki info should be maintained by hand as json. Perhaps we still want dumpInterwiki (or equivalent) (DanielK_WMDE, 21:50:15)
  • Tim thinks we need to figure out what information can be taken from wgConf, and what should come from elsewhere, and how to maintain it. But it's not a blocker for now, we can figure it out later (DanielK_WMDE, 21:53:55)
  • next week's meeting: E184 RFC: Requirements for change propagation (T102476) (robla, 21:57:21)
  • Tim thinks it's ok to go ahead with implementing the proposed next steps, as they are non-threatening. But should we have a formal last call? (DanielK_WMDE, 22:02:40)
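The alias rule from the notes above can be made concrete. An entry may have several global IDs, and the one used as its key is canonical. The following minimal Python sketch assumes the aliases are stored in a hypothetical `global_ids` list inside each entry. It builds a single index from every ID to its canonical ID and rejects an alias that two entries claim.

```python
from typing import Any, Optional


def build_alias_index(sites: dict[str, dict[str, Any]]) -> dict[str, str]:
    """Map every global ID (canonical key or alias) to the canonical ID."""
    index: dict[str, str] = {}
    for canonical, entry in sites.items():
        for gid in [canonical, *entry.get("global_ids", [])]:
            if gid in index and index[gid] != canonical:
                raise ValueError(f"global ID {gid!r} claimed by {index[gid]!r} and {canonical!r}")
            index[gid] = canonical
    return index


def resolve_global_id(
    sites: dict[str, dict[str, Any]], index: dict[str, str], global_id: str
) -> Optional[dict[str, Any]]:
    """Look up an entry by canonical ID or alias; None if unknown."""
    canonical = index.get(global_id)
    return None if canonical is None else sites[canonical]
```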

We agreed that there's no reason to go to last call, because we weren't making a final decision.

@daniel, my understanding from our previous conversations (and discussions about a new column on the TechCom-RFC board) is that this is "on track". As of right now, you're not waiting on TechCom for approval before continuing development (or understanding what next steps should be) and that everyone (including TechCom) is happy that implementation is underway. Is that a fair characterization?

(I'm asking because I was recently asked about status on this RFC)

daniel moved this task from proposed to tracking on the WMDE-TLA-Team board.
daniel added a comment. Nov 1 2016, 5:07 PM

I have proposed T149535: Refactoring the Interwiki Map: status and outlook for the developer summit in January. If you are interested in such a session, please comment on the ticket.

cscott added a subscriber: cscott. Nov 4 2016, 8:33 PM
dcausse added a subscriber: dcausse.
dcausse updated the task description. Nov 15 2016, 4:25 PM
dcausse updated the task description. Nov 18 2016, 2:11 PM
daniel moved this task from Inbox to Project on the User-Daniel board. Jan 5 2017, 7:03 PM
Koavf added a subscriber: Koavf. Mar 14 2017, 10:50 PM
daniel edited projects, added TechCom-RFC (TechCom-RFC-Closed); removed TechCom-RFC.
daniel moved this task from TechCom-RFC-Closed to In progress on the TechCom-RFC board.
daniel edited projects, added TechCom-RFC; removed TechCom-RFC (TechCom-RFC-Closed).
Restricted Application added a subscriber: PokestarFan. Jul 26 2017, 7:13 PM
kchapman moved this task from Under discussion to Old on the TechCom-RFC board. Mar 9 2018, 8:44 PM
kchapman added a subscriber: kchapman.

Does not currently have anyone stepping up to implement, moving to the backlog for now.

Actually, work on SiteLookup and Interwiki is now in the Wikidata backlog. I discussed this with @Ladsgroup and @Lydia_Pintscher last week. The RFC still needs an update, so backlog is appropriate.

Addshore added a subscriber: Addshore.

Not sure if this belongs on the campsite board yet, so removing for now.
This is still on the Wikidata radar of course, but this ticket needs more before it can actually be worked on (task breakdown & what not)

Aklapper removed daniel as the assignee of this task. Jun 19 2020, 4:18 PM

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see for available options.
(For the records, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

Krinkle moved this task from Old to P1: Define on the TechCom-RFC board. Sep 16 2020, 8:12 PM
Aklapper removed a subscriber: Anomie. Fri, Oct 16, 5:01 PM