**Proposal updated 2016-05-11**, see below for the original RFC
Status
-------
- Please review: //factor storage logic out of Interwiki// <https://gerrit.wikimedia.org/r/#/c/250150/> (I7d7424345)
Next Steps
------------
- split CDB from SQL implementation
- implement array-based InterwikiLookup (loads from multiple JSON or PHP files)
  - indexes should be generated on the fly, if not present in the loaded data
  - proposed structure: P3044
  - that InterwikiLookup implementation should also implement SiteLookup. Alternatively, only implement SiteLookup, and provide an adapter (SiteLookupInterwikiLookup) that implements InterwikiLookup on top of a SiteLookup.
- implement maintenance script that can convert between different interwiki representations.
  - use InterwikiLookup for (multiple) input sources (db/files), InterwikiStore for output
  - we want an InterwikiStore that can write the new array structure (as JSON or PHP)
  - we want an InterwikiStore that can write the old CDB structure (as CDB or PHP)
- Provide a config variable for specifying which files to read interwiki info from. If not set, use old settings and old interwiki storage.
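
As a rough illustration of the array-based lookup described above, the following Python sketch merges several already-parsed files and derives the prefix index on the fly whenever a file does not ship one. The record layout (`sites`, `prefixes`, `by_prefix`) and all class and function names are assumptions made for this example; this is not the structure proposed in P3044, and a real implementation would be PHP inside MediaWiki.

```python
import json
from pathlib import Path


def build_prefix_index(sites):
    """Map each interwiki prefix to the id of the site that declares it."""
    return {
        prefix: site_id
        for site_id, site in sites.items()
        for prefix in site.get("prefixes", [])
    }


class ArrayInterwikiLookup:
    """Hypothetical array-backed lookup; later datasets override earlier ones."""

    def __init__(self, datasets):
        self.sites = {}
        self.by_prefix = {}
        for data in datasets:
            sites = data.get("sites", {})
            # Trust an index shipped with the data; otherwise derive it.
            index = data.get("by_prefix")
            if index is None:
                index = build_prefix_index(sites)
            self.sites.update(sites)
            self.by_prefix.update(index)

    @classmethod
    def from_json_files(cls, paths):
        """Load each JSON file in order and merge the results."""
        return cls(json.loads(Path(p).read_text(encoding="utf-8")) for p in paths)

    def is_valid_interwiki(self, prefix):
        return prefix in self.by_prefix

    def fetch(self, prefix):
        """Return the site record for a prefix, or None if it is unknown."""
        site_id = self.by_prefix.get(prefix)
        return self.sites.get(site_id) if site_id is not None else None
```

A generated PHP file would normally ship its indexes, so the fallback in `__init__` would mostly matter for hand-maintained JSON.
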
Questions
-----------
- is this a good plan? (see below for rationale)
- how does interwiki/site info relate to local wiki config (wgConf/SiteMatrix/WikiMap)?
- should all information always be loaded? (see also {T114772})
- do we need caching?
- do we need to support new features also for the SQL based InterwikiLookup?
  - needs: interwiki_ids table, interwiki_groups table, and blob field with JSON or an interwiki_props table.
- Should SiteMatrix continue to work based on wgConf, or should it be ported to use Sites? Or combine both? Currently it has [[https://gerrit.wikimedia.org/r/#/c/211119/|problems]] with Wikimedia-specific configurations, e.g. for [[https://meta.wikimedia.org/wiki/Special_language_codes|special language codes]].
Later
-------
- decide on how wikis on the WMF cluster should load their interwiki config
  - proposal: three files: family (shared by e.g. all Wikipedias), language (shared by e.g. all English wikis), and local.
- create a script that generates the family, language, and local files for all the wikis (as JSON or PHP) based on config. Should work like dumpInterwiki.
  - check this: generating CDB based on the relevant family/language/local file for a given wiki should return the same CDB as dumpInterwiki for that site.
- create a deployment process that generates PHP files from the checked-in JSON files, for faster loading.
- action=query&meta=siteinfo&siprop=interwikimap could be ported to Sites and expose more information. The distinction from SiteMatrix would then become somewhat unclear.
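
The three-file proposal and the consistency check above might look roughly like this. The file names, the `family`/`language`/`dbname` keys, and the flat prefix-to-URL maps are all invented for the sketch. `files` stands in for the parsed contents of the files on disk, and the reference map would in practice come from dumpInterwiki output.

```python
def layer_names(wiki):
    """Return the hypothetical file names for a wiki, least specific first."""
    return [
        f"family-{wiki['family']}.json",
        f"language-{wiki['language']}.json",
        f"local-{wiki['dbname']}.json",
    ]


def resolve_interwiki_map(files, wiki):
    """Merge the family, language, and local layers; later layers win."""
    merged = {}
    for name in layer_names(wiki):
        merged.update(files.get(name, {}))
    return merged


def differences(expected, actual):
    """Report prefixes whose targets differ, as prefix -> (expected, actual)."""
    return {
        prefix: (expected.get(prefix), actual.get(prefix))
        for prefix in sorted(expected.keys() | actual.keys())
        if expected.get(prefix) != actual.get(prefix)
    }
```

An empty result from `differences` for every wiki would indicate that the layered files reproduce the legacy map.
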
_________
**Original RFC**
================
We currently have three systems in core that provide information about other sites: Interwiki, WikiMap, and SiteStore. The information they provide is frequently inconsistent (with each other as well as across wikis), and none of them provides a good interface for maintaining the information. This RFC proposes a path to fix this.
Historically, Interwiki was used for linking to other wikis from wikitext, while WikiMap helps with linking to other wikis programmatically. Sites/SiteStore/SiteLookup was introduced to allow access to other wikis' APIs, and was intended to replace the old interwiki system.
This proposal builds on the idea that information about other sites is configuration, not content. There is no need to have it in the database at all, or in any way mutable by the application. This proposal assumes that reading (and caching) local files is faster than loading from a database server (or memcached).
Objectives
---------
- allow us to use the more flexible Sites system instead of the crusty Interwiki system
- allow us to use Sites and Interwiki side by side, based on the same data
- allow interwiki mappings / site definitions to be maintained in files, not in the database. This is easier to maintain via git and puppet (or vim).
- Preserve the legacy interface for interwiki links (static methods in Interwiki)
- Make Sites at least as performant as the current multi-cache hodge-podge implementation of Interwiki.
- Make WikiMap consistent with Interwiki and SiteLookup
Requirements
---------
- Keep dumpInterwiki.php working so that Wikimedia wikis' interwiki map keeps being managed on Meta-Wiki. https://meta.wikimedia.org/wiki/Interwiki_map
- Keep Special:Interwiki working both in read-only and in read/write mode (https://www.mediawiki.org/wiki/Extension:Interwiki ).
Outline
---------
Refactor:
- Create a new interface InterwikiLookup with all the public methods from Interwiki (see I7d7424345)
- Create ClassicInterwikiLookup (//better name needed//), an implementation of InterwikiLookup built from the code currently in Interwiki. It is "classic" because it's basically the old code and implements the old storage backends (sql, cdb, ...). (see I7d7424345)
- Make the public static methods in Interwiki delegate to a singleton instance of InterwikiLookup, remove everything else from the class. (see I7d7424345)
- Add missing Interwiki concepts to the Site class, e.g. the "local" flag ("local" could be implemented as a group)
- Allow sites to be a member of multiple groups (e.g. "wikipedia" and "english" for enwiki).
- ~~re-implement DBSiteStore without dependency on ORMTable.~~ (done in I7e7ca257)
- Reduce the complexity of Sites & co: remove SiteObject and SiteSQLStore; Consider dropping SiteList in favor of a more powerful SiteLookup interface.
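
The first three refactoring steps amount to a static facade over an injectable service. The Python sketch below shows that shape only: the method names are Python renderings of the existing static methods, `set_lookup` is a hypothetical injection point, and a plain dict stands in for the SQL and CDB backends.

```python
from abc import ABC, abstractmethod


class InterwikiLookup(ABC):
    """Interface carrying the lookups that the static methods expose."""

    @abstractmethod
    def fetch(self, prefix):
        """Return the interwiki record for a prefix, or None."""

    def is_valid_interwiki(self, prefix):
        return self.fetch(prefix) is not None


class ClassicInterwikiLookup(InterwikiLookup):
    """Stand-in for the legacy backends; a dict replaces SQL/CDB here."""

    def __init__(self, rows):
        self._rows = dict(rows)

    def fetch(self, prefix):
        return self._rows.get(prefix)


class Interwiki:
    """Legacy static facade: keeps its public methods, owns no storage logic."""

    _lookup = None

    @classmethod
    def set_lookup(cls, lookup):
        # Hypothetical hook; the RFC only says the singleton should be swappable.
        cls._lookup = lookup

    @staticmethod
    def fetch(prefix):
        return Interwiki._lookup.fetch(prefix)

    @staticmethod
    def is_valid_interwiki(prefix):
        return Interwiki._lookup.is_valid_interwiki(prefix)
```

Existing callers keep using the static methods, while tests and new code can swap in a different InterwikiLookup.
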
Migrate to Sites:
- Create an adapter, SiteLookupInterwikiLookup, implementing InterwikiLookup based on a SiteLookup.
- Migrate usages of SiteStore to SiteLookup.
- Provide a script for importing information from an InterwikiLookup into the sites table. Can be used for migrating from interwiki in the database or CDB (as generated by dumpInterwiki.php) to sites in the database.
- Switch the singleton used by the static methods in Interwiki to use SiteLookupInterwikiLookup instead of ClassicInterwikiLookup (should be configurable)
- ~~Make WikiMap look up wiki info in a SiteLookup before (after?) checking in $wgConf (optional?).~~ (done in I8186140ae)
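
A minimal sketch of the adapter idea, in Python for brevity: an InterwikiLookup-style object that answers prefix queries from site records. The field names (`interwiki_ids`, `page_url`, `groups`) and the `DictSiteLookup` stand-in are assumptions for this example, not the actual Site API. The "local" flag is derived from group membership, as suggested in the refactoring list.

```python
class DictSiteLookup:
    """Hypothetical SiteLookup stand-in: site records keyed by global id."""

    def __init__(self, sites):
        self._sites = sites

    def get_sites(self):
        return self._sites


class SiteLookupInterwikiLookup:
    """Answers interwiki-prefix queries from the records of a SiteLookup."""

    def __init__(self, site_lookup):
        self._by_prefix = {}
        for global_id, site in site_lookup.get_sites().items():
            for prefix in site.get("interwiki_ids", []):
                # Translate the site record into a legacy-style interwiki row.
                self._by_prefix[prefix] = {
                    "prefix": prefix,
                    "url": site["page_url"],
                    "local": "local" in site.get("groups", []),
                    "wiki_id": global_id,
                }

    def fetch(self, prefix):
        """Return the interwiki row for a prefix, or None if it is unknown."""
        return self._by_prefix.get(prefix)

    def is_valid_interwiki(self, prefix):
        return prefix in self._by_prefix
```
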
File-based backend:
- FileSiteLookup implements a SiteLookup that will simply load site definitions from a list of local files
  - support at least JSON (easy to maintain) and PHP (code, not serialized data; fast with accelerator cache). Go by file extension.
- Make an export script that can generate JSON site definition files:
  - Export from a SiteLookup or InterwikiLookup (needs an adapter that implements a SiteLookup based on an InterwikiLookup)
  - Export all, or a list of groups
  - Export only the ones that differ from the ones defined in a list of given files. This can be used to generate files that contain only the local overrides / additions to a common list of site definitions.
- Provide a script for writing information from an InterwikiLookup to a JSON file. This can be used to port the output of dumpInterwiki.php to JSON.
- Make a maintenance script that generates a PHP file with site definitions from a list of JSON (and PHP) files.
  - the generated PHP file would contain indexes for all IDs and groups as well as the main data structure.
- Switch the default implementation of SiteLookup from DBSiteStore to FileSiteLookup.
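
The "export only the ones that differ" mode could work along these lines. This is a sketch under the assumption that site definitions are plain dicts keyed by site id and that a later file replaces a record wholesale; the function names are hypothetical.

```python
import json


def merge_definitions(layers):
    """Merge site definition layers in order; later layers win per site id."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


def export_overrides(current, base_layers):
    """Keep only sites that are new or differ from the merged base files.

    Sites that exist in the base but not in `current` cannot be expressed
    this way; a real format would need an explicit removal marker.
    """
    base = merge_definitions(base_layers)
    return {
        site_id: site
        for site_id, site in current.items()
        if base.get(site_id) != site
    }


def to_json(definitions):
    """Serialize with stable key order so the file diffs cleanly in git."""
    return json.dumps(definitions, indent=2, sort_keys=True)
```

Loading the base files followed by the exported overrides should reproduce the current definitions, as long as no site was removed.
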
Performance:
- Add methods for fetching Site objects for a group to SiteLookup
- Make CachingSiteStore (resp. CachingSiteLookup) cache individual groups. Use "siblings" group for sister projects of the same language.
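
Per-group caching might be sketched as follows. The cache key format, the `get_sites_in_group` method name, and the counting backend are assumptions for illustration; a dict stands in for whatever cache would actually be configured.

```python
class GroupedSiteLookup:
    """Stand-in backend that counts how often a group is loaded."""

    def __init__(self, sites):
        self._sites = sites
        self.loads = 0

    def get_sites_in_group(self, group):
        self.loads += 1
        return [site for site in self._sites if group in site["groups"]]


class CachingSiteLookup:
    """Caches each group under its own key instead of one blob for all sites."""

    def __init__(self, inner, cache=None):
        self._inner = inner
        # Any dict-like store works for the sketch.
        self._cache = {} if cache is None else cache

    def get_sites_in_group(self, group):
        key = f"sites:group:{group}"
        if key not in self._cache:
            # Cache miss: load just this group from the wrapped lookup.
            self._cache[key] = self._inner.get_sites_in_group(group)
        return self._cache[key]
```

With this shape, rendering sister-project links only touches the "siblings" entry rather than loading every known site.
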
Configuration
----------------
- InterwikiLookup implementation to use. Default: SiteLookupInterwikiLookup
  - For ClassicInterwikiLookup, use the old interwiki settings controlling CDB usage etc.
- SiteLookup implementation to use. Default: FileSiteLookup
  - By default, read DefaultInterwiki.json (maintained in git) and LocalInterwiki.json (shipped empty).
  - on the WMF cluster, each wiki uses three JSON files: a common file, one per family, and one for local overrides per wiki.
  - on the WMF cluster, use PHP files for speed. The PHP file could be generated per-wiki, combining the common, family, and local JSON files. This essentially replaces the functionality of dumpInterwiki.php.
- Caching:
  - which cache (possibly none for PHP files)
  - duration
  - groups to cache separately (all?)
Open Questions
----------------
- should Site objects always be fully loaded/instantiated? Or would it be better to be able to ask for individual "aspects" of a site, e.g. paths, dbname, ids, etc?
  - see also {T114772}
- should Site objects relay information from wikifarm configuration (wgConf)? Or should Sites be kept entirely separate from configuration? WikiMap already combines information from these two sources, but the old interwiki map is completely separate from wgConf.
- Should SiteMatrix continue to work based on wgConf, or should it be ported to use Sites? Or combine both? Currently it has [[https://gerrit.wikimedia.org/r/#/c/211119/|problems]] with Wikimedia-specific configurations, e.g. for [[https://meta.wikimedia.org/wiki/Special_language_codes|special language codes]].
- should the JSON structure for describing sites have a narrow specification, or be flexible towards additions?
- action=query&meta=siteinfo&siprop=interwikimap could be ported to Sites and expose more information. The distinction from SiteMatrix would then become somewhat unclear.