
Support data sharing in complex networks of MediaWiki wikis
Open, Needs Triage, Public

Description

(This is currently meant as a brainstorming task, mainly to have a place to drop examples into as I encounter them... feel free to expand/refactor.)

There are various features in MediaWiki where data (whether content or metadata or configuration) is loaded from an upstream wiki. For example:

Most of these were written with within-organization centralization in mind, where one organization runs a wikifarm and wants to centralize some functionality on a single wiki to reduce the maintenance burden for users. Some have since evolved towards a vision of global centralization, where Wikimedia provides some resource for the whole world: InstantCommons and Wikidata are the two main examples; some anti-abuse features such as spam blacklists are also used that way; and global templates and global OAuth identities might be in the future. Some even support a network of connections (where wiki A gets the data from wiki B, but, possibly unbeknownst to A, B gets it from C), e.g. foreign image repos and Wikibase, but this was usually added as an afterthought and the support is poor.

Given our strategic vision of becoming the essential infrastructure of the ecosystem of free knowledge (so we want a global knowledge ecosystem, not one silo per organization), while supporting knowledge equity (so people should not be forced to accept Wikimedia as the only possible central source of data), MediaWiki should have better support for networks of wikis sharing data with each other. The goal of this task is to figure out what fundamental, platform-level functionality is missing or sub-par for that.

Some examples of problems we currently have when trying to share data through multi-step connections:

  • shared file repositories: T190168
  • hard to track where imported content came from
  • hard to track who the authors of imported content are
  • ...TBD
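
To make the provenance problems above concrete, here is a minimal sketch of what "carrying the chain along" could look like when wiki A gets content from B, which in turn got it from C. Everything in it is hypothetical: `Hop`, `SharedItem` and `reshare` are illustrative names, not existing MediaWiki classes, and plain URLs stand in for the wiki identifier that does not exist yet.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Hop:
    """One wiki the content passed through (hypothetical record)."""
    wiki: str   # stand-in identifier; a URL, for lack of a global wiki ID
    title: str  # page or file title on that wiki


@dataclass
class SharedItem:
    """Content plus the chain of wikis it travelled through, origin first."""
    content: str
    chain: List[Hop] = field(default_factory=list)

    def origin(self) -> Optional[Hop]:
        # The first hop is where the content was originally created.
        return self.chain[0] if self.chain else None


def reshare(item: SharedItem, wiki: str, title: str) -> SharedItem:
    """Return the item as served by `wiki`, extending the chain instead of replacing it."""
    return SharedItem(item.content, item.chain + [Hop(wiki, title)])


# Wiki C hosts the original; wiki B re-serves it; wiki A only ever talks to B.
on_c = SharedItem("image bytes", [Hop("https://c.example", "File:Map.png")])
seen_by_a = reshare(on_c, "https://b.example", "File:Map.png")
```

With the chain preserved, A can still ask `seen_by_a.origin()` and learn that the file came from C, even though it only fetched it from B. Tracking authors would need the same kind of chain plus a user identifier, which is a separate open problem.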

Some fundamental deficiencies these point to:

  • Without some concept of a global wiki ID, it's hard to handle provenance or know where to fetch more information from. (T224020)
    • There is no such thing as a global "wiki identifier". The concept of wiki ID exists but only within a wikifarm.
    • There is no standard way to reference a wiki, even within a farm. Sometimes we use wiki IDs, sometimes domain names, sometimes interwiki names, but none of those are exactly equivalent to any other.
    • There is not always an easy way to get a global wiki ID attached to the data. The Action API could easily be expanded to provide an identifier, but sometimes we use action=render instead because of the API's fundamental uncacheability. (Or is that the reason? Is action=render cached?)
  • There is no global identifier for users. This is more complicated, as the same user might be active on many wikis, and users can be renamed, merged, etc.
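
As an illustration of why the farm-local wiki ID is not enough, the sketch below derives a candidate global identifier from the kind of data the Action API's `meta=siteinfo` general block exposes (`wikiid`, `server`, `scriptpath`). The function name and the "host plus script path" scheme are assumptions made for this example only, not a proposal that has been agreed on; in particular, such an ID would change if a wiki moved to a different domain.

```python
from urllib.parse import urlsplit


def candidate_global_wiki_id(general: dict) -> str:
    """Derive a hypothetical global identifier from siteinfo-style 'general' data.

    Assumption: host + script path is unique enough worldwide. The farm-local
    'wikiid' is not, since two unrelated farms can both contain an 'enwiki'.
    """
    server = general["server"]
    # 'server' can be protocol-relative ("//en.wikipedia.org"); add a scheme
    # so that urlsplit recognises the host part.
    if server.startswith("//"):
        server = "https:" + server
    host = urlsplit(server).hostname or ""
    path = general.get("scriptpath", "").rstrip("/")
    return f"{host}{path}"


# Two wikis in unrelated farms that happen to share the same local wiki ID.
farm_one = {"wikiid": "enwiki", "server": "//en.wikipedia.org", "scriptpath": "/w"}
farm_two = {"wikiid": "enwiki", "server": "https://wiki.example.org", "scriptpath": ""}

id_one = candidate_global_wiki_id(farm_one)
id_two = candidate_global_wiki_id(farm_two)
```

Here the local IDs collide while the derived identifiers do not, which is the property provenance tracking needs. Whether a domain-based, registry-based or randomly generated identifier is the right choice is exactly the kind of question this task is meant to collect input on.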