
Move populateSitesTable script out of Wikibase.
Open, Medium, Public

Description

Some functionality related to the sites table still resides in the Wikibase extension.

Most importantly, the populateSitesTable script reads the site matrix from meta, and imports it into the local sites table. The script should probably be renamed, e.g. to loadSiteMatrixIntoSitesTable, or some such.
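For illustration only, the "read the site matrix, then import it" flow described above could be sketched roughly as follows. This is Python rather than the script's PHP, and it is not how SitesBuilder or SiteMatrixParser actually work. The response shape is an assumption based on the JSON output of the SiteMatrix API module (action=sitematrix), and the row field names (global_id, language, url) as well as the sample data are hypothetical.

```python
from urllib.parse import urlencode


def sitematrix_url(api_base="https://meta.wikimedia.org/w/api.php"):
    """Build the request URL for the SiteMatrix API module on the source wiki."""
    return api_base + "?" + urlencode({"action": "sitematrix", "format": "json"})


def flatten_sitematrix(response):
    """Turn a sitematrix-style response into a flat list of site rows.

    Assumed shape: numeric keys hold one language each (with a "site" list),
    "specials" holds wikis without a language, and "count" is a plain number.
    """
    rows = []
    for key, entry in response["sitematrix"].items():
        if key == "count":
            continue
        if key == "specials":
            # Special wikis are not tied to a language in this sketch.
            for site in entry:
                rows.append({"global_id": site["dbname"],
                             "language": None,
                             "url": site["url"]})
            continue
        for site in entry.get("site", []):
            rows.append({"global_id": site["dbname"],
                         "language": entry["code"],
                         "url": site["url"]})
    return rows


# Hand-written sample in the assumed shape (not real API output).
SAMPLE = {
    "sitematrix": {
        "count": 3,
        "0": {
            "code": "en",
            "name": "English",
            "site": [
                {"url": "https://en.wikipedia.org", "dbname": "enwiki", "code": "wiki"},
                {"url": "https://en.wiktionary.org", "dbname": "enwiktionary", "code": "wiktionary"},
            ],
        },
        "specials": [
            {"url": "https://commons.wikimedia.org", "dbname": "commonswiki", "code": "commons"},
        ],
    }
}

rows = flatten_sitematrix(SAMPLE)
```

Each row would then be written to the local sites table; that step is left out here because it depends on the site store being used.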

The helper classes SitesBuilder and SiteMatrixParser should also be moved into core.

Event Timeline

daniel raised the priority of this task from to Needs Triage.
daniel updated the task description.
daniel added a project: MediaWiki-Site-system.
daniel subscribed.

This script is somewhat Wikimedia-specific, which is why we (or at least I) have hesitated to move it to core.

I also haven't been able to come up with a better place (thought about WikimediaMaintenance, etc.).

I think the best solution is a small, new extension that can contain this stuff. It can probably be enabled by default for all Wikimedia wikis.

I have experimented with pulling this stuff out of Wikibase (and small things like updating the script to get SiteStore from MediaWikiServices):

https://github.com/filbertkm/WikimediaSites

If this is an agreeable solution, then I can request a gerrit repository and can submit the code there.

PS. when trying to run populateSitesTable.php in Wikibase, it was broken for me.

The required sites classes have been moved from __DIR__ . '/../includes/sites/SitesBuilder.php' to __DIR__ . '/../includes/Sites/SitesBuilder.php'.

Also, the way the check is done doesn't entirely work as intended. Wikibase is never loaded at this point in the script, so we always require these files, even if a wiki has Wikibase. (Probably no harm done.)

This code was a hack to allow running the script on wikis that don't have Wikibase installed, or in cases where we are enabling Wikibase on new wikis (e.g. we prefer to populate the sites first, before enabling).

I suppose I could still fix the script, but I am getting tired of adding yet another "@todo we should really move this script to core" (and I'm not convinced core is the right place).

aude renamed this task from "Move populateSitesTable script into core." to "Move populateSitesTable script out of Wikibase." (Apr 28 2016, 9:39 PM)
aude added a project: Wikidata.

It seems that if we say populateSitesTable is Wikimedia-specific, we are also saying that Wikibase is Wikimedia-specific. AFAIK there is no way to set up a Wikibase installation with sitelinks without somehow specifying the information that would be in the sites table.

How does the information get into the SiteMatrix? The list of wikis comes from dblists in mediawiki-config; which lists are used is a configuration of SiteMatrix. Where do their language and group come from? Perhaps the db-specific configuration in mediawiki-config? How does the extension access that part of the configuration? Isn't it only accessible in the scope of one wiki, and then only for that wiki? Does SiteMatrix depend on how certain details in mediawiki-config and the MediaWiki multiversion setup work?

Maybe moving populateSitesTable.php into an extension named WikimediaSites will not improve the state of things. Core might be the right location for a PHP API that can give out that information, with an index built at some point before the deployment is finished. Where should the code to build that index live? I don't know all the involved parts well enough, nor do I have a useful overview. Perhaps a full list of the sources of information and where they are needed would help.

@JanZerebecki populateSitesTable.php only depends on having a source api that has sitematrix. Though there is some special handling code that assumes if a site id (dbname) has a 'wiki' suffix then it is in the "wikipedia" site group.
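To make the special handling concrete, here is a minimal Python sketch of the suffix rule as described in this comment. It is not the script's actual PHP code: the function name guess_site_group is hypothetical, and falling back to the site's own code for everything else is an assumption made only for the example.

```python
def guess_site_group(dbname, site_code):
    """Guess a site group from a site id (dbname).

    Rule from the comment above: a dbname with a 'wiki' suffix is put into
    the "wikipedia" group. The fallback to site_code is an assumption.
    """
    if dbname.endswith("wiki"):
        return "wikipedia"
    return site_code


examples = {
    "enwiki": guess_site_group("enwiki", "wiki"),
    "enwiktionary": guess_site_group("enwiktionary", "wiktionary"),
    "commonswiki": guess_site_group("commonswiki", "commons"),
}
```

Under this rule as written, a dbname such as commonswiki also lands in "wikipedia", which shows how much the heuristic leans on Wikimedia naming conventions.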

Theoretically, SiteMatrix can exist outside of Wikimedia (though it has settings like $wgSiteMatrixFishbowlSites), and the script can use any API that has sitematrix as a source.

Alternative ways to populate sites include the importSites.php script in core, which takes an XML file. The XML file has to be crafted by hand. Also, I think this importer is missing support for importing some aspects of the sites. In the future, though, maybe we will want to handle sites with one or more configuration files like this.

Or, I think, some people just insert entries directly into the sites table on their wiki.

None of these solutions is very nice, and I think @daniel has an RFC (T113034) about improving the sites system more broadly.

For now, having populateSitesTable out of Wikibase is an improvement, and it's at least my opinion that core is not a good place for it (open to other suggestions). I would rather not keep blocking the move of the script on some grand refactoring and a nice new system.