
Register namespaces in the database
Open, Needs Triage, Public

Description

Problem
In T200938 it was discovered that there was a need to determine what the canonical namespace is for a given namespace index from a foreign wiki or tool (something that has access to the database, but not the config). However, there is no way to do that.

Likewise, there is no way to ensure that a namespace id doesn't represent two different namespaces on different wikis.

It is impossible to get the full name (and URL?) of a page without using the API.

Solution
Register namespaces in the database with a namespace table. This table would hold an ID and a canonical name for each namespace. This would give page.page_namespace something to relate to in queries.
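As a rough illustration of the proposal, the sketch below uses Python's sqlite3 module with an in-memory database. The `namespace` table and its `ns_id` and `ns_name` columns are hypothetical names chosen for this example, and the rows are made-up sample data; only `page_namespace` and `page_title` correspond to columns of the existing page table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE namespace (            -- hypothetical table, not an existing schema
    ns_id   INTEGER PRIMARY KEY,    -- namespace index
    ns_name TEXT NOT NULL           -- canonical name ('' for the main namespace)
);
CREATE TABLE page (
    page_id        INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL REFERENCES namespace (ns_id),
    page_title     TEXT NOT NULL
);
""")

# Made-up sample rows for illustration only.
conn.executemany(
    "INSERT INTO namespace (ns_id, ns_name) VALUES (?, ?)",
    [(0, ""), (10, "Template"), (100, "Portal")],
)
conn.executemany(
    "INSERT INTO page (page_id, page_namespace, page_title) VALUES (?, ?, ?)",
    [(1, 0, "Main_Page"), (2, 10, "Infobox"), (3, 100, "Science")],
)


def full_titles(db):
    """Build prefixed titles using only the database, with no wiki config."""
    rows = db.execute("""
        SELECT ns_name, page_title
        FROM page
        JOIN namespace ON ns_id = page_namespace
        ORDER BY page_id
    """)
    return [f"{ns}:{title}" if ns else title for ns, title in rows]
```

With a table like this, a tool that has database access but no config could call `full_titles(conn)` and get prefixed titles from a single join.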

Event Timeline

there was a need to determine what the canonical namespace is for a given namespace index. However, there is no way to do that.

Is this missing "on a foreign wiki" in the problem description somewhere? There obviously is a way to do this for the wiki that you're currently on; it seems that the problems arise when you're looking at another wiki's DB from cross-wiki extensions like CentralAuth.

Also, it seems to me there's a difference between core namespaces, namespaces registered by extensions, and namespaces set in config using $wgExtraNamespaces (see also the table at T226657#5288181).

The English canonical name always works for core namespaces (-2 through 15), so you can get a full name and URL that work for accessing the page based on just the namespace ID and the title (although that URL might be a redirect to the canonical URL if the wiki's content language is not English), and you can get the canonical name+URL if you also know the target wiki's content language (except for NS_PROJECT and NS_PROJECT_TALK, which are special). For extension namespaces it pretty much works the same way, except that not all extensions are installed on all wikis, but if you see a page in an extension's namespace in a foreign wiki's database, it's probably installed there.

The big problem is extra namespaces. There are some conventions around this, and the "Portal" namespace has the same ID on most of the wikis where it's present, but we definitely have the same namespace ID pointing to different namespaces on different wikis, and no good infrastructure for resolving a title from another wiki.
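To make the core-namespace case concrete, here is a minimal sketch in Python (not MediaWiki code) that builds a prefixed title from just a namespace ID and a title, using the English canonical names for IDs -2 through 15. The function name `prefixed_title` is made up for this example, and anything outside the core range is deliberately rejected, since those IDs cannot be resolved without per-wiki knowledge.

```python
# English canonical names of the core namespaces (-2 through 15).
CORE_NAMESPACES = {
    -2: "Media",
    -1: "Special",
    0: "",  # main namespace has no prefix
    1: "Talk",
    2: "User",
    3: "User_talk",
    4: "Project",
    5: "Project_talk",
    6: "File",
    7: "File_talk",
    8: "MediaWiki",
    9: "MediaWiki_talk",
    10: "Template",
    11: "Template_talk",
    12: "Help",
    13: "Help_talk",
    14: "Category",
    15: "Category_talk",
}


def prefixed_title(ns_id, title):
    """Return a title that should work on any wiki, for core namespaces only."""
    if ns_id not in CORE_NAMESPACES:
        # Extension and extra namespaces differ per wiki; refuse to guess.
        raise ValueError(f"namespace {ns_id} is not a core namespace")
    name = CORE_NAMESPACES[ns_id]
    return f"{name}:{title}" if name else title
```

On a non-English wiki the result may only be an alias that redirects to the localized name, as noted above.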

However, I don't think a database table is necessary as a solution to this problem. It may be a good idea for other reasons, but to address the particular problem of not knowing how to format titles on another wiki, we could add functionality to SiteConfiguration, WikiMap and/or NamespaceInfo that lets you grab the config from another wiki and resolve namespace names using that config. Maybe this could look like instantiating a separate NamespaceInfo instance and MediaWikiTitleCodec instance seeded with the config from the foreign wiki.
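The idea of a resolver seeded with another wiki's config could look roughly like the following Python sketch. It is only an analogy for the suggestion above, not how NamespaceInfo or MediaWikiTitleCodec work: the class `ForeignNamespaceResolver`, the helper `resolver_for`, the wiki IDs, and the extra-namespace values are all hypothetical, with the dictionary standing in for each wiki's $wgExtraNamespaces setting.

```python
# A few core names, abbreviated for the example.
CORE = {0: "", 1: "Talk", 2: "User", 10: "Template"}

# Hypothetical per-wiki config; the values are made up for illustration.
EXTRA_NAMESPACES = {
    "wiki_a": {100: "Portal", 102: "WikiProject"},
    "wiki_b": {100: "Portal", 102: "Author"},
}


class ForeignNamespaceResolver:
    """Resolves namespace IDs using one specific wiki's configuration."""

    def __init__(self, wiki_id, extra_namespaces):
        self.wiki_id = wiki_id
        # Wiki-specific names are layered on top of the core names.
        self._names = {**CORE, **extra_namespaces}

    def prefixed_title(self, ns_id, title):
        name = self._names.get(ns_id)
        if name is None:
            raise ValueError(f"namespace {ns_id} is unknown on {self.wiki_id}")
        return f"{name}:{title}" if name else title


def resolver_for(wiki_id):
    """Build a resolver seeded with the named wiki's config."""
    return ForeignNamespaceResolver(wiki_id, EXTRA_NAMESPACES[wiki_id])
```

In this sketch `resolver_for("wiki_a").prefixed_title(102, "Physics")` and the same call for `"wiki_b"` give different prefixes, which is the case a single shared mapping cannot handle.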

Is this missing "on a foreign wiki" in the problem description somewhere? There obviously is a way to do this for the wiki that you're currently on; it seems that the problems arise when you're looking at another wiki's DB from cross-wiki extensions like CentralAuth.

Yes. I updated the description. Basically anything that has access to the database, but not the config or extensions (or the API).

However, I don't think a database table is necessary as a solution to this problem. It may be a good idea for other reasons, but to address the particular problem of not knowing how to format titles on another wiki, we could add functionality to SiteConfiguration, WikiMap and/or NamespaceInfo that lets you grab the config from another wiki and resolve namespace names using that config. Maybe this could look like instantiating a separate NamespaceInfo instance and MediaWikiTitleCodec instance seeded with the config from the foreign wiki.

I was thinking that could technically work, and we have that with $wgConf, but that won't work for extensions because you can't run (or even inspect) the extension from a foreign wiki.

That's true, extension-registered namespaces wouldn't show up this way.

Previous tasks related to this subject:

Obviously we could get a long way into the weeds on this, but Roan is correct that it's not strictly necessary to do that to solve the problem. WikiMap can't query the configuration of uninstalled extensions, but Site could do it. Site represents the sites table, which is built by a maintenance script, apparently Wikibase/lib/maintenance/populateSitesTable.php, which is run in turn by WikimediaMaintenance/addWiki.php. It could run cross-wiki API queries or maintenance scripts to get data about installed extensions.

Site already has normalizePageName(), which has functionality very close to what David needs already. It works by doing a cross-wiki API query for every title to be normalized. Hopefully we could do better than that performance-wise. The point is that we already have this interface in Site which gets access to foreign namespace names, so that would be an elegant place to put it.
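One way to avoid a cross-wiki request per title would be to fetch each wiki's namespace names once and reuse them. The Python sketch below illustrates only that caching idea and is not the Site interface: `NamespaceCache` and the injected `fetch` callable are hypothetical. A real fetcher might read the names from the target wiki's API (for example `action=query&meta=siteinfo&siprop=namespaces`), but that part is assumed and left out here.

```python
class NamespaceCache:
    """Caches namespace names per wiki so titles resolve without new requests."""

    def __init__(self, fetch):
        # fetch(wiki_id) must return a dict of {namespace_id: name};
        # how it gets the data (API call, maintenance script) is up to the caller.
        self._fetch = fetch
        self._by_wiki = {}

    def names(self, wiki_id):
        # Only the first lookup for a wiki triggers a fetch.
        if wiki_id not in self._by_wiki:
            self._by_wiki[wiki_id] = dict(self._fetch(wiki_id))
        return self._by_wiki[wiki_id]

    def prefixed_title(self, wiki_id, ns_id, title):
        names = self.names(wiki_id)
        if ns_id not in names:
            raise ValueError(f"namespace {ns_id} is unknown on {wiki_id}")
        name = names[ns_id]
        return f"{name}:{title}" if name else title
```

This sketch ignores invalidation; a long-lived cache would need some way to notice when a wiki adds or renames namespaces.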

if you see a page in an extension's namespace in a foreign wiki's database, it's probably installed there

This is not remotely dependable for namespaces with constants between 100 and 200 on old wikis; this range historically saw very heavy use by both extensions and sites. In addition, some extensions allow sites to configure the constants used for their namespaces, so any given site using the extension might have its namespaces on basically any constant. (And this doesn't touch on cases like Wikia, where extensions have regularly been modified for various purposes, including to shift assigned namespace constants around.)