
Action API siteinfo should return canonical name for namespace 0
Closed, Resolved | Public

Description

Request Status: New Request
Request Type: break-fix
Related OKRs:

Request Title: Action API siteinfo should return canonical name for namespace 0

  • Request Description: Wikimedia Enterprise uses this Action API call to populate our namespace.name field as part of our APIs. This endpoint doesn't return a name for namespace 0 ("Main") across language projects. It appears to work as expected for other namespaces.
  • Indicate Priority Level: Medium
  • Main Requestors: @RBrounley_WMF
  • Ideal Delivery Date: As soon as possible
  • Stakeholders: @HShaikh

Request Documentation

Document Type | Required? | Document/Link
Related PHAB Tickets | Yes | https://phabricator.wikimedia.org/T304123
Product One Pager | Yes | <add link here>
Product Requirements Document (PRD) | Yes | <add link here>
Product Roadmap | No | <add link here>
Product Planning/Business Case | No | <add link here>
Product Brief | No | <add link here>
Other Links | No | <add links here>

Event Timeline

DAbad added a project: API Platform.
DAbad moved this task from Backlog to Investigate on the Foundational Technology Requests board.

May 17, 2021 Update

  • Reviewing as part of upcoming sprint

We will take a look during this sprint.

tl;dr: I think that's a feature, not a bug.

Disclaimer: I'm going to confirm what I typed below. It is possible I'm misunderstanding something.

Full explanation:

Per the docs at Manual:Namespace:

Pages exist within a namespace, and this can be distinguished using the namespace prefix of a page, which forms part of the title of a page, separated with a colon

and

The "main namespace" does not have a prefix.

So for example, this url works:

https://en.wikipedia.org/wiki/Earth

But this one does not:

https://en.wikipedia.org/wiki/Main:Earth

Perhaps for this reason, it appears to be intentional in the code that namespace 0 uses an empty string as its canonical name:
https://gerrit.wikimedia.org/g/mediawiki/core/+/b085f24cc7257817f0b828add3f9d0c0878eb289/includes/title/NamespaceInfo.php#384

Other relevant changes:
https://gerrit.wikimedia.org/r/c/mediawiki/core/+/534477
https://gerrit.wikimedia.org/r/c/mediawiki/core/+/450433

The getCanonicalNamespaces() function was introduced in the file MWNamespace.php in commit b7824e03cf35591d2c934bd9c07a20baa8d2f5a8:

Trying to clean up the mess with $wgCanonicalNamespaceNames and $wgExtraNamespaces. Niklas Laxström 8/20/10, 5:25 AM

(The function was later moved to NamespaceInfo.php as part of a larger code cleanup effort.)

I couldn't find that change in gerrit (maybe it predates gerrit?) but the initial body of the function looked like this, and already includes an explicit empty string for NS_MAIN:

public static function getCanonicalNamespaces() {
	static $namespaces = null;
	if ( $namespaces === null ) {
		global $wgExtraNamespaces, $wgCanonicalNamespaceNames;
		if ( is_array( $wgExtraNamespaces ) ) {
			$namespaces = $wgCanonicalNamespaceNames + $wgExtraNamespaces;
		}
		$namespaces[NS_MAIN] = '';
		var_dump( $namespaces );
	}
	return $namespaces;
}

FWIW, Niklas very quickly submitted a followup change that removed the var_dump. Wish I could say I'd never made that mistake, but ... well ... :)
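
For readers less familiar with PHP: the `+` operator on arrays is a union in which the left operand wins on duplicate keys. A rough Python analogue of the merge in that function might look like the sketch below. The sample data is made up for illustration, and the sketch assumes both inputs are dicts, which the PHP code does not.

```python
NS_MAIN = 0


def get_canonical_namespaces(canonical_names, extra_namespaces):
    """Python analogue of the merge in the PHP snippet (minus the var_dump)."""
    # PHP's `$a + $b` keeps the left-hand value on duplicate keys, so the
    # canonical names are unpacked last and therefore take precedence.
    namespaces = {**extra_namespaces, **canonical_names}
    # The main namespace always gets the empty string as its name.
    namespaces[NS_MAIN] = ""
    return namespaces


# Illustrative sample data only, not the full MediaWiki defaults.
canonical = {1: "Talk", 2: "User"}
extra = {2: "Overridden", 100: "Portal"}
result = get_canonical_namespaces(canonical, extra)
```

With this sample data, namespace 2 keeps the canonical value, namespace 100 comes from the extras, and namespace 0 maps to the empty string.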

Regarding the specific requested change to this endpoint, I do not (yet) know what the implications would be of changing that (at any of the several points in the code where it could be changed). However, I am concerned that it could break things. For example, if something were using those values as part of url generation and changing the endpoint caused it to start prepending "Main:" to regular pages, those urls would not work.
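
To make that concern concrete, here is a minimal sketch of a client that builds page URLs from a namespace name. `build_page_url` and `BASE_URL` are hypothetical names for illustration, not MediaWiki code, and I'm assuming the client treats the name as a title prefix.

```python
from urllib.parse import quote

BASE_URL = "https://en.wikipedia.org/wiki/"


def build_page_url(namespace_prefix, page_name):
    """Build a page URL from a namespace prefix and a page name."""
    # An empty prefix means the main namespace: no prefix and no colon.
    if namespace_prefix:
        title = f"{namespace_prefix}:{page_name}"
    else:
        title = page_name
    return BASE_URL + quote(title.replace(" ", "_"), safe=":/")


working = build_page_url("", "Earth")     # current empty canonical name
broken = build_page_url("Main", "Earth")  # if the name became "Main"
talk = build_page_url("Talk", "Earth")
```

Here `working` is the URL that resolves today, while `broken` is the "Main:Earth" form that does not.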

I'll ask around to make sure I'm properly understanding all this.

I've confirmed that my comment above is accurate.

With that said, is there something else we could do to help? What larger problem are you trying to solve? Or, stated differently, what issue is the lack of a namespace name causing?

I think there are two things to note here:

  • The canonical name of the main namespace is "" (the empty string). This is not a bug.
  • The canonical name of the main namespace is missing from the API response. This is at least an inconsistency. It should probably be changed.

Excerpt from the API response:

"0": {
     "id": 0,
     "case": "first-letter",
     "content": "",
     "*": ""
 },
 "1": {
     "id": 1,
     "case": "first-letter",
     "subpages": "",
     "canonical": "Talk",
     "*": "Talk"
 },

I see no good reason for the "canonical" field to be missing from the entry for the main namespace ("0"). NamespaceInfo::getCanonicalName( NS_MAIN ) returns an empty string, not null. NamespaceInfo::getCanonicalNamespaces() returns an entry for NS_MAIN as well.
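
Until that changes, a client has to cope with the missing key. A minimal sketch of how a consumer might read the excerpt above, assuming (as this thread establishes) that only namespace 0 legitimately has the empty string as its name; `canonical_name` is a hypothetical helper:

```python
import json

# Trimmed copy of the excerpt above, wrapped so that it parses as JSON.
SAMPLE = """
{
  "0": {"id": 0, "case": "first-letter", "content": "", "*": ""},
  "1": {"id": 1, "case": "first-letter", "subpages": "",
        "canonical": "Talk", "*": "Talk"}
}
"""


def canonical_name(info):
    """Return the canonical name, or None when it is genuinely unknown."""
    if "canonical" in info:
        return info["canonical"]
    # Assumption: a missing key on namespace 0 means the empty string.
    return "" if info["id"] == 0 else None


namespaces = json.loads(SAMPLE)
names = {info["id"]: canonical_name(info) for info in namespaces.values()}
```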

The canonical field is explicitly excluded if it is falsey.

That was added a really long time ago (Dec. 16, 2008) in commit 5bf752ee6af92b316b08431bcb633e7a63316723. It seems to me like there is a very good chance that this is unintentional behavior that either didn't matter then, or developed over time (I was not bold enough to check out a copy of MW from '08 and confirm its behavior).

In the current code, getCanonicalName() is guaranteed to return either the name (in this case, an empty string) or the boolean value false. So we could change the endpoint to include the empty string by changing this:

    if ( $canonical ) {
        $data[$ns]['canonical'] = strtr( $canonical, '_', ' ' );
    }

to this:

    if ( $canonical !== false ) {
        $data[$ns]['canonical'] = strtr( $canonical, '_', ' ' );
    }

The relevant part of the API response then becomes:

    "0": {
        "id": 0,
        "case": "first-letter",
        "canonical": "",
        "content": "",
        "*": ""
    },

This seems like a reasonably safe change that is unlikely to break anything.
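
The difference between the two conditions is easy to miss, so here is a small Python analogue (Python also treats the empty string as falsy). This is only a sketch of the logic, not the actual API module; the function names are hypothetical.

```python
def add_canonical_truthy(entry, canonical):
    """Mirrors `if ( $canonical )`: skips both False and the empty string."""
    if canonical:
        entry["canonical"] = canonical.replace("_", " ")
    return entry


def add_canonical_strict(entry, canonical):
    """Mirrors `if ( $canonical !== false )`: skips only False."""
    if canonical is not False:
        entry["canonical"] = canonical.replace("_", " ")
    return entry


main_before = add_canonical_truthy({"id": 0}, "")
main_after = add_canonical_strict({"id": 0}, "")
project_talk = add_canonical_strict({"id": 5}, "Project_talk")
unknown = add_canonical_strict({"id": 9999}, False)
```

Only the main namespace entry differs between the two versions: named namespaces get the field either way, and a `False` result still leaves it out.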

@RBrounley_WMF, would this change meet your needs?

Hi - yes thanks for the clarification. On our side, we'll map canonical "" -> "Main". Do you know if it is called something other than "Main" across language projects (beyond the language translation) - is that queryable / accessible somewhere in documentation?

Do you know if it is called something other than "Main" across language projects (beyond the language translation) - is that queryable / accessible somewhere in documentation?

Hmmm, I'm not sure I understand what you need. Do you mean from a code/internal perspective, or from a human presentation perspective?

From a human perspective, at least on English Wikipedia it is called "(Article)" on Special:Search (expand "Search In" then click "Add namespaces..."). As for it being accessible somewhere in documentation, there's a table here, if that helps.

Maybe @daniel will have more insight, but from a code/technical perspective, I don't think it's possible for it to be called something else across projects, because I don't think the code really even calls it "Main" in the first place. Most places in the code use the symbol NS_MAIN, which is defined as the integer 0. In config, it is simply referred to by the value 0 and doesn't even have a name.

The closest thing I could find to it being called something in the code is in en.json, which includes this line:

"blanknamespace": "(Main)",

Does any of that help? I feel like I'm answering the wrong question, so feel free to clarify/restate and I'll try again.
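
If a localized label is what's needed, that message can be requested through the Action API's `meta=allmessages` module. A rough sketch follows; the helper names are hypothetical, and the sample response is hand-written in the shape I'd expect for `formatversion=2`, so please verify it against a live wiki before relying on it.

```python
import json
from urllib.parse import urlencode


def blanknamespace_url(api_base, lang):
    """Build an Action API URL asking for the `blanknamespace` message."""
    params = {
        "action": "query",
        "meta": "allmessages",
        "ammessages": "blanknamespace",
        "amlang": lang,
        "format": "json",
        "formatversion": "2",
    }
    return f"{api_base}?{urlencode(params)}"


def extract_label(response_text):
    """Pull the message text out of an allmessages response."""
    messages = json.loads(response_text)["query"]["allmessages"]
    return messages[0]["content"]


url = blanknamespace_url("https://en.wikipedia.org/w/api.php", "en")

# Hand-written sample; the exact response shape is an assumption.
SAMPLE_RESPONSE = (
    '{"query": {"allmessages": '
    '[{"name": "blanknamespace", "content": "(Main)"}]}}'
)
label = extract_label(SAMPLE_RESPONSE)
```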

Yes, sorry @BPirkle - I wrote this ticket much more prescriptively about the solution than I should have. The bug report on our side can be found here on the Wikimedia Enterprise phab board. We are hardcoding "Article" because the name is returned empty, which confused us at the time, when in fact it is "(Main)". That was an oversight on my part.

We're looking for a blend of code/internal (for consistency with the systems) and human perspective (for usability for users less familiar with MediaWiki who want to understand what they will receive from namespace 0). I think if we returned "" on our side for namespace 0 it would cause some confusion, although likely minor and something we could definitely clarify in our documentation. But looking at en.json and the other language configs, that might be the best solution I can see from here. Maybe we'll just port these configs as context on our side for questions like this; they're a really good resource. I'll run it through the team, and I appreciate you finding and linking that.

Also @daniel - thanks for the context and help as well.

I think the conceptual confusion here is between namespace prefixes and namespace "names". They are nearly always the same - except for the main namespace, where the prefix is the empty string, but we still need a name for use in the UI, which tends to be "(Main)".

I suppose what you are looking for is a way to say, in an API response, "this page comes from namespace X". In our API, we tend to expose numeric namespace IDs, but that's not great either. So what should X be for the main namespace? Not sure. Does it need to be machine-readable or human-readable? Should it be localized or always the same for all languages?
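
One way to sidestep the question is to keep the three notions separate in whatever record the consumer exposes. The sketch below is only an illustration of that idea; `NamespaceRef` and its field names are hypothetical and not part of any existing API.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class NamespaceRef:
    id: int      # numeric namespace ID, machine-readable and stable
    prefix: str  # title prefix; the empty string for the main namespace
    label: str   # human-readable name for display, possibly localized


def describe(page_name, ns):
    """Human-facing description that uses the label, never the prefix."""
    return f"{page_name} is in namespace {ns.id} {ns.label}"


main = NamespaceRef(id=0, prefix="", label="(Main)")
talk = NamespaceRef(id=1, prefix="Talk", label="Talk")

payload = asdict(main)
summary = describe("Earth", main)
```

Keeping `prefix` and `label` apart means the main namespace can have an empty prefix and still show a readable name to users.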

DAbad changed the task status from Open to In Progress. (Jun 8 2022, 3:11 PM)
DAbad moved this task from Investigate to Sign-off on the Foundational Technology Requests board.
DAbad moved this task from Sign-off to Done on the Foundational Technology Requests board.

Signed off by WME during the June 8, 2022 Tech Steering Committee. Closing.