API query siteinfo namespaces collection should be an array in JSON v2
Closed, Declined · Public

Description

In siteinfo.namespaces, the result is a JSON object with redundant keys. While there are backwards-compatibility issues to consider, I would suggest that this should be a simple array, instead.

RobinHood70 updated the task description.
RobinHood70 raised the priority of this task from to Lowest.
RobinHood70 added a project: MediaWiki-API.
RobinHood70 added a subscriber: RobinHood70.
Restricted Application added a subscriber: Aklapper. · Jun 16 2015, 4:32 PM

Change 218658 had a related patch set uploaded (by Anomie):
API: Add some BCarray into ApiQuerySiteinfo

https://gerrit.wikimedia.org/r/218658

Anomie moved this task from Unsorted to Needs Review on the MediaWiki-API board. · Jun 16 2015, 5:02 PM

Change 218658 merged by jenkins-bot:
API: Add some BCarray into ApiQuerySiteinfo

https://gerrit.wikimedia.org/r/218658

Anomie added a subscriber: Anomie. · Jun 16 2015, 8:25 PM

Should be deployed to WMF wikis with 1.26wmf11, see https://www.mediawiki.org/wiki/MediaWiki_1.26/Roadmap for the schedule.

Anomie closed this task as Resolved. · Jun 16 2015, 8:26 PM
Anomie claimed this task.

Hm.. I'm not sure I understand how this is an improvement?

Before:

"namespaces": {
    "-1": {
        "id": -1,
        "case": "first-letter",
        "name": "Special",
        ..
    },
    "6": {
        "id": 6,
        "case": "first-letter",
        "name": "File",
        ..
    },

After:

"namespaces": [
    {
        "id": -1,
        "case": "first-letter",
        "name": "Special",
        ..
    },
    {
        "id": 6,
        "case": "first-letter",
        "name": "File",
        ..
    },

This was one of the few areas where the API used to output a sensible structure that customers can access and work with as-is.

With it being a numerical array instead of an object in JSON, it makes the data inaccessible to customers. I imagine every single customer of this API will now either iterate over each value every time, or (more likely) they will have to map this straight back to an object or hash table to access values by namespace id.

Also, because Array is a subclass of Object in JavaScript and array keys are cast to strings, this will break in very subtle ways. Key lookups (except for negative ones) will appear to succeed but return the wrong value. E.g. namespaces[6] will return "User talk" instead of "File".
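The positional-index concern is not specific to JavaScript. A minimal Python sketch, assuming a hypothetical, trimmed sample in the array layout (real responses carry more fields, and which entry sits at a given position depends on the wiki's namespace list):

```python
# Hypothetical sample in the array layout; not copied from a real response.
ns_list = [
    {"id": -2, "name": "Media"},
    {"id": -1, "name": "Special"},
    {"id": 0, "name": ""},
    {"id": 1, "name": "Talk"},
    {"id": 2, "name": "User"},
    {"id": 3, "name": "User talk"},
    {"id": 4, "name": "Project"},
    {"id": 5, "name": "Project talk"},
    {"id": 6, "name": "File"},
]

# Positional indexing succeeds, but position 6 is just the seventh entry,
# not the namespace whose id is 6.
seventh = ns_list[6]

# Re-keying by namespace id restores direct lookups, including negative ids.
by_id = {ns["id"]: ns for ns in ns_list}
file_ns = by_id[6]
special_ns = by_id[-1]
```

With the array layout, a client that wants lookups by namespace id needs a remapping step like `by_id` before it can use the data.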

TTO added a subscriber: TTO. · Jun 17 2015, 2:02 AM

Per Krinkle, I think the old style was much more sane. The ID should have been removed from the object itself (i.e. "id": 6, should have been removed). Perhaps the reporter thought the keys were simply indices of a simple zero-indexed array (which they are not, of course - they're namespace numbers).

@TTO: Not at all. I was assuming that people would map the JSON to a more useful collection. It never occurred to me that people would actually work with the JSON directly without parsing it. In the context of parsing it, in most languages, parsing a key-value pair is more work (albeit only slightly) than having all the data in the same place (i.e., one element of the array), so it seemed to me that the logical way to go was to convert it to an array.

TTO added a comment. · Jun 17 2015, 9:51 AM

The counter-argument to that is that many people access JSON from browser JavaScript, in which case the current (new) setup is just plain annoying...

"More sane" depends on the usage. If you're always using the namespace number as the key, the old object layout works well. If you're wanting to use the namespace name as the key, you're going to need to be iterating anyway to build your own hashes and array iteration is usually more convenient than object iteration.

So the question is whether the first case outweighs the second.
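To make the two cases concrete, here is a small Python sketch of both layouts. The JSON strings are hypothetical, trimmed samples modelled on the excerpts earlier in this task, not real API output:

```python
import json

# Hypothetical, trimmed samples of the two layouts discussed in this task.
object_layout = json.loads(
    '{"-1": {"id": -1, "name": "Special"}, "6": {"id": 6, "name": "File"}}'
)
array_layout = json.loads(
    '[{"id": -1, "name": "Special"}, {"id": 6, "name": "File"}]'
)

# Case 1: lookup by namespace number. Only the object layout supports this
# directly. JSON object keys are always strings, so the key is "6", not 6.
file_name = object_layout["6"]["name"]

# Case 2: lookup by namespace name. Either layout has to be iterated once
# to build the mapping.
name_to_id_from_object = {ns["name"]: ns["id"] for ns in object_layout.values()}
name_to_id_from_array = {ns["name"]: ns["id"] for ns in array_layout}
```

In this sketch the name-keyed mapping costs one comprehension in either layout; the layouts differ mainly in whether the by-number lookup is available without a remapping step.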

@TTO: Yes, I can see where you're coming from. One of the pitfalls of working in a different language is that I see the JSON as data to be parsed, not a fully realized data object, as it obviously would be when working in JavaScript. As Anomie says, though, even in JS, it becomes usage-dependent as to which way is easiest to deal with. In the end, at least for me, the difference is minor, so if the decision is to go back to an object, and possibly remove the id field as redundant, I can deal with that.

"More sane" depends on the usage. If you're always using the namespace number as the key, the old object layout works well. If you're wanting to use the namespace name as the key, you're going to need to be iterating anyway to build your own hashes and array iteration is usually more convenient than object iteration.

I don't buy it that array iteration is easier than hash iteration. That may be marginally so, but is that really an argument to justify changing an API? How many other objects do we plan on changing? How does this compare to page queries for example? Those and many others have indexed IDs there as well. We should be consistent. Perhaps rip them all out at once. Or revert this change.

I don't think we should support multiple relationships in the output. Remapping is easy. That's a normal part of consuming any API. But at least there is some structure to begin with. (Which also serves as convenient guarantee that IDs are unique.)

Namespace IDs are common and there's a fixed set of them configured. Treating these like indexless data rows (which could potentially be batched/continued) instead of a finite key/value pair seems odd. For comparison, the plethora of namespace related methods in MediaWiki PHP all return associative arrays. They always have.

(Also beware that when keying by namespace name, unless exclusively working with canonical values from other API responses, one must include localised names, canonical names, and namespacealiases in those keys.)
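A rough Python sketch of such a name-keyed index follows. The sample data is hypothetical, and the field names `canonical` and `alias` are assumptions about the response shape; `build_name_index` is an illustrative helper, not part of any library:

```python
# Hypothetical sample for a non-English wiki; field names are assumptions.
namespaces = {
    "0": {"id": 0, "name": ""},
    "6": {"id": 6, "name": "Datei", "canonical": "File"},
    "7": {"id": 7, "name": "Datei Diskussion", "canonical": "File talk"},
}
aliases = [
    {"id": 6, "alias": "Bild"},
    {"id": 6, "alias": "Image"},
]


def build_name_index(namespaces, aliases):
    """Map every known spelling of a namespace name to its id."""
    index = {}
    for ns in namespaces.values():
        # Localised name as reported for this wiki.
        index[ns["name"]] = ns["id"]
        # Canonical name, absent for entries such as the main namespace.
        canonical = ns.get("canonical")
        if canonical is not None:
            index[canonical] = ns["id"]
    for entry in aliases:
        index[entry["alias"]] = entry["id"]
    return index


name_index = build_name_index(namespaces, aliases)
```

Here `name_index` resolves "Datei", "File", "Bild" and "Image" to the same id. The sketch does not normalise case or underscores, which a real client would probably also need to handle.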

so if the decision is to go back to an object, and possibly remove the id field as redundant, I can deal with that.

I don't have any strong opinion on object vs array in formatversion=2 here, but the id field will stay either way.

How does this compare to page queries for example? Those [...] have indexed IDs there as well.

Not in formatversion=2 they don't, that was changed in rMWbeab6b009ef4: Change API result data structure to be cleaner in new formats.

I don't have any strong opinion on object vs array in formatversion=2 here, but the id field will stay either way.

Yeah, I thought it might. Thanks for confirming.

There's no harm in having keys IMHO.
They can be used to get a namespace's name quickly, whereas proper API clients will iterate over the whole set anyway.

Ricordisamoa removed a project: Patch-For-Review.
Ricordisamoa set Security to None.
Ricordisamoa removed a subscriber: gerritbot.

Change 219514 had a related patch set uploaded (by TTO):
Restore namespace-number keys in API prop=siteinfo&siprop=namespaces

https://gerrit.wikimedia.org/r/219514

Change 219514 merged by jenkins-bot:
Restore namespace-number keys in APIQuerySiteinfo siprop=namespaces

https://gerrit.wikimedia.org/r/219514