
SiteMatrix API has two fields called "code", which is confusing
Open, Needs Triage, Public

Description

Here is a heavily abridged version of the English Wikipedia's SiteMatrix API result, for reference:

{
    "sitematrix": {
        "0": {
            "code": "de",
            "name": "Deutsch",
            "site": [
                {
                    "url": "http://de.wikipedia.org",
                    "dbname": "dewiki",
                    "code": "wiki",
                    "sitename": "Wikipedia"
                }
            ],
            "localname": "German"
        }
    }
}

The key "code" appears twice, in two different contexts here.
One is referring to the ISO 639 code of the project's language (de, for German), and the other is referring to the internal database name suffix for the project (wiki).
This is causing a bit of confusion (e.g. on translatewiki.net, where they are trying to decipher the help message).
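To make the ambiguity concrete, here is a minimal Python sketch that reads both fields from the abridged result above. The `result` dict is just the sample from this task retyped by hand, not a live API response, and the variable names are mine.

```python
# Abridged SiteMatrix result from the task description, retyped as a Python dict.
result = {
    "sitematrix": {
        "0": {
            "code": "de",
            "name": "Deutsch",
            "site": [
                {
                    "url": "http://de.wikipedia.org",
                    "dbname": "dewiki",
                    "code": "wiki",
                    "sitename": "Wikipedia",
                }
            ],
            "localname": "German",
        }
    }
}

group = result["sitematrix"]["0"]
language_code = group["code"]         # language-level "code"
site_code = group["site"][0]["code"]  # site-level "code", same key name

print(language_code, site_code)
```

Both lookups use the key "code", and only the nesting level tells a client which meaning applies.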

I'm unsure what our policy is for changing the API output, but it'd certainly be nice to clarify this by renaming the former to lang or the latter to project_code.

To reproduce, run a sitematrix query in an API sandbox.

Event Timeline

MC8 raised the priority of this task from to Needs Triage.
MC8 updated the task description.
MC8 subscribed.

I'm unsure what our policy is for changing the API output

Decide whether it's worth the breaking change, and then coordinate with someone like @Anomie or @Reedy to send out an email to mediawiki-api-announce about the change. They'll also be able to check how often this API is being used and who the major users are.

The "ISO 639 code of the project's language" isn't strictly an ISO 639 code, although most of the values are. Internally in MediaWiki it's called "language".

The "internal database name suffix" is more or less accurate. Internally in MediaWiki it's generally called "site".

"special" wikis still have "site" and "language" parts despite the "language" often not resembling a language at all. For these, the "code" field is the "language" part if the "site" is "wiki" or the concatenation of "language" and "site" otherwise.
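The rule for special wikis described above can be sketched in a few lines of Python. This only restates the comment; the function name `special_code` is hypothetical, the inputs in the usage lines are made-up placeholders, and the plain string concatenation is an assumption about what "concatenation" means here.

```python
def special_code(language: str, site: str) -> str:
    """Return the "code" value for a special wiki, per the rule described above."""
    # If the site part is "wiki", the code is just the language part;
    # otherwise it is the language part followed directly by the site part.
    return language if site == "wiki" else language + site


# Placeholder inputs, not real wiki names.
print(special_code("foo", "wiki"))
print(special_code("foo", "bar"))
```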

If we're doing breaking changes here, I'd personally like to see it turned into a query meta module, with the output format cleaned up a bit too. For example, all numeric keys and then "specials" is crappy; put the language array under "languages" instead, or make the whole thing an object keyed by language (and then make the 'site' array an object keyed by "site", too). And just return the "language" and "site" in the 'specials' objects instead of an inconsistent "code", unless that code has some actual meaning. And if we could somehow get the actual ISO 639 code to put in there, all the better.
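As a rough illustration of the "object keyed by language" idea, here is a client-side Python sketch that reshapes the current output. It is not a proposal for the actual module: the function name `reshape` and the output key "sites" are assumptions of mine, the sample data is the abridged result from the task description, and "specials" is left out because its target shape is still open.

```python
def reshape(sitematrix: dict) -> dict:
    """Key language groups by their language code and sites by their site code."""
    languages = {}
    for key, group in sitematrix.items():
        # Only the numeric keys hold language groups; skip e.g. "specials".
        if not key.isdigit():
            continue
        sites = {
            site["code"]: {k: v for k, v in site.items() if k != "code"}
            for site in group.get("site", [])
        }
        entry = {k: v for k, v in group.items() if k not in ("code", "site")}
        entry["sites"] = sites
        languages[group["code"]] = entry
    return {"languages": languages}


sample = {
    "0": {
        "code": "de",
        "name": "Deutsch",
        "site": [
            {
                "url": "http://de.wikipedia.org",
                "dbname": "dewiki",
                "code": "wiki",
                "sitename": "Wikipedia",
            }
        ],
        "localname": "German",
    },
    "specials": [],
}

reshaped = reshape(sample)
print(reshaped["languages"]["de"]["sites"]["wiki"]["dbname"])
```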

Why not increment the API version and make this change separately? You could rename the first "code" to "lang" and the second "code" to "site".
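For illustration, the suggested rename can be sketched as a client-side transform in Python. The function name `rename_codes` is hypothetical, the sample is the abridged result from the task description, and this says nothing about how the extension itself would implement or version the change.

```python
import copy


def rename_codes(sitematrix: dict) -> dict:
    """Return a copy with the outer "code" renamed to "lang" and the inner one to "site"."""
    out = copy.deepcopy(sitematrix)  # leave the caller's data untouched
    for key, group in out.items():
        if not key.isdigit():
            continue  # leave "specials" and other non-numeric keys alone
        group["lang"] = group.pop("code")
        for site in group.get("site", []):
            site["site"] = site.pop("code")
    return out


sample = {
    "0": {
        "code": "de",
        "name": "Deutsch",
        "site": [
            {
                "url": "http://de.wikipedia.org",
                "dbname": "dewiki",
                "code": "wiki",
                "sitename": "Wikipedia",
            }
        ],
        "localname": "German",
    }
}

renamed = rename_codes(sample)
print(renamed["0"]["lang"], renamed["0"]["site"][0]["site"])
```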