Illogical XML in sitematrix API output
Closed, Resolved (Public)

Description

Currently, it gives numerous entries like

<language code="aa" name="Afar">
  <site>
    <site url="http://aa.wikipedia.org" code="wiki" />
    <site url="http://aa.wiktionary.org" code="wiktionary" />
    <site url="http://aa.wikibooks.org" code="wikibooks" />
  </site>
</language>

It would be much more logical to rename the upper-level <site> element to <sites>, in order to:

  1. Make it more consistent with other parts of the API.
  2. Simplify parsing: functions that search for elements by name would no longer return the wrapper <site> element, which contains no useful information.
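To illustrate point 2, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The sample document is just the snippet quoted above, and the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# Sample taken from the current sitematrix output quoted in this report.
CURRENT_XML = """
<language code="aa" name="Afar">
  <site>
    <site url="http://aa.wikipedia.org" code="wiki" />
    <site url="http://aa.wiktionary.org" code="wiktionary" />
    <site url="http://aa.wikibooks.org" code="wikibooks" />
  </site>
</language>
"""

language = ET.fromstring(CURRENT_XML)

# A by-name search returns the wrapper element as well as the real entries.
all_sites = list(language.iter("site"))

# The wrapper carries no attributes, so it has to be filtered out by hand.
real_sites = [s for s in all_sites if s.get("url") is not None]

print(len(all_sites), len(real_sites))  # prints "4 3"
```

With a <sites> wrapper, the by-name search for "site" would return only the three real entries and the extra filter would not be needed.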

Yes, that would be a breaking change, so plenty of discussion and caution are needed.


Version: unspecified
Severity: enhancement
URL: http://en.wikipedia.org/w/api.php?action=sitematrix

Details

Reference
bz14955

Event Timeline

bzimport raised the priority of this task to Low. (Nov 21 2014, 10:10 PM)
bzimport added a project: SiteMatrix.
bzimport set Reference to bz14955.
bzimport added a subscriber: Unknown Object (MLST).

Bryan.TongMinh wrote:

You're right about this, but I don't think breaking compatibility for aesthetic reasons is a good idea. Suggest WONTFIX.

(In reply to comment #1)
> You're right about this, but I don't think breaking compatibility for aesthetic
> reasons is a good idea. Suggest WONTFIX.

First, see 2) in my rationale: I had some trouble parsing the current schema. Second, which tools actually use this action? AWB does, for example, but in a way that would not be affected by the proposed change.

Bryan.TongMinh wrote:

(In reply to comment #2)
> (In reply to comment #1)
> > You're right about this, but I don't think breaking compatibility for aesthetic
> > reasons is a good idea. Suggest WONTFIX.
>
> First, see 2) in my rationale: I had some trouble parsing the
> current schema.

If you are using XPath, it would be something like site/site.
If you are using the DOM, it would be something like document.getElementsByTagName('site'), then loop over the results and check getAttribute('url').
If you are using SAX, it is a little more complicated, but still doable if you are using OOP.
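The three approaches can be sketched in Python with the standard library (xml.etree.ElementTree for the XPath-style query, xml.dom.minidom for the DOM, xml.sax for SAX). This is only an illustration against a shortened copy of the sample from the report; the SiteHandler class and the variable names are made up for the example.

```python
import xml.dom.minidom
import xml.sax
import xml.etree.ElementTree as ET

SAMPLE = (
    '<language code="aa" name="Afar">'
    '<site>'
    '<site url="http://aa.wikipedia.org" code="wiki" />'
    '<site url="http://aa.wiktionary.org" code="wiktionary" />'
    '</site>'
    '</language>'
)

# XPath-style: select only <site> elements nested directly inside a <site>.
root = ET.fromstring(SAMPLE)
xpath_urls = [s.get("url") for s in root.findall("site/site")]

# DOM: fetch every <site>, then keep the ones that carry a url attribute.
dom = xml.dom.minidom.parseString(SAMPLE)
dom_urls = [
    node.getAttribute("url")
    for node in dom.getElementsByTagName("site")
    if node.hasAttribute("url")
]


# SAX: a handler object collects url attributes as each element starts.
class SiteHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.urls = []

    def startElement(self, name, attrs):
        if name == "site" and "url" in attrs:
            self.urls.append(attrs["url"])


handler = SiteHandler()
xml.sax.parseString(SAMPLE.encode("utf-8"), handler)
sax_urls = handler.urls

print(xpath_urls == dom_urls == sax_urls)
```

All three end up skipping the attribute-less wrapper, either through the path (site/site) or through the check on the url attribute.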

> Second, which tools actually use this action? AWB does, for example,
> but in a way that would not be affected by the proposed change.

We don't know which tools will break and that is the whole problem.

Assigned to API lead developer.

(In reply to comment #4)
> Assigned to API lead developer.

Unassigning from me. This is an extension, not core, so it's up to the extension authors to fix the issue.

*** Bug 16003 has been marked as a duplicate of this bug. ***

Marking as WONTFIX per comments #1 and #3.

Reopening. Many things have been changed before, and tools had to accommodate.

It is the tools that have to accommodate changes in the software; the software should not accommodate unknown tools. In that manner MediaWiki would never fix any issue.

The semantics of the <sites> tag are indisputable.

So are the benefits, such as easier traversal.

Besides, the tag itself could be removed, since it is in fact just double-bracketing.

<language code="aa" name="Afar">
  <site url="http://aa.wikipedia.org" code="wiki" />
  <site url="http://aa.wiktionary.org" code="wiktionary" />
  <site url="http://aa.wikibooks.org" code="wikibooks" />
</language>

would make sense as well: what else can be in <language> other than sites? <specials> also has only one level of nesting:

<specials>
  ...
  <special url="http://beta.wikiversity.org" code="betawikiversity" />
  ...
</specials>
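For tool authors worried about either layout, a parser can be written so that it does not care whether the wrapper level exists. The following Python sketch assumes only the element and attribute names shown in this report; the site_urls helper is hypothetical.

```python
import xml.etree.ElementTree as ET


def site_urls(language):
    """Return the site URLs of a <language> element in either layout."""
    # iter() walks all descendants, so an extra wrapper level does not
    # matter; the url check skips the attribute-less wrapper if present.
    return [s.get("url") for s in language.iter("site") if s.get("url")]


# Current layout: <site> entries wrapped in another <site>.
NESTED = ET.fromstring(
    '<language code="aa" name="Afar"><site>'
    '<site url="http://aa.wikipedia.org" code="wiki" />'
    '</site></language>'
)

# Flattened layout suggested above: entries directly under <language>.
FLAT = ET.fromstring(
    '<language code="aa" name="Afar">'
    '<site url="http://aa.wikipedia.org" code="wiki" />'
    '</language>'
)

print(site_urls(NESTED) == site_urls(FLAT))
```

The same helper would also cope with a renamed <sites> wrapper, since it never looks at the wrapper's name.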

The point is that it's a cosmetic change, no matter how you put it. I've made a point of not making breaking changes for merely cosmetic reasons in the past, and I won't do it now either. If the author of SiteMatrix does want to, that's his call.

Then let the SiteMatrix author decide whether to close this bug or not. And it is not merely cosmetic; the benefits are indisputable. Many have been named already, and I'll add one more: smaller file size.

(In reply to comment #10)
> Then let the SiteMatrix author decide whether to close this bug or not.

Fair enough. I stand by my WONTFIX recommendation, though. Who is the SiteMatrix author, anyway?

> And it is not merely cosmetic; the benefits are indisputable. Many have been
> named already, and I'll add one more: smaller file size.

There are some benefits, yes, but:

  • it's not fixing anything that's really broken (the current output format isn't ideal, but it works)
  • it's not adding any features big enough to warrant a breaking change

These are basically the criteria I judge breaking changes by when considering them (plus, of course, the criterion that the change should be backwards compatible wherever possible).

Another issue, probably the most important one:

Compare

http://www.mediawiki.org/w/api.php?action=sitematrix&format=xml
and
http://www.mediawiki.org/wiki/Special:SiteMatrix?action=raw

Both should give the same output (except for the <api> wrapper).

Bryan.TongMinh wrote:

Suggest WONTFIX. Breaking backwards compatibility is bad.

Reopening because of comment #12. This was discussed on #wikimedia-tech at the time, and there was a consensus that it should be fixed.

I've fixed the RAW output in r87200.

The only differences now are the outer <api></api> wrapper and the ordering: in the API output the specials come first and the languages second, while in RAW it is the other way round. I don't see this as an issue, since with a proper XML parser it should work either way.
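As a rough check of that claim, the Python sketch below parses two hand-written documents that differ only in the <api> wrapper and in the order of specials and languages. The <sitematrix> root element and the summarize helper are assumptions made for the example, not something confirmed in this thread.

```python
import xml.etree.ElementTree as ET

LANGUAGE = (
    '<language code="aa" name="Afar"><site>'
    '<site url="http://aa.wikipedia.org" code="wiki" />'
    '</site></language>'
)
SPECIALS = (
    '<specials>'
    '<special url="http://beta.wikiversity.org" code="betawikiversity" />'
    '</specials>'
)

# API-style: wrapped in <api>, specials first (root name is an assumption).
API_STYLE = "<api><sitematrix>" + SPECIALS + LANGUAGE + "</sitematrix></api>"

# RAW-style: no wrapper, languages first.
RAW_STYLE = "<sitematrix>" + LANGUAGE + SPECIALS + "</sitematrix>"


def summarize(xml_text):
    """Collect language and special codes, ignoring wrapper and order."""
    root = ET.fromstring(xml_text)
    # iter() searches the whole tree by tag name, so neither the nesting
    # depth nor the relative order of the two sections matters.
    languages = sorted(lang.get("code") for lang in root.iter("language"))
    specials = sorted(sp.get("code") for sp in root.iter("special"))
    return languages, specials


print(summarize(API_STYLE) == summarize(RAW_STYLE))
```

A tool that instead reads children by position (first child, second child) would see a difference between the two, which is the case the comment above sets aside.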