Page MenuHomePhabricator

API: action=query with empty title set should still respond with "query" element
Open, LowestPublic

Description

Author: a.d.bergi

Description:
When I try to get info for a set of pages specified by a generator and that generator returns nothing, I'd expect to get an empty set. But you just get no set!
Example:
http://de.wikipedia.org/w/api.php?action=query&list=allpages&apprefix=DoesNotExist

<api>
  <query>
    <allpages />
  </query>
</api>

http://de.wikipedia.org/w/api.php?action=query&prop=revisions&generator=allpages&gapprefix=DoesNotExist
Actual result:

<api />

But Expected:

<api>
  <query>
    <pages />
  </query>
</api>

Or is it a feature? I think its hard to detect when parsing the result, because you usually step through these elements when trying to get your pageset. Then you never know whether its an error in your request, an server error (api error) or its just empty results.

Details

Reference
bz31901

Event Timeline

bzimport raised the priority of this task from to Lowest.Nov 21 2014, 11:51 PM
bzimport set Reference to bz31901.
bzimport added a subscriber: Unknown Object (MLST).

(In reply to comment #0)

When I try to get info for a set of pages specified by a generator and that
generator returns nothing, I'd expect to get an empty set. But you just get no
set!

Actual result:
<api />
But Expected:
<api>

<query>
  <pages />
</query>

</api>

Looks like you just need to verify that the <query> element exists. <api/> isn't the same result you would get if there were an error, is it?

a.d.bergi wrote:

Yes, of course I can look up whether the <query> element exists. But why do I have to? From my point of view there is always a _query_ result, even if it just says "no pages available".
It is as like as retrieving a list (e.g. the allpages request from #0), I will get an empty element there instead of none.
My propose:

  1. When I have an "action=query" url, there should always be a <query> element.
  2. When I retrieve any property (prop=xyz), there should always be a <pages> element.

a.d.bergi wrote:

proposed patch

That was quite easy, I think. Or have I missed something? Should have only been that one if-condition.

Attached:

Roan, can you take a peek at this patch and confirm it would not cause compatibility problems?

a.d.bergi wrote:

(In reply to comment #5)

compatibility problems?

I think compatibility problems mostly are on client side. Implementation which distingish whether there are pages returned or not may get problems.
Until now the function to determine this looked up if there is the <query> element and if there is the <pages> element, else it returns false. But from now on this function will always return true, so it could happen that another function which expects a set of pages is called with a empty set of pages, instead of not to be called.

An example could be line 768 in /pywikipedia/trunk/pywikipedia/wikipedia.py:
if not 'pages' in data['query']:

raise RuntimeError("API query error, no pages found")

(but it didn't check for >'query' in data< before - anyway, I'm no pywikipedia expert)

(In reply to comment #5)

Roan, can you take a peek at this patch and confirm it would not cause
compatibility problems?

Compat problems are entirely on the client side, per comment #6, and it's far from crazy to suspect people wrote code that relies on this specific output format. For that reason, this would probably have to be announced as a breaking change, and I'm very wary of making breaking changes (or changes that could be breaking) for aesthetic reasons.

Bryan.TongMinh wrote:

I'd say no.

a.d.bergi wrote:

(In reply to comment #7)

I'm very wary of making breaking changes (or changes that could be
breaking) for aesthetic reasons.

I can understand that, but I think it's not only for aesthetic reasons. The convention to provide no <pages> element when there are none might be OK, but the missing <query> element only leads to workarounds - and so should be implemented cleaner.
When you query more than only properties (eg. prop and meta and list) there also always will be the <query> element, but maybe no <pages> element. So every implementation would have to check for !query || !pages, and I think it wouldn't cause any problems at least to provide the <query> element every time. What about that?

Looks like Roan and Bryan are saying WONTFIX even with a patch, right?
(I'll let them set it if they are.)

sumanah wrote:

(adding "patch" keyword for consistency)

sumanah wrote:

Bergi, if you believe this is still a problem, maybe you could use Developer access https://www.mediawiki.org/wiki/Developer_access to submit your patch directly into Git and Gerrit, so it'll get reviewed faster. Thanks.

I concur with Roan and Bryan, b/c sucks but breaking it sucks even more.

Part of APIv2. See
http://www.mediawiki.org/wiki/Requests_for_comment/API_Future#Cleanup_2 :

  • Move all items from 'query' to root in the result. The 'pages', 'normalized', 'continue', and all list/meta elements will be under the root element.

(In reply to Bergi from comment #4)

Created attachment 9336 [details]
proposed patch

Tried to put the patch attached here into Gerrit via http://tools.wmflabs.org/gerrit-patch-uploader/ but neither "git apply < patch" nor "patch -p0 < patch" nor "patch -p1 < patch" worked. Needs rework; setting "patch-reviewed" status.

Attached:

Change 122639 had a related patch set uploaded by Krinkle:
ApiQuery: Respond with "query/pages" even if there are 0 titles

https://gerrit.wikimedia.org/r/122639

(In reply to Roan Kattouw from comment #7)

(In reply to comment #5)

Roan, can you take a peek at this patch and confirm it would not cause
compatibility problems?

Compat problems are entirely on the client side, per comment #6, and it's
far from crazy to suspect people wrote code that relies on this specific
output format. For that reason, this would probably have to be announced as
a breaking change, and I'm very wary of making breaking changes (or changes
that could be breaking) for aesthetic reasons.

(In reply to Max Semenik from comment #13)

I concur with Roan and Bryan, b/c sucks but breaking it sucks even more.

How would ensuring there is always query/pages a breaking change?

I can imagine a client not explicitly checking the presence of every property (is an API after all, the point is to not have to double check every thing, the root structure should always be in place), in which case this patch fixes that.

But I can't imagine explicitly checking and relying on the the absence of the structure. Roan/Bryan/Max: Please provide a little more details on that use case, this doesn't sound realistic to me and not a breaking change.

(In reply to Krinkle from comment #18)

But I can't imagine explicitly checking and relying on the the absence of
the structure. Roan/Bryan/Max: Please provide a little more details on that
use case, this doesn't sound realistic to me and not a breaking change.

In some sort of PHP-ish pseudocode:

$result = $api->query( array( 'generator' => ... ) );
if ( !isset( $result['query']['pages'] ) ) {
    // Generator returned no pages, handle that
} else {
    // Do something with the pages
}

Yeah, that's a bit dumb. But people do dumb stuff sometimes, particularly when encouraged by the API representing booleans as present/absent rather than as booleans.

Change 122639 abandoned by Krinkle:
ApiQuery: Respond with "query/pages" even if there are 0 titles

https://gerrit.wikimedia.org/r/122639

Krinkle set Security to None.
Krinkle removed a subscriber: Unknown Object (MLST).