add an index field to show original order of queried titles
Closed, Resolved, Public

Description

When querying a number of titles in a single request, especially when it is expected that some won't exist, it would be very useful to be able to access the results in the same order the titles were in.

Currently the "pages" in the results are sorted by id, with nonexistent ones given an invented negative id, so the order of results usually differs from the order of the request.

It should be much less work to add an index field to api.php compared to iterating through the results with lots of string operations at the local side.

"index" could be a new field next to "pageid", "ns", "title" etc. It seems so low cost that it need not even be optional.


Version: unspecified
Severity: enhancement

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 10:10 PM
bzimport set Reference to bz14859.

(In reply to comment #0)

> When querying a number of titles in a single request, especially when it is
> expected that some won't exist, it would be very useful to be able to access
> the results in the same order the titles were in.

I don't see how that's particularly useful, as our target audience are bots who just load all the result data in memory and iterate through it. If you really need titles in the requested order, you can reorder them yourself.

> It should be much less work to add an index field to api.php compared to
> iterating through the results with lots of string operations at the local side.

Actually, it's not. We feed the list of titles to the database, which spits out the data you need ordered by page ID. There's really no difference between reordering stuff to conform to the requested order on the server side and doing so on the client side. In this case, it's better to leave it at the client side, since it's not a very popular feature.

> "index" could be a new field next to "pageid", "ns", "title" etc. It seems so
> low cost that it need not even be optional.

Ordering stuff by ns/title on the database side is *far* from low-cost in most cases. This is because the database is pretty restrictive in the things it can order by efficiently. Ordering by page ID works for most queries, which is why we do it. As to ordering in the API itself (on the server side): that's every bit as slow/fast as ordering on the client side, so let's just leave that up to the client.
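For anyone who wants to do the client-side reordering suggested above, here is a minimal Python sketch. It works on an already-decoded response and makes several assumptions that are not stated in this thread: that the result has a `query.pages` object keyed by page id, that each entry carries a `title`, and that an optional `query.normalized` list maps the title as sent to the title as reported. The function name and the sample data are invented for illustration.

```python
from typing import Any


def pages_in_requested_order(
    requested: list[str], response: dict[str, Any]
) -> list[dict[str, Any]]:
    """Return the page entries of `response` in the order of `requested`.

    Assumed response shape (not confirmed by this thread):
    response["query"]["pages"] is a dict keyed by page id, and
    response["query"]["normalized"] is an optional list of
    {"from": ..., "to": ...} title mappings.
    """
    query = response.get("query", {})
    # Map each title as sent to the title the server reports back.
    normalized = {n["from"]: n["to"] for n in query.get("normalized", [])}
    # Index the page entries by reported title; the id keys are ignored.
    by_title = {page["title"]: page for page in query.get("pages", {}).values()}

    ordered = []
    for title in requested:
        page = by_title.get(normalized.get(title, title))
        if page is not None:
            ordered.append(page)
    return ordered


# Hand-written sample data, not real API output.
sample_response = {
    "query": {
        "normalized": [{"from": "main page", "to": "Main page"}],
        "pages": {
            "-1": {"ns": 0, "title": "No such page", "missing": ""},
            "15": {"pageid": 15, "ns": 0, "title": "Zebra"},
            "736": {"pageid": 736, "ns": 0, "title": "Main page"},
        },
    }
}
requested_titles = ["main page", "No such page", "Zebra"]
ordered_pages = pages_in_requested_order(requested_titles, sample_response)
ordered_titles = [page["title"] for page in ordered_pages]
```

This is one dictionary lookup per requested title rather than repeated string scanning, which is roughly the cost argument made above: the work is the same wherever it runs.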

  • Bug 67131 has been marked as a duplicate of this bug.
  • Bug 68515 has been marked as a duplicate of this bug.
  • Bug 73323 has been marked as a duplicate of this bug.

I'm going to look at doing something along these lines, although it's going to be more general than just "stick an index field somewhere" and targeted at things like generator=search where the order actually matters.

Details at https://www.mediawiki.org/wiki/API/Architecture_work/Planning#Allow_generators_to_provide_data

Just preserving the order from the generator would be really awesome. There is no reason why it's not being done other than "that's how link batch query sorts titles"; however, you don't really have to use the order from the query.

(In reply to Max Semenik from comment #6)

> There is no reason why it's not being done other than "that's how link batch
> query sorts titles"

Not exactly. To "preserve" the order, we'd have to save the order somewhere and then reorder a number of different arrays after they were populated from various queries.

And then your JSON decoder might throw the ordering away anyway.
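The point about decoders is one reason an explicit position field is more robust than relying on the order of keys in the output. A minimal Python sketch of how a client might use such a field follows. The field name `index` is taken from the original request; where the field actually appears and what it counts from are assumptions here, and the sample JSON is hand-written rather than real API output.

```python
import json
from typing import Any

# Hand-written sample: the key order deliberately differs from the index order.
raw = """
{
  "query": {
    "pages": {
      "12": {"pageid": 12, "title": "B", "index": 2},
      "40": {"pageid": 40, "title": "A", "index": 1},
      "7":  {"pageid": 7,  "title": "C", "index": 3}
    }
  }
}
"""


def pages_by_index(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Sort page entries by their (assumed) integer "index" field.

    The result does not depend on the order in which the decoder
    happened to store the keys of the "pages" object.
    """
    pages = response["query"]["pages"].values()
    return sorted(pages, key=lambda page: page["index"])


decoded = json.loads(raw)
titles_by_index = [page["title"] for page in pages_by_index(decoded)]
```

Because the sort key travels inside each entry, the same code works whether the decoder returns an ordered or an unordered mapping.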

Anomie set Security to None.

Change 175759 had a related patch set uploaded (by Anomie):
API: Allow generators to return data

https://gerrit.wikimedia.org/r/175759

Patch-For-Review

Change 175759 merged by jenkins-bot:
API: Allow generators to return data

https://gerrit.wikimedia.org/r/175759

It looks like this made it in time to hit WMF wikis with 1.25wmf10; see https://www.mediawiki.org/wiki/MediaWiki_1.25/Roadmap for the schedule.