Results order when using generator=search
Closed, Resolved (Public)

Description

Author: artur.bekasov

Description:
I am trying to use search as a generator and get a summary of every article found, effectively replicating the functionality of the Special:Search page. This is what I am doing:

http://en.wikipedia.org/w/api.php?action=query&generator=search&gsrsearch=vector%20space&prop=extracts&exintro&exlimit=10&exsentences=1&explaintext

It works nicely, apart from one thing: it appears that the results are sorted by title. It's a shame, because Lucene does a decent job at ranking the results, as you can see if you just return a list of results:

http://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=vector%20space&srprop=

So the only workaround I've come up with is to make two requests: a query with list=search to get the search results, and then a query with prop=extracts to get summaries of the titles found in the first step. It seems to work, but as you can probably see, it's not a very reliable, efficient, or elegant solution.
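For illustration, the two-request workaround described above could be sketched roughly as follows. This is only a sketch under assumptions: the helper names (`api_get`, `ranked_titles`, `extracts_in_rank_order`, `search_with_extracts`) and the User-Agent string are made up for this example, and the reply shapes (`query.search` as a list, `query.pages` as a mapping keyed by page ID) are assumed to be those of the JSON output format.

```python
import json
import urllib.parse
import urllib.request

API_URL = "http://en.wikipedia.org/w/api.php"  # endpoint used in the report


def api_get(params):
    """Send one GET request to the API and decode the JSON reply."""
    query = urllib.parse.urlencode({**params, "format": "json"})
    request = urllib.request.Request(
        API_URL + "?" + query,
        headers={"User-Agent": "search-order-sketch/0.1"},  # placeholder value
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def ranked_titles(search_reply):
    """Titles in the order list=search returned them (the search ranking)."""
    return [hit["title"] for hit in search_reply["query"]["search"]]


def extracts_in_rank_order(titles, extracts_reply):
    """Reorder the unordered "pages" mapping to follow the ranked titles."""
    by_title = {
        page["title"]: page.get("extract", "")
        for page in extracts_reply["query"]["pages"].values()
    }
    # Titles missing from the second reply are skipped.
    return [(title, by_title[title]) for title in titles if title in by_title]


def search_with_extracts(term, limit=10):
    """Run both requests and return (title, extract) pairs in ranked order."""
    search_reply = api_get(
        {"action": "query", "list": "search", "srsearch": term,
         "srlimit": limit, "srprop": ""}
    )
    titles = ranked_titles(search_reply)
    if not titles:
        return []
    extracts_reply = api_get(
        {"action": "query", "prop": "extracts", "titles": "|".join(titles),
         "exintro": "", "exlimit": limit, "exsentences": 1, "explaintext": ""}
    )
    return extracts_in_rank_order(titles, extracts_reply)
```

Calling `search_with_extracts("vector space")` would issue both requests. The merge step matches on title, so it assumes the titles in the second reply are identical to those returned by the search.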

In a comment on another bug (https://bugzilla.wikimedia.org/show_bug.cgi?id=14859#c1) it was explained that respecting the original order of titles is not feasible or desirable. Generators are basically a different way of providing a list of titles, so I can see why you might not be keen on implementing that. However, I still want to open this ticket, for two reasons:

  1. It seems like a quite basic use case.
  2. The problem makes generator=search useless for most people.

I am using the latest API version as installed on the English Wikipedia.


Version: unspecified
Severity: enhancement

Details

Reference
bz67131

Event Timeline

bzimport raised the priority of this task to Lowest. (Nov 22 2014, 3:25 AM)
bzimport set Reference to bz67131.
bzimport added a subscriber: Unknown Object (MLST).

As you already noted, the reasons provided in bug 14859 apply here too. This is a duplicate of that bug.

(In reply to Artur Bekasov from comment #0)

  2. The problem makes generator=search useless for most people.

{{citation needed}} on "most". Especially with the improvements Cirrus is bringing, I see opportunity for plenty of use cases where people would want to use generator=search to find the unordered set of pages matching some query.

*** This bug has been marked as a duplicate of bug 14859 ***

wikibugs wrote:

*** Bug 68515 has been marked as a duplicate of this bug. ***

wikibugs wrote:

I actually disagree that this is simply a duplicate of bug 14859:
in bug 14859 the OP already knows the order of the titles he's querying for. Here, search internally returns the order to the generator, but that information is just destroyed and never given to the end user, forcing us to make two queries instead of one.

I don't see how not destroying that ordering information and returning it as an optional index list would have any significant performance impact.

If one is concerned about re-ordering the up to 500 "pages" results, one could even leave them in the order they are in at the moment [1], because with that optional index list from the generator input function one could re-sort the results on the client side, as mentioned in bug 14859. Additionally, generator=search would no longer need 2 individual queries.

@Brad: Would be nice if you could reconsider your decision on this one and maybe reopen it.

[1]: (they are actually a result dict, not a list, so it's probably not wise to rely on their order anyhow)
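To make the proposal concrete, client-side re-sorting with such an index could look roughly like the sketch below. The per-page `index` key and the function name `sort_pages_by_index` are assumptions made for this illustration only; the thread does not specify what form the index list would take.

```python
def sort_pages_by_index(pages, index_key="index"):
    """Return the page dicts ordered by a generator-supplied rank.

    `pages` is the unordered "pages" mapping (page ID -> page dict).
    The `index_key` field is hypothetical: it stands in for the optional
    index list proposed in this comment. Pages without it are placed last.
    """
    return sorted(
        pages.values(),
        key=lambda page: page.get(index_key, float("inf")),
    )


# Example with made-up data: the mapping order is arbitrary, the index is not.
pages = {
    "10": {"title": "B", "index": 2},
    "7": {"title": "A", "index": 3},
    "42": {"title": "C", "index": 1},
}
ordered_titles = [page["title"] for page in sort_pages_by_index(pages)]
```

With something like this, a single generator=search request would be enough, and the server would not need to reorder anything itself.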