OpenSearchXML extension resolves redirects in XML mode but not in JSON mode
Open, Medium, Public, Bug Report

Description

Author: ben.rimmington

Description:
OVERVIEW DESCRIPTION:

The module that implements the OpenSearch protocol returns slightly different results in JSON and XML formats (for the same search string).

STEPS TO REPRODUCE:

  1. http://en.wikipedia.org/w/api.php?action=opensearch&search=Neptune&limit=50&format=jsonfm
  2. http://en.wikipedia.org/w/api.php?action=opensearch&search=Neptune&limit=50&format=xmlfm
  3. Compare the results.
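
The comparison in step 3 can be sketched as a small script. This is only an illustration: the helper names `opensearch_url` and `diff_titles` are hypothetical, and a script would likely request `format=json` and `format=xml` rather than the HTML-wrapped `jsonfm`/`xmlfm` variants used above. Fetching and parsing the two responses is left out; `diff_titles` takes the two title lists once they have been extracted.

```python
from urllib.parse import urlencode

API = "http://en.wikipedia.org/w/api.php"


def opensearch_url(search, fmt, limit=50):
    """Build an action=opensearch request URL for the given output format."""
    params = {
        "action": "opensearch",
        "search": search,
        "limit": limit,
        "format": fmt,
    }
    return API + "?" + urlencode(params)


def diff_titles(json_titles, xml_titles):
    """Return (titles only in the JSON list, titles only in the XML list).

    Order within each returned list follows the order of the input list.
    """
    json_set = set(json_titles)
    xml_set = set(xml_titles)
    only_json = [t for t in json_titles if t not in xml_set]
    only_xml = [t for t in xml_titles if t not in json_set]
    return only_json, only_xml
```

Any title that appears in only one of the two lists points at a redirect that was resolved (or a duplicate that was dropped) in the XML output.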

ACTUAL RESULTS:

The JSON results are in strictly alphabetical order. The XML results are in roughly alphabetical order -- redirections have been resolved, and duplicates have been removed.

e.g. The fifth JSON result "Neptune (astrology)" corresponds to the fifth XML result "Planets in astrology".

e.g. The sixth JSON result "Neptune (astronomy)" has been removed from the XML results as a duplicate.

e.g. The 43rd JSON result "Neptune Emerald" (which redirects to the "Neptune_Emerald#My-Otome_Zwei" section of the "Nina Wáng" page) has a corresponding XML result (but without the section anchor).

The JSON results contain 50 items (as requested). The XML results contain 35 items -- after redirections, 15 duplicates were removed.
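
The effect described above (a shorter list that is no longer strictly alphabetical) is what one would expect if redirects are replaced in their slot and later duplicates are dropped. A minimal sketch of that behavior, assuming a plain title-to-target mapping; the function name, the redirect map, and the "Neptune (mythology)" entry are illustrative assumptions, not the extension's actual code or data:

```python
def resolve_and_dedupe(titles, redirects):
    """Replace each redirect title with its target, keeping the original slot,
    then drop any target that has already appeared earlier in the list."""
    seen = set()
    resolved = []
    for title in titles:
        target = redirects.get(title, title)  # non-redirects map to themselves
        if target not in seen:
            seen.add(target)
            resolved.append(target)
    return resolved


# Illustrative data only: the first redirect is taken from the report,
# the second is an assumption about why the entry counted as a duplicate.
prefix_results = [
    "Neptune",
    "Neptune (astrology)",
    "Neptune (astronomy)",
    "Neptune (mythology)",
]
redirect_map = {
    "Neptune (astrology)": "Planets in astrology",
    "Neptune (astronomy)": "Neptune",
}

xml_like_results = resolve_and_dedupe(prefix_results, redirect_map)
```

With this input the four prefix matches collapse to three entries, and "Planets in astrology" sits between two titles that start with "Neptune".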

EXPECTED RESULTS:

The JSON and XML results should be consistent with each other.

The action=query and action=parse modules support a "redirects" parameter -- perhaps action=opensearch should do the same?


Version: unspecified
Severity: trivial

Event Timeline

bzimport raised the priority of this task to Medium (Nov 21 2014, 10:26 PM).
bzimport set Reference to bz17142.

Since action=opensearch is a prefix search, it shouldn't resolve redirects; tweaking summary accordingly.

Failing to resolve redirects leads to really crappy, ugly results -- much uglier with the text extract & thumbnail image than when doing plain text results, where we never bothered to worry about it -- and it was very much done on purpose.

Reverted in r46295.

matthew.britton wrote:

(In reply to comment #3)

Failing to resolve redirects leads to really crappy, ugly results -- much
uglier with the text extract & thumbnail image than when doing plain text
results, where we never bothered to worry about it -- and it was very much done
on purpose.

...so in JSON mode it *should* resolve redirects? I think this bug was as much about the inconsistency between the two as anything else.

(In reply to comment #3)

Failing to resolve redirects leads to really crappy, ugly results -- much
uglier with the text extract & thumbnail image than when doing plain text
results, where we never bothered to worry about it -- and it was very much done
on purpose.

I disagree. The OpenSearch module is supposed to be a prefix search. When typing in "Nept", a list in which everything starts with "Nept" and is sorted alphabetically is what you'd expect, not a list that has "Planets in astrology" in the middle of it (and hence isn't alphabetically sorted, because redirects are replaced in their slot).

(In reply to comment #4)

...so in JSON mode it *should* resolve redirects? I think this bug was as much
about the inconsistency between the two as anything else.

Also true. One of the two behaviors needs to be chosen as the 'right' one, and should be used consistently. REOPENing on this basis and changing summary accordingly.

Committed a proposed fix in r46341 (core) and r46342 (extension) which adds a &redirects parameter to action=opensearch, and only resolves redirects if that parameter is added. Resolving as FIXED in the hope that this satisfies everyone.

It's supposed to be a useful *suggestions* search, and in fact has no requirement to be a strict *prefix* search. Destroying the functionality by default seems like a pretty crappy idea. Reopening.

Reverted changes for now in r46379. For some reason the API doesn't allow a boolean parameter to default to true; perhaps it's only checking for the presence of the parameter (as with HTML check boxes) rather than paying attention to the value?

As a result, the more detailed XML search output ends up being hideously filled with "#REDIRECT Bla bla" in the results, which is strongly counter to sensible usability.

(If you're making any further changes to this, please make sure you're actually *testing* the results; check the behavior in IE 8 beta for instance.)

ben.rimmington wrote:

Now that I've seen how action=opensearch is implemented, I think this bug report could be closed as INVALID. Maybe the differences should just be documented instead.

public function getAllowedParams() {
    return array(
        // ... //
        'format' => array(
            ApiBase::PARAM_DFLT => 'json',
            ApiBase::PARAM_TYPE => array(
                'json',
                'jsonfm',
                'xml',
                'xmlfm',
            ),
        ),
    );
}

public function getParamDescription() {
    return array(
        // ... //
        'format' => 'The format of the output',
    );
}

public function getDescription() {
    return array(
        'This module/extension implements the OpenSearch protocol.',
        'NOTE: Redirects are only resolved for the XML and XMLFM output formats.',
    );
}

OFF TOPIC: Only 1 of 35 results for http://en.wikipedia.org/w/api.php?action=opensearch&search=Neptune&limit=50&format=xmlfm has a "badge" image. Shall I report this as a bug?

matthew.britton wrote:

(In reply to comment #9)

Now that I've seen how action=opensearch is implemented, I think this bug
report could be closed as INVALID. Maybe the differences should just be
documented instead.

Could do, does seem a bit odd though... require clients to use either an XML parser or a JSON parser depending on what they want returned? :/

ben.rimmington wrote:

(In reply to comment #10)

(In reply to comment #9)

Now that I've seen how action=opensearch is implemented, I think this bug
report could be closed as INVALID. Maybe the differences should just be
documented instead.

Could do, does seem a bit odd though... require clients to use either an XML
parser or a JSON parser depending on what they want returned? :/

You're probably right -- I only suggested it because I regret causing this disagreement over a minor bug.

I've just tried using a list=search query, but it returned an "srsearch-title-disabled" error. Otherwise, this might be an alternative to action=opensearch, and it also has an "srredirects" parameter.

http://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=Neptune&srwhat=title

http://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=Neptune&srwhat=text

(In reply to comment #8)

Reverted changes for now in r46379. For some reason the API doesn't allow a
boolean parameter to default to true; perhaps it's only checking for the
presence of the parameter (as with HTML check boxes) rather than paying
attention to the value?

Yeah, that's what it does. In example requests, boolean parameters aren't even set to a value (some clients set them to 1 or 'yes' or something like that, but that's their choice). Of course we could add a &noredirect parameter instead, or just keep the &redirects parameter and tell clients/plugins/whatever that adding an extra parameter to their request isn't the end of the world.
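
The presence-only behavior described here can be modelled in a few lines. This is a sketch of the semantics as stated in the comment, not MediaWiki's parameter-handling code; `flag_is_set` and `should_resolve_redirects` are hypothetical names, and `noredirect` is the inverse flag floated above:

```python
from urllib.parse import parse_qs


def flag_is_set(query, name):
    """Presence-only boolean: true whenever the parameter appears at all,
    whatever value (if any) the client attached to it."""
    params = parse_qs(query, keep_blank_values=True)
    return name in params


def should_resolve_redirects(query):
    """Default-on behavior expressed with an inverse flag, since a
    presence-only flag can only ever default to off."""
    return not flag_is_set(query, "noredirect")
```

Under this model `redirects`, `redirects=1` and `redirects=0` all switch the flag on, which is why a "default to true" boolean cannot be expressed directly and an inverse parameter is the usual workaround.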

(In reply to comment #10)

(In reply to comment #9)

Now that I've seen how action=opensearch is implemented, I think this bug
report could be closed as INVALID. Maybe the differences should just be
documented instead.

Could do, does seem a bit odd though... require clients to use either an XML
parser or a JSON parser depending on what they want returned? :/

That's exactly the reason why I don't like the current behavior. Choosing a different format shouldn't have the side effect of toggling redirect resolution, especially if it's not documented.

(In reply to comment #11)

I've just tried using a list=search query, but it returned an
"srsearch-title-disabled" error. Otherwise, this might be an alternative to
action=opensearch, and it also has an "srredirects" parameter.

http://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=Neptune&srwhat=title

http://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=Neptune&srwhat=text

If you're looking for a prefix search, use http://en.wikipedia.org/w/api.php?action=query&list=allpages&apprefix=Neptune (which incidentally, also has redirect resolution available). Either way, these requests won't return the right format for suggestion search plugins.

ben.rimmington wrote:

(In reply to comment #12)

If you're looking for a prefix search, use
http://en.wikipedia.org/w/api.php?action=query&list=allpages&apprefix=Neptune
(which incidentally, also has redirect resolution available). Either way, these
requests won't return the right format for suggestion search plugins.

Thanks for the suggestion, but I've realized that list=allpages or list=search might be too inefficient (for my application). I want to display an image and description with each search result, so the OpenSearchXml extension is the best option.

gerritbot subscribed.

Change 190290 had a related patch set uploaded (by Anomie):
Enable redirect resolution by default in JSON OpenSearch results

https://gerrit.wikimedia.org/r/190290

Patch-For-Review

Redirect resolution is nice, but the question is whether the user understands what is going on. Redirect targets can have zero words in common with the redirect title, which will confuse the user of search suggestions, unless the user knows beforehand the equivalence/relation between the two terms.

Perhaps the "true" solution is to show both the title and an annotation/snippet? The annotation could be filled with the redirect target title in this case, or by the namespace in MixedNamespaceSearchSuggestions, etc.
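
As a rough illustration of the title-plus-annotation idea, the suggestion list could keep every matched title in its original order and carry the redirect target alongside it. The names below (`annotate_suggestions`, `render`) and the display wording are assumptions made for this sketch, not an existing API:

```python
def annotate_suggestions(titles, redirects):
    """Pair each matched title with its redirect target, or None if the
    title is not a redirect. Order and length of the list are unchanged."""
    return [(title, redirects.get(title)) for title in titles]


def render(suggestion):
    """Format one (title, target) pair for display in a suggestion list."""
    title, target = suggestion
    if target is None:
        return title
    return f"{title} (redirects to {target})"


suggestions = annotate_suggestions(
    ["Neptune", "Neptune (astrology)"],
    {"Neptune (astrology)": "Planets in astrology"},
)
lines = [render(s) for s in suggestions]
```

This keeps the list alphabetical and prefix-matched while still telling the user where a redirect leads.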

Reedy set Security to None.

Noting that OpenSearchXML is part of MediaWiki core as of 1.25.

Removing assignment from some tasks I'm not actively working on. Volunteers welcome, I'm happy to help if pinged!

Aklapper changed the subtype of this task from "Task" to "Bug Report" (Feb 5 2022, 2:33 PM).