
Commons search auto-suggest for "files depicting..." should filter out articles
Closed, Declined (Public)

Description

Since a large percentage of the items in Wikidata are now academic articles, many of the suggestions made by the search auto-suggest are also academic articles. This can be very confusing when the titles of academic articles sound like legitimate search terms, such as a person's name or the name of a specific place or thing:

Screen Shot 2020-12-16 at 4.13.33 PM.png (149×310 px, 26 KB)

In the screenshot above, all 4 suggestions are actually journal articles, and none of them leads to any actual search results:
Screen Shot 2020-12-16 at 4.18.07 PM.png (276×755 px, 43 KB)

This is a very frustrating and common experience.

The auto-suggest feature for "files depicting..." should automatically filter out anything that is an instance of scholarly article or biographical article.

Event Timeline

@matthiasmullie - Which code repo actually controls this feature? Is it a hook in WikibaseMediaInfo or something implemented directly in CirrusSearch or something else?

Change 650091 had a related patch set uploaded (by Matthias Mullie; owner: Matthias Mullie):
[mediawiki/extensions/WikibaseMediaInfo@master] Allow excluding depicts suggestions with specific statements

https://gerrit.wikimedia.org/r/650091

The code for this lives here: https://github.com/wikimedia/mediawiki-extensions-WikibaseMediaInfo/tree/master/resources/search

This is a bit tricky, though: there is a highly specific API endpoint for Wikidata entity autocomplete suggestions (action=wbsearchentities). It is the one currently used, but it doesn't allow us to exclude certain results in the way this ticket proposes. There is also no inexpensive way of fetching the additional information that would allow filtering out unwanted results client-side.
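To make the cost concrete, here is a minimal sketch of what client-side filtering would involve: one wbsearchentities call for the suggestions, then a second wbgetentities call to fetch the claims of every suggested item, just to inspect "instance of" (P31). This is not the code from the patch; the function names and the default limit are assumptions made for illustration, and only the scholarly article class (Q13442814) is listed, so any other class to exclude would have to be added by its own ID.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
SCHOLARLY_ARTICLE = "Q13442814"  # Wikidata item for "scholarly article"


def suggest_url(term, language="en", limit=7):
    """URL for the autocomplete call; it has no parameter to exclude by statement."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def claims_url(entity_ids):
    """URL for the extra round trip needed to learn what each suggestion is."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(entity_ids),
        "props": "claims",
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def instance_of_ids(entity):
    """Collect the item IDs used as P31 (instance of) values on one entity."""
    ids = set()
    for claim in entity.get("claims", {}).get("P31", []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value", {})
        if isinstance(value, dict) and "id" in value:
            ids.add(value["id"])
    return ids


def filter_suggestions(suggestions, entities, excluded=frozenset({SCHOLARLY_ARTICLE})):
    """Drop suggestions whose entity is an instance of an excluded class."""
    return [
        s for s in suggestions
        if not (instance_of_ids(entities.get(s["id"], {})) & excluded)
    ]
```

Here `suggestions` stands for the `search` array of the first response and `entities` for the `entities` object of the second. Every keystroke would then pay for two requests and a full set of claims per suggested item, which is the expense described above.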

The alternative is the full-featured search API, but:

  • it searches all content (including labels, descriptions and aliases in languages the user doesn't care about); removing those matches would require over-fetching results and filtering them client-side
  • it doesn't combine plain (exact keyword match) with prefix (part-of-word autocomplete match) searching the way wbsearchentities does; that would require 2 separate calls

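The two-call approach can be sketched roughly as follows. This is an illustration under assumptions, not the abandoned patch: it assumes the `haswbstatement` search keyword is available and can be negated with a leading `-`, it approximates prefix matching with a trailing `*` wildcard, and the helper names and limits are hypothetical.

```python
def search_params(term, excluded_classes, prefix=False, limit=20):
    """Parameters for one full-text search call that excludes given P31 classes."""
    text = f"{term}*" if prefix else term  # trailing wildcard as a stand-in for prefix matching
    exclusions = " ".join(f"-haswbstatement:P31={qid}" for qid in excluded_classes)
    return {
        "action": "query",
        "list": "search",
        "srsearch": f"{text} {exclusions}".strip(),
        "srlimit": limit,  # over-fetch, since some hits will be discarded later
        "format": "json",
    }


def merge_results(exact_hits, prefix_hits, limit):
    """Merge both result lists, exact matches first, dropping duplicate titles."""
    seen = set()
    merged = []
    for hit in [*exact_hits, *prefix_hits]:
        title = hit["title"]
        if title not in seen:
            seen.add(title)
            merged.append(hit)
    return merged[:limit]
```

Even in this form, each suggestion lookup costs two requests, and the merged list still contains hits that matched on text in languages the user never asked for, so a further client-side filtering pass would be needed.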
So essentially, implementing this would mean both a significant performance penalty (more API requests and larger response bodies) and poorer results, to the point where I doubt that it's still worth it.
(I have submitted a patch with a partial implementation, but will abandon it. It was useful to investigate, but I believe the alternative results would be significantly worse than what we currently have. Feel free to take a look at the patch if you want to tinker with it; there may be other avenues worth exploring that I haven't yet considered.)

Given that we've been (and still are) working on both an improved (image-specific) search understanding and algorithm and a more image-focused search UI for Commons (Special:MediaSearch), maybe we should start to think about dropping these "depicts" suggestions altogether and figure out how we might blend this search box with the recent improvements.

Change 650091 abandoned by Matthias Mullie:
[mediawiki/extensions/WikibaseMediaInfo@master] Allow excluding depicts suggestions with specific statements

Reason:
results are poor, probably not worth it

https://gerrit.wikimedia.org/r/650091

@matthiasmullie - Thanks for the thorough investigation! Yeah, it sounds like we should revisit this once it is decided how MediaSearch will be utilized on Commons. Depending on how that goes, it may make sense to just drop the depicts auto-suggestions.

CBogen subscribed.

Closing because the "files depicting..." feature has been removed.