Clustering for image searches
Open, Public

Description

Author: rd232

Description:
Problems with image searching (particularly an issue for Commons, but elsewhere too):

  1. The current search matches every keyword and can return surprising results. For example, a search for "cucumber" returns not only images of cucumbers, but also images of one being used as a sex toy.
  2. When terms collide you won't find what you want to find. For example: If you search for "monarch" you will get hundreds of images of a butterfly, but very few results concerning monarchy.

The basic idea is to improve on this by clustering related search results. Roughly, this could work like this:

  1. The search works as usual and grabs all results by keyword.
  2. It looks at the categories of the results. If it finds multiple images from different parts of the category tree it will split the results into groups, labeling them after the lowest parent category. This means that it would form clusters using the categories to group the results.
  3. Instead of showing a list of images it would display these groups, which can be expanded.
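The three steps above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not how the existing search works: it assumes each result carries exactly one category and that categories form a strict tree, whereas real Commons files usually sit in several categories and the category graph is not a clean tree. The `PARENT` map, the file names, and the functions `path_to_root`, `lowest_common_category`, and `cluster` are all hypothetical. "Different parts of the category tree" is read here as "different top-level branches", and "lowest parent category" as the deepest category shared by every result in a group.

```python
from collections import defaultdict

# Hypothetical category tree, child -> parent (None marks the root).
PARENT = {
    "Topics": None,
    "Nature": "Topics",
    "Insects": "Nature",
    "Butterflies": "Insects",
    "Danaus plexippus": "Butterflies",
    "Society": "Topics",
    "Politics": "Society",
    "Monarchy": "Politics",
    "Monarchs of Europe": "Monarchy",
}

# Hypothetical keyword hits for "monarch": (file title, category).
RESULTS = [
    ("Monarch butterfly on flower.jpg", "Danaus plexippus"),
    ("Monarch caterpillar.jpg", "Danaus plexippus"),
    ("Monarch wing detail.jpg", "Butterflies"),
    ("Coronation portrait.jpg", "Monarchs of Europe"),
    ("Royal crown.jpg", "Monarchy"),
]


def path_to_root(category):
    """Return [category, parent, ..., root]."""
    path = []
    while category is not None:
        path.append(category)
        category = PARENT[category]
    return path


def lowest_common_category(categories):
    """Return the deepest category that is an ancestor of (or equal to) all inputs."""
    paths = [list(reversed(path_to_root(c))) for c in categories]  # root first
    common = None
    for level in zip(*paths):
        if len(set(level)) != 1:
            break
        common = level[0]
    return common


def cluster(results):
    """Group results by top-level branch, then label each group."""
    by_branch = defaultdict(list)
    for title, category in results:
        path = path_to_root(category)
        # The branch is the category directly below the root.
        branch = path[-2] if len(path) > 1 else path[-1]
        by_branch[branch].append((title, category))

    clusters = {}
    for items in by_branch.values():
        label = lowest_common_category([category for _, category in items])
        clusters[label] = [title for title, _ in items]
    return clusters


if __name__ == "__main__":
    for label, titles in cluster(RESULTS).items():
        print(f"{label} ({len(titles)} results)")
```

With this sample data the butterfly images and the monarchy images end up in two separate groups, each of which the interface could show collapsed and expand on demand. Handling files with several categories, and deciding how deep to cut the tree, are the parts a real implementation would have to work out.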

Clustered search would not only be much more useful, but it would also solve (in relation to searching) the problem which the WMF's image filter is supposed to address - but without any need to specially classify or tag individual images.

There is a more detailed explanation (and an image mockup) at https://meta.wikimedia.org/wiki/Controversial_content/Brainstorming#Clustering_for_search_results_on_Commons

Bugzilla is not a good format to discuss this idea (it doesn't even have a Preview button!!), but we'd like to put it on developers' radar, and get some feedback if possible. Please feel free to leave comments on Meta in addition to Bugzilla.


Version: master
Severity: enhancement

bzimport added projects: CirrusSearch, Design. (Via Conduit, Nov 22 2014, 12:10 AM)
bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz35701.
bzimport created this task. (Via Legacy, Apr 4 2012, 5:28 PM)
bzimport added a comment. (Via Conduit, Apr 29 2013, 3:10 PM)

rd232 wrote:

Well it's been a year. Is there any sign of ... anything?

Aklapper added a comment. (Via Conduit, Apr 30 2013, 10:49 AM)

Rd232: No. There has been no comment here, and this is a low-priority enhancement request, which makes it rather unlikely to be implemented unless somebody contributes a patch. Search also has quite a few high-priority issues that are more important, and it is only in the last two or three months that somebody (Ram) has been actively working on the Wikimedia Search code again.

Chad added a comment. (Via Conduit, Feb 13 2014, 11:43 PM)

(In reply to Rd232 from comment #0)

  2. When terms collide you won't find what you want to find. For example: If you search for "monarch" you will get hundreds of images of a butterfly, but very few results concerning monarchy.

This is annoying.

The basic idea is to improve on this by clustering related search results.
Roughly, this could work like this:

  1. The search works as usual and grabs all results by keyword.
  2. It looks at the categories of the results. If it finds multiple images from different parts of the category tree it will split the results into groups, labeling them after the lowest parent category. This means that it would form clusters using the categories to group the results.
  3. Instead of showing a list of images it would display these groups, which can be expanded.

This could be a very cool idea, although implementation details sound hairy at the moment. Let's repurpose into a Cirrus bug though :)
