Clustering (category facets) for image searches
Open, Lowest
Public

Description

Author: rd232

Description:
Problems with image searching (particularly an issue for Commons, but elsewhere too):

  1. The current search matches on every keyword and can return surprising results. For example, a search for "cucumber" returns not only images of cucumbers, but also images of a cucumber used as a sex toy.
  2. When terms collide you won't find what you want to find. For example: If you search for "monarch" you will get hundreds of images of a butterfly, but very few results concerning monarchy.

The basic idea is to improve on this by clustering related search results. Roughly, this could work like this:

  1. The search works as usual and grabs all results by keyword.
  2. It looks at the categories of the results. If it finds multiple images from different parts of the category tree it will split the results into groups, labeling them after the lowest parent category. This means that it would form clusters using the categories to group the results.
  3. Instead of showing a list of images it would display these groups, which can be expanded.
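
The three steps above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not a proposal for the actual search code: each result is assumed to carry a single non-empty category path from the root of the tree (real Commons files sit in several categories, and the category graph is not a strict tree), and the function names and the category names in the example are made up.

```python
from collections import defaultdict


def common_prefix(paths):
    """Return the leading run of categories shared by every path."""
    prefix = []
    for level in zip(*paths):
        if len(set(level)) != 1:
            break
        prefix.append(level[0])
    return prefix


def cluster_results(results):
    """Group (title, category_path) pairs by the branch of the tree they sit in.

    Assumes every category_path is a non-empty list ordered from root to leaf.
    Returns a dict mapping a cluster label to the list of titles in it.
    """
    if not results:
        return {}
    paths = [path for _, path in results]
    depth = len(common_prefix(paths))  # level at which the branches diverge

    branches = defaultdict(list)
    for title, path in results:
        # Key on the first category below the shared prefix; a result that
        # sits directly in the shared category keeps that category as its key.
        key = path[depth] if len(path) > depth else path[-1]
        branches[key].append((title, path))

    clusters = {}
    for members in branches.values():
        # Label the group after the lowest category all its members share.
        label = common_prefix([path for _, path in members])[-1]
        clusters.setdefault(label, []).extend(title for title, _ in members)
    return clusters


# Hypothetical results for a search for "monarch".
example = [
    ("Monarch on flower.jpg", ["Nature", "Insects", "Danaus plexippus"]),
    ("Monarch caterpillar.jpg", ["Nature", "Insects", "Danaus plexippus"]),
    ("Coronation portrait.jpg", ["Society", "Politics", "Monarchy"]),
    ("Royal crown.jpg", ["Society", "Politics", "Monarchy", "Crowns"]),
]
for label, titles in cluster_results(example).items():
    print(label, titles)
```

With this made-up input the butterfly images end up in one group and the monarchy images in another, each labeled with the deepest category its members have in common. Handling files that belong to several categories at once is the part this sketch leaves out.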

Clustered search would not only be much more useful, but it would also solve (in relation to searching) the problem which the WMF's image filter is supposed to address - but without any need to specially classify or tag individual images.

There is a more detailed explanation (and an image mockup) at https://meta.wikimedia.org/wiki/Controversial_content/Brainstorming#Clustering_for_search_results_on_Commons

Bugzilla is not a good format to discuss this idea (it doesn't even have a Preview button!!), but we'd like to put it on developers' radar, and get some feedback if possible. Please feel free to leave comments on Meta in addition to Bugzilla.


Version: master
Severity: enhancement

Details

Security
None
Reference
bz35701
bzimport added a subscriber: Unknown Object (MLST).
bzimport set Reference to bz35701.
bzimport created this task. Apr 4 2012, 5:28 PM

rd232 wrote:

Well it's been a year. Is there any sign of ... anything?

Rd232: No. There has been no comment here, and this is a low-priority enhancement request, which makes it rather unlikely to get fixed unless somebody contributes a patch. Search also has quite a few high-priority issues that are more important, and it is only in the last two or three months that somebody (Ram) has been actively working on the Wikimedia Search code again.

(In reply to Rd232 from comment #0)

  1. When terms collide you won't find what you want to find. For example: If you search for "monarch" you will get hundreds of images of a butterfly, but very few results concerning monarchy.

This is annoying.

The basic idea is to improve on this by clustering related search results.
Roughly, this could work like this:

  1. The search works as usual and grabs all results by keyword.
  2. It looks at the categories of the results. If it finds multiple images from different parts of the category tree it will split the results into groups, labeling them after the lowest parent category. This means that it would form clusters using the categories to group the results.
  3. Instead of showing a list of images it would display these groups, which can be expanded.

This could be a very cool idea, although implementation details sound hairy at the moment. Let's repurpose into a Cirrus bug though :)

demon removed a subscriber: demon. Aug 19 2015, 4:07 PM
Restricted Application added a subscriber: Aklapper. Aug 19 2015, 4:07 PM
Restricted Application added a project: Discovery.
Ricordisamoa added a subscriber: Ricordisamoa.
Deskana lowered the priority of this task from "Low" to "Lowest". Dec 29 2015, 11:58 PM
Deskana moved this task from Needs triage to Search on the Discovery board.
Nemo_bis changed the title from "Clustering for image searches" to "Clustering (category facets) for image searches". Dec 30 2015, 7:48 AM
Nemo_bis set Security to None.

Facets are a very common search feature; I'm unconvinced this is lowest priority in general.
