Clustering (category facets) for image searches
Open, Lowest, Public, Feature

Description

Author: rd232

Description:
Problems with image searching (particularly an issue for Commons, but elsewhere too):

  1. The current search reacts to every keyword and might give surprising results. For example: A search for "cucumber" returns not only images of cucumbers, but also images of their use as a sex toy.
  2. When terms collide you won't find what you want to find. For example: If you search for "monarch" you will get hundreds of images of a butterfly, but very few results concerning monarchy.

The basic idea is to improve on this by clustering related search results. Roughly, it could work as follows:

  1. The search works as usual and grabs all results by keyword.
  2. It looks at the categories of the results. If it finds multiple images from different parts of the category tree it will split the results into groups, labeling them after the lowest parent category. This means that it would form clusters using the categories to group the results.
  3. Instead of showing a list of images it would display these groups, which can be expanded.
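
To make the grouping step concrete, here is a minimal sketch in Python. It is only an illustration under strong assumptions, not a proposal for how CirrusSearch should do it: the category tree is a hypothetical toy dictionary (real Commons categories form a graph, not a strict tree), each hit carries a single category, and all names (PARENT, path_from_root, cluster_results) are made up for this example. Hits are split at the point where their category paths diverge, and each group is labeled with the lowest category that still contains all of its members.

```python
from collections import defaultdict

# Hypothetical toy category tree: child -> parent (None marks the root).
PARENT = {
    "Topics": None,
    "Nature": "Topics",
    "Insects": "Nature",
    "Butterflies": "Insects",
    "Danaus plexippus": "Butterflies",
    "Society": "Topics",
    "Monarchy": "Society",
    "Monarchs": "Monarchy",
    "Crowns": "Monarchy",
}


def path_from_root(category, parent=PARENT):
    """Return the chain of categories from the root down to `category`."""
    path = []
    while category is not None:
        path.append(category)
        category = parent[category]
    return path[::-1]


def common_prefix(paths):
    """Longest shared leading segment of several root-to-leaf paths."""
    prefix = []
    for level in zip(*paths):
        if len(set(level)) != 1:
            break
        prefix.append(level[0])
    return prefix


def cluster_results(results, parent=PARENT):
    """Group (title, category) hits by the branch where their paths diverge."""
    paths = {title: path_from_root(cat, parent) for title, cat in results}
    split_depth = len(common_prefix(list(paths.values())))

    branches = defaultdict(list)
    for title, path in paths.items():
        # A hit filed directly in the shared category forms its own group.
        key = path[split_depth] if len(path) > split_depth else path[-1]
        branches[key].append(title)

    # Label each group with the lowest category containing all its members.
    clusters = {}
    for titles in branches.values():
        label = common_prefix([paths[t] for t in titles])[-1]
        clusters[label] = titles
    return clusters


# Made-up search hits for "monarch": (file title, category).
hits = [
    ("Monarch butterfly.jpg", "Danaus plexippus"),
    ("Monarch caterpillar.jpg", "Danaus plexippus"),
    ("Queen Victoria.jpg", "Monarchs"),
    ("Imperial crown.jpg", "Crowns"),
]
clusters = cluster_results(hits)
for label, titles in clusters.items():
    print(f"{label}: {titles}")
```

With these made-up hits, the two butterfly images end up in a group labeled "Danaus plexippus" and the other two in a group labeled "Monarchy". A real implementation would have to handle images with several categories, cycles and multiple parents in the category graph, and very large result sets, none of which this sketch attempts.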

Clustered search would not only be much more useful, but it would also solve (in relation to searching) the problem which the WMF's image filter is supposed to address - but without any need to specially classify or tag individual images.

There is a more detailed explanation (and an image mockup) at https://meta.wikimedia.org/wiki/Controversial_content/Brainstorming#Clustering_for_search_results_on_Commons

Bugzilla is not a good format to discuss this idea (it doesn't even have a Preview button!!), but we'd like to put it on developers' radar, and get some feedback if possible. Please feel free to leave comments on Meta in addition to Bugzilla.


Version: master
Severity: enhancement

Details

Reference
bz35701

Event Timeline

bzimport raised the priority of this task from to Low. Nov 22 2014, 12:10 AM
bzimport added projects: CirrusSearch, Design.
bzimport set Reference to bz35701.
bzimport added a subscriber: Unknown Object (MLST).

rd232 wrote:

Well it's been a year. Is there any sign of ... anything?

Rd232: No. There has been no comment here, and this is a low-priority enhancement request, which makes it rather unlikely to get fixed unless somebody contributes a patch. Search also has quite a few high-priority issues that are more important, and it is only in the last two or three months that somebody (Ram) has been actively working on the Wikimedia Search code again.

(In reply to Rd232 from comment #0)

> 2. When terms collide you won't find what you want to find. For example: If you search for "monarch" you will get hundreds of images of a butterfly, but very few results concerning monarchy.

This is annoying.

> The basic idea is to improve on this by clustering related search results. Roughly, this could work like this:
>
>   1. The search works as usual and grabs all results by keyword.
>   2. It looks at the categories of the results. If it finds multiple images from different parts of the category tree it will split the results into groups, labeling them after the lowest parent category. This means that it would form clusters using the categories to group the results.
>   3. Instead of showing a list of images it would display these groups, which can be expanded.

This could be a very cool idea, although implementation details sound hairy at the moment. Let's repurpose into a Cirrus bug though :)

Restricted Application added a subscriber: Aklapper.
Deskana lowered the priority of this task from Low to Lowest. Dec 29 2015, 11:58 PM
Deskana moved this task from Needs triage to Search on the Discovery-ARCHIVED board.
Nemo_bis renamed this task from Clustering for image searches to Clustering (category facets) for image searches. Dec 30 2015, 7:48 AM
Nemo_bis set Security to None.

Facets are a very common search feature; I'm unconvinced this should be lowest priority in general.

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:14 AM