
commons "ideal" search
Closed, Duplicate, Public

Description

In an ideal world, when a user searches for "gold" on commonswiki, they would find files whose structured data references Q897, Q208045, etc. Today we require some UI element to first search Wikidata and have the user choose which gold they are referring to.

Event Timeline

To make something like this happen, there are two high-level approaches I can think of:

  • Query expansion, by searching Wikidata first and filtering the Commons search results.
  • Injecting Wikidata information into commonswiki, or injecting commonswiki information into Wikidata.

Query expansion could work for some cases, but not for many popular words: a Wikidata prefix search for "gold" returns at least 9000 results, so in general I don't think it will scale. That leaves either injecting Wikidata labels into commonswiki pages or injecting commonswiki pages into the related Wikidata items. I don't think we can copy Wikidata into commonswiki: keeping those copies up to date would mean that when someone edits the Cat item on Wikidata, we might have to update a million or more cat images on commonswiki.
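To make the scalability concern concrete, here is a minimal Python sketch of query expansion. It assumes the expansion would be expressed as a single haswbstatement keyword using the `|` alternation that WikibaseCirrusSearch accepts (e.g. `P180=Q1|P180=Q2`). `expand_query` is a hypothetical helper, and the 9000 IDs are generated placeholders standing in for prefix-search matches, not real results.

```python
def expand_query(candidate_qids, prop="P180"):
    """Build one OR'd haswbstatement keyword from candidate Wikidata items.

    Hypothetical helper; assumes haswbstatement's `|` alternation syntax.
    """
    return "haswbstatement:" + "|".join(f"{prop}={qid}" for qid in candidate_qids)


# The two "gold" items named in the task description.
print(expand_query(["Q897", "Q208045"]))
# haswbstatement:P180=Q897|P180=Q208045

# Placeholder IDs standing in for ~9000 prefix-search matches (not real results).
placeholder_qids = [f"Q{n}" for n in range(1, 9001)]
expanded = expand_query(placeholder_qids)
print(len(expanded))  # close to 100,000 characters for one search box
```

Two candidates give a short, readable keyword; thousands of candidates give a search string of close to 100,000 characters, which is why expansion does not look viable for common words.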

The only viable solution, then, is to inject commonswiki references into the Wikidata index, perhaps similar to how other wikis inject local_sites_with_dupes into commonswiki. I'm not entirely sure what the query for this would have to look like, but I don't think the other options are even viable.
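A toy model of that inverted layout may help show why the update cost stays manageable. In this Python sketch, which is only an illustration and not the CirrusSearch index mapping, each Wikidata item document carries a hypothetical `depicted_in` set of Commons file titles. A label edit touches one item document, and a depicts edit on a file touches only the items it adds or removes, so nothing fans out to every file. The class name, field names and file titles are made up, and both gold items are assumed to carry the label "gold".

```python
from collections import defaultdict


class WikidataIndexSketch:
    """Toy model: Commons depicts references injected into Wikidata item docs.

    Hypothetical structure for illustration; not the real CirrusSearch mapping.
    """

    def __init__(self):
        self.labels = {}                     # QID -> lowercase labels and aliases
        self.depicted_in = defaultdict(set)  # QID -> Commons files depicting it (injected)

    def set_labels(self, qid, *labels):
        # Editing an item's labels rewrites only that one item document.
        self.labels[qid] = {label.lower() for label in labels}

    def set_depicts(self, file_title, old_qids, new_qids):
        # Editing a file's P180 statements updates only the items added or removed.
        old, new = set(old_qids), set(new_qids)
        for qid in old - new:
            self.depicted_in[qid].discard(file_title)
        for qid in new - old:
            self.depicted_in[qid].add(file_title)

    def search_files(self, word):
        # Match the search word against item labels, then return the injected files.
        word = word.lower()
        hits = set()
        for qid, labels in self.labels.items():
            if word in labels:
                hits |= self.depicted_in[qid]
        return sorted(hits)


index = WikidataIndexSketch()
index.set_labels("Q897", "gold")
index.set_labels("Q208045", "gold")
index.set_depicts("File:Example nugget.jpg", [], ["Q897"])     # hypothetical file
index.set_depicts("File:Example swatch.png", [], ["Q208045"])  # hypothetical file
print(index.search_files("Gold"))
# ['File:Example nugget.jpg', 'File:Example swatch.png']
```

A search for "gold" reaches files depicting either gold item without any Wikidata text being copied onto the file pages.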

Without this feature, "depicts" and other structured data become useless for typical users trying to find images with the built-in search using obvious search terms. I was adding structured data to some images and then wanted to test the search feature in other languages to see how it worked. It didn't work as expected, which led me here.

For example, this image includes structured data for "rabbit" and "cooking":
https://commons.wikimedia.org/wiki/File:Houghton_AC7_R3246_772f_-_Frugal_Housewife,_hare.jpg

However, a search for the words rabbit and cooking does not return this image:
https://commons.wikimedia.org/w/index.php?sort=relevance&search=rabbit+cooking&title=Special:Search&profile=advanced&fulltext=1&advancedSearch-current=%7B%7D&ns0=1&ns6=1&ns12=1&ns14=1&ns100=1&ns106=1

A search for haswbstatement:P180=Q9394 haswbstatement:P180=Q38695 does return this image:
https://commons.wikimedia.org/w/index.php?sort=last_edit_desc&search=haswbstatement%3AP180%3DQ9394+haswbstatement%3AP180%3DQ38695&title=Special:Search&profile=advanced&fulltext=1&advancedSearch-current=%7B%7D&ns0=1&ns6=1&ns12=1&ns14=1&ns100=1&ns106=1
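The haswbstatement link above can also be generated programmatically. This is a minimal Python sketch using only the standard library's `urllib.parse.urlencode`. `depicts_search_url` is a hypothetical helper; it keeps the `title`, `profile`, `fulltext` and `ns*` parameters from the links above and drops `sort` and `advancedSearch-current`, on the assumption that those only carry UI state.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://commons.wikimedia.org/w/index.php"
NAMESPACES = [0, 6, 12, 14, 100, 106]  # the same ns*=1 flags as the links above


def depicts_search_url(qids):
    """Build a Special:Search URL requiring every QID as a depicts (P180) value."""
    # Space-separated keywords are ANDed, so the file must depict all items.
    query = " ".join(f"haswbstatement:P180={qid}" for qid in qids)
    params = {"search": query, "title": "Special:Search",
              "profile": "advanced", "fulltext": 1}
    params.update({f"ns{ns}": 1 for ns in NAMESPACES})
    # urlencode uses quote_plus: spaces become "+", ":" and "=" are percent-encoded.
    return SEARCH_ENDPOINT + "?" + urlencode(params)


print(depicts_search_url(["Q9394", "Q38695"]))
```

The `search` parameter it produces matches the one in the link above character for character.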

That's a big loss for most users and, for me at least, makes entering structured data seem less useful.

I was playing around with the search some more, and I think this is actually a lot closer to solved than it appears. In the main search box at the top of the page, when typing a search term you can select a "depicts" item, which then becomes a haswbstatement:P180 term in the search. The problem is that there's no easy way to add a second haswbstatement:P180 term. If the advanced search (a.k.a. search results) page had the same box or function, maybe somewhere in the "Search in page text" section, so that someone could just add a second, third, etc. term, that would be pretty accessible.
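The missing "add another depicts term" control amounts to a small string operation on the query. This hypothetical Python helper, `add_depicts_term`, sketches what such a control might do: append `haswbstatement:P180=<QID>` unless the query already filters on that item. It is not part of any existing MediaWiki UI.

```python
import re

# Matches any haswbstatement:P180=Q<digits> keyword already present in the query.
DEPICTS_TERM = re.compile(r"haswbstatement:P180=(Q\d+)")


def add_depicts_term(query: str, qid: str) -> str:
    """Append a depicts keyword unless the query already filters on that item."""
    if qid in DEPICTS_TERM.findall(query):
        return query
    return f"{query} haswbstatement:P180={qid}".strip()


q = add_depicts_term("", "Q9394")
q = add_depicts_term(q, "Q38695")
q = add_depicts_term(q, "Q9394")  # already present, so the query is unchanged
print(q)  # haswbstatement:P180=Q9394 haswbstatement:P180=Q38695
```

Because each term is a separate keyword, the result is the same AND query as the link above, built one selection at a time.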

Could we come at it from the other direction? Use the Wikidata drop-down result to generate a text search in addition to the haswbstatement results. Set up search so that every word in the search is selected from the drop-down, but then use Wikidata's "label", "description" and/or "Also known as" values to generate additional results from a "regular" text search on those words.
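The proposal could be sketched as running two kinds of query and merging the results. In this Python sketch all names are hypothetical: `run_search` stands in for whatever executes a Commons search, the alias values are illustrative rather than copied from Wikidata, and the QID-to-label pairing assumes the order in which the rabbit/cooking example above lists them. Descriptions are left out because they are usually phrases rather than search words; that is a choice of the sketch, not of the proposal.

```python
from collections.abc import Callable
from dataclasses import dataclass, field
from itertools import product


@dataclass
class SelectedItem:
    """An item picked from the drop-down (hypothetical structure)."""
    qid: str
    label: str
    aliases: list[str] = field(default_factory=list)


def structured_query(items):
    # The haswbstatement query the drop-down selection already produces.
    return " ".join(f"haswbstatement:P180={it.qid}" for it in items)


def text_queries(items):
    # One plain-text query per combination of label/alias choices.
    choices = [[it.label, *it.aliases] for it in items]
    return [" ".join(words) for words in product(*choices)]


def combined_search(items, run_search: Callable[[str], list[str]]):
    # Structured matches first, then text-only matches, without duplicates.
    seen, merged = set(), []
    for query in [structured_query(items), *text_queries(items)]:
        for title in run_search(query):
            if title not in seen:
                seen.add(title)
                merged.append(title)
    return merged


items = [SelectedItem("Q9394", "rabbit", ["bunny"]),  # alias is illustrative only
         SelectedItem("Q38695", "cooking")]
print(text_queries(items))  # ['rabbit cooking', 'bunny cooking']
```

Ranking structured matches first keeps images with explicit "depicts" statements at the top while still surfacing files that only mention the words in their text.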

Hello @Nathank2. We're actually working on a new search system now, and it should address a number of the points you bring up here. It's still in the early prototyping phase, but we hope to have some news and beta versions available in the coming weeks.

@Ramsey-WMF that's great! My programming skills are minimal, but I'd be happy to help in other ways.