
A Commons search user should be able to search for only captions
Closed, Resolved · Public

Description

This page should report the caption but does not: https://commons.wikimedia.org/wiki/File:Mural_La_Fuente.jpg?action=cirrusdump

Snippet of contextual conversation:

hmm, i'm not seeing the structured data elements show up in the generated documents like they do on wikidata. We might need to configure some things
5:11 PM i'm remembering something cormac did to copy the fields over on build ...
5:11 PM <abittaker> Amanda Bittaker i feel like it would be useful to say, "search for captions" or "search for these words in captions"
5:11 PM to users, not just me in this moment
5:12 PM <ebernhardson> guy sure, and i think that data is supposed to be indexed but it's not at the moment. will have to look into why

Event Timeline

Restricted Application added a subscriber: Aklapper.
EBjune triaged this task as Medium priority. Jan 17 2019, 6:12 PM
EBjune moved this task from needs triage to Up Next on the Discovery-Search board.

Perhaps I was simply blind, but the link I provided does include the captions in two places:

  • labels.{lang}, for caption specific search
  • duplicated into opening_text which gives it a reasonable weight in the full text search
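A minimal sketch of how one might check a document for both of those placements. It assumes the document source is a dict with a `labels` mapping keyed by language code and an `opening_text` string, as described above; the exact shape of each `labels` value (string or list of strings) is an assumption, and the sample data is made up for illustration, not taken from the linked page.

```python
from typing import Any


def find_captions(source: dict[str, Any]) -> dict[str, Any]:
    """Report per-language captions and whether each is also in opening_text."""
    labels = source.get("labels") or {}
    opening_text = source.get("opening_text") or ""

    captions: dict[str, list[str]] = {}
    for lang, value in labels.items():
        # Assumption: a language maps to one string or to a list of strings.
        texts = [value] if isinstance(value, str) else list(value)
        captions[lang] = [t for t in texts if t]

    # A caption counts as duplicated if its text occurs verbatim in opening_text.
    duplicated = {
        lang: all(t in opening_text for t in texts)
        for lang, texts in captions.items()
        if texts
    }
    return {"labels": captions, "in_opening_text": duplicated}


# Made-up sample document source, for illustration only.
sample_source = {
    "labels": {"en": ["Example caption"], "es": "Leyenda de ejemplo"},
    "opening_text": "Example caption Leyenda de ejemplo",
}
report = find_captions(sample_source)
print(report)
```

In practice the input would be the document source from the `?action=cirrusdump` output rather than a hand-written dict.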

The problem is going to be querying it. Currently on wikibase we can query the entity labels or we can perform a normal query, but we can't integrate the two result sets into a single list. This is gated on the namespace being queried, which works for wikidata.org but will not work on commons as the data is stored in a slot of the NS_FILE page, rather than an entity namespace.

A couple options:

  • When the file namespace is being searched we could run a secondary search for labels and put it in the sidebar, the same as the sister-project results are shown on many wikipedias. All the infrastructure exists for this, but someone has to figure out how to glue it together properly.
  • We could add a full text keyword for the opening_text field on commons, which will effectively be caption search. This won't have per-language search analysis like the labels handling, but is very easy to do.

@Smalyshev @dcausse you might weigh in as well
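To make the trade-off in the second option concrete, here is a rough sketch of the two query shapes in Elasticsearch query DSL, built as plain Python dicts. This is not what CirrusSearch actually generates; the helper names are hypothetical, and the real `labels.{lang}` mapping may expose analyzer-specific subfields that this sketch ignores.

```python
import json


def opening_text_query(terms: str) -> dict:
    """Match all terms against opening_text (one analyzer for every language)."""
    return {
        "query": {
            "match": {"opening_text": {"query": terms, "operator": "and"}}
        }
    }


def label_query(terms: str, lang: str) -> dict:
    """Match all terms against the caption field for a single language."""
    return {
        "query": {
            "match": {f"labels.{lang}": {"query": terms, "operator": "and"}}
        }
    }


print(json.dumps(opening_text_query("bratach Loch Garmain"), indent=2))
print(json.dumps(label_query("bratach Loch Garmain", "ga"), indent=2))
```

The first form needs only one field but loses per-language analysis; the second needs the caller to know which language field to target.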

Currently on wikibase we can query the entity labels or we can perform a normal query, but we can't integrate the two result sets into a single list.

Yes, this is essentially what T194968: Enable search in all wikidata namespaces combined and T190454: Display entity & article namespace completion search together are about. Right now we have two searches, article and Wikibase, which require different queries. In fact, more than two, since search in different Wikibase types requires different queries (T204813: Allow looking for items and lexemes namespaces together by default). There is more discussion inside these tasks (and I'd prefer to keep technical discussion there and not split it into here) but it seems to be mostly the same issue. We could either solve one of those and use its results, or make a specifically tailored solution for Commons along the lines Erik outlined above.

Captions data is injected into opening_text, so it is already searched by a normal search, AFAICS:

https://commons.wikimedia.org/w/index.php?search=bratach+Loch+Garmain&title=Special:Search&go=Go

Erm ... I don't actually see what the issue is here. @Abit ?

@Cparle The original request was to limit the search to pages that contain captions, or more precisely to limit the search to the captions themselves. A keyword against opening_text could probably do this by instantiating TextFieldFilterFeature.
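As an illustration of the keyword idea only, the sketch below pulls a keyword out of the search string and turns it into a filter on opening_text. The keyword name `caption`, the parsing rules, and the helper names are all hypothetical; this does not reflect how TextFieldFilterFeature is implemented in CirrusSearch.

```python
import re

KEYWORD = "caption"  # hypothetical keyword name, not taken from the task
_PATTERN = re.compile(rf'(?<!\S){KEYWORD}:(?:"([^"]*)"|(\S+))')


def split_keyword(search: str) -> tuple[str, list[str]]:
    """Return the search string without keyword clauses, plus their values."""
    values: list[str] = []

    def grab(match: re.Match) -> str:
        quoted, bare = match.group(1), match.group(2)
        values.append(quoted if quoted is not None else bare)
        return ""

    rest = _PATTERN.sub(grab, search)
    return " ".join(rest.split()), values


def build_query(search: str) -> dict:
    """Combine the remaining free text with opening_text filters."""
    rest, captions = split_keyword(search)
    filters = [
        {"match": {"opening_text": {"query": c, "operator": "and"}}}
        for c in captions
    ]
    must = [{"query_string": {"query": rest}}] if rest else [{"match_all": {}}]
    return {"query": {"bool": {"must": must, "filter": filters}}}


print(split_keyword('mural caption:"La Fuente" fountain'))
print(build_query("caption:flag"))
```

Using a filter clause means the caption restriction narrows the result set without affecting ranking, which matches the "limit" reading of the request.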

Ah ok, I didn't get the "limit" bit

EBernhardson renamed this task from A Commons search user should be able to search for captions to A Commons search user should be able to search for only captions. Jan 18 2019, 4:55 PM