Make keyword to match Wikibase statement data contained in the search index
Closed, Resolved (Public)

Description

As a first stab at making images findable via 'depicts' statements, I propose adding a feature to CirrusSearch called haswbstatement. It will work similarly to other CirrusSearch features, e.g. intitle and incategory.

Say, for example, that the user wants to search for images of bicycles. Assuming the property ID for depicts is P999 and the item ID for bicycle is Q888, the user will be able to find images of bicycles using the query haswbstatement:P999=Q888.
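To make the proposed syntax concrete, here is a minimal sketch of how a client might assemble such a query and pass it to the standard MediaWiki search API (action=query, list=search, srsearch). The helper names and the Commons API base URL are assumptions for illustration, and P999/Q888 are the placeholder IDs from the description above, not real IDs. No request is sent.

```python
from urllib.parse import urlencode


def haswbstatement_query(property_id: str, value: str) -> str:
    """Build the keyword string, e.g. 'haswbstatement:P999=Q888'."""
    return f"haswbstatement:{property_id}={value}"


def search_url(api_base: str, query: str) -> str:
    """Build a MediaWiki search API URL for the query (nothing is fetched)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "format": "json",
    }
    return f"{api_base}?{urlencode(params)}"


# Placeholder IDs from the task description.
query = haswbstatement_query("P999", "Q888")
print(query)
# The API base URL is an assumption for illustration.
print(search_url("https://commons.wikimedia.org/w/api.php", query))
```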

Related Objects

Status      Assigned
Declined    dchen
Open        None
Open        None
Duplicate   None
Open        None
Resolved    Abit
Open        None
Duplicate   None
Open        None
Open        None
Open        None
Resolved    Ramsey-WMF
Open        None
Open        Cparle
Open        cscott
Invalid     None
Resolved    Smalyshev
Resolved    Cparle

Event Timeline

Cparle triaged this task as Normal priority. Apr 17 2018, 10:13 AM
Cparle created this task.

This should be pretty straightforward to do.
Nitpick: we might consider naming the keyword differently so that it aligns with what we already have (insomething or hassomething). Why not something like haswbstatement?

dcausse rescinded a token.

Nitpick: we might consider naming the keyword differently so that it aligns with what we already have (insomething or hassomething). Why not something like haswbstatement?

Cool, haswbstatement it is

Ramsey-WMF moved this task from Untriaged to Triaged on the Multimedia board. Apr 17 2018, 5:38 PM
Ramsey-WMF added a subscriber: Ramsey-WMF.

This seems to be mostly a duplicate of T163642, only applied to File and the Wikibase instance for Commons (MediaInfo?).

One question to resolve: P999=Q888 assumes the part after = is an entity ID. However, in T163642 and T99899 it could be a string. We will need syntax for both, and a generic name like haswbstatement does not convey which one it uses. We can do one of the following:

  1. Have separate keywords for string match and item match (and maybe also other matches in the future, e.g. numbers)
  2. Have one keyword but find some syntax to say when it's a string and when it is an item ID
  3. Assume anything that looks like item ID is an item ID (not recommended)
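A small sketch of why option 3 is risky. It assumes "looks like an item ID" means an uppercase Q or P followed by digits; that pattern and the function name are assumptions for illustration, not something defined in this task. A string value whose text happens to have that shape would be treated as an entity ID.

```python
import re

# Assumed shape of an entity ID: "Q" or "P" followed by one or more digits.
ENTITY_ID_RE = re.compile(r"[QP]\d+")


def guess_value_kind(value: str) -> str:
    """Classify the part after '=' purely by its shape (option 3)."""
    return "entity-id" if ENTITY_ID_RE.fullmatch(value) else "string"


print(guess_value_kind("Q888"))   # an item ID, classified as an entity ID
print(guess_value_kind("79802"))  # an external identifier, classified as a string
# A string-valued statement whose text is literally "Q42" cannot be told
# apart from an item ID by shape alone, so it is classified as an entity ID.
print(guess_value_kind("Q42"))
```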

Statements are stored in the index as strings, just in the format X=Y, right? I don't understand why we'd need different keywords for matching P999=Q888 and P777=some_string.

There will be more and more third-party Wikibase installations. Some of them will be linked to in statements as external identifiers. They will also use Q-IDs, so you cannot be sure that a Q-ID always represents an item on Wikidata.

Cparle added a comment. Edited Apr 18 2018, 10:30 AM

From a strictly searching perspective I can't see that it matters.

Say for argument's sake that on Commons we allow statements with the Wikidata property 'depicts' P180 (contains a Wikidata Q-ID) and 'MoMA artwork id' P2014 (contains an external ID). We whitelist these for search indexing (via $wgWBRepoSettings['searchIndexProperties']) and they get written into the search index in the statement_keywords field, for example:

"statement_keywords": [
    "wikidata:P180=wikidata:Q527",
    "wikidata:P2014=79802"
]

(the 'wikidata' prefix comes from federation, and Q527 means 'sky')
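The indexing step described above can be sketched roughly as follows. This is a toy model, not the Wikibase implementation: the whitelist set stands in for $wgWBRepoSettings['searchIndexProperties'] (whose real entries may be formatted differently), and the function name, the (property, value) tuple shape, and the P9999 property are assumptions for illustration.

```python
# Hypothetical whitelist, standing in for
# $wgWBRepoSettings['searchIndexProperties'].
SEARCH_INDEX_PROPERTIES = {"wikidata:P180", "wikidata:P2014"}


def statement_keywords(statements, allowed=SEARCH_INDEX_PROPERTIES):
    """Return 'property=value' strings for whitelisted properties only."""
    return [f"{prop}={value}" for prop, value in statements if prop in allowed]


statements = [
    ("wikidata:P180", "wikidata:Q527"),  # depicts: sky
    ("wikidata:P2014", "79802"),         # MoMA artwork id
    ("wikidata:P9999", "not indexed"),   # hypothetical, not whitelisted
]

print(statement_keywords(statements))
```

Entity-ID values and plain string values both end up as the same kind of X=Y string, which is the point made in the comment above.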

So if I want to find files that depict 'sky', I run a search with haswbstatement:wikidata:P180=wikidata:Q527. If I want to find the file with MoMA artwork id 79802, I run a search with haswbstatement:wikidata:P2014=79802.
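As a rough illustration of why the value type does not matter for matching, here is a toy in-memory model in which a haswbstatement query is an exact string match against a document's statement_keywords entries. The document titles and function names are hypothetical, and the actual feature runs against the search index rather than a Python list.

```python
KEYWORD = "haswbstatement:"


def matches(query: str, doc: dict) -> bool:
    """True if the text after 'haswbstatement:' is one of the doc's keywords."""
    if not query.startswith(KEYWORD):
        raise ValueError("not a haswbstatement query")
    wanted = query[len(KEYWORD):]
    return wanted in doc.get("statement_keywords", [])


def search(query: str, docs: list) -> list:
    """Return the titles of all documents matching the query."""
    return [doc["title"] for doc in docs if matches(query, doc)]


# Hypothetical documents using the statements from the example above.
docs = [
    {"title": "File:Sky.jpg",
     "statement_keywords": ["wikidata:P180=wikidata:Q527"]},
    {"title": "File:Artwork.jpg",
     "statement_keywords": ["wikidata:P2014=79802"]},
]

print(search("haswbstatement:wikidata:P180=wikidata:Q527", docs))
print(search("haswbstatement:wikidata:P2014=79802", docs))
```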

Change 427407 had a related patch set uploaded (by Cparle; owner: Cparle):
[mediawiki/extensions/CirrusSearch@master] Add haswbstatement feature

https://gerrit.wikimedia.org/r/427407

Hmm, I think @Cparle is right: for search it doesn't matter whether it's a string or a Q-ID.

Change 427407 abandoned by Cparle:
Add haswbstatement feature

Reason:
Patch in the wrong repo

https://gerrit.wikimedia.org/r/427407

Change 427682 had a related patch set uploaded (by Cparle; owner: Cparle):
[mediawiki/extensions/Wikibase@master] Add haswbstatement CirrusSearch query feature

https://gerrit.wikimedia.org/r/427682

Lydia_Pintscher moved this task from incoming to monitoring on the Wikidata board. Apr 23 2018, 7:18 AM
Smalyshev renamed this task from Make File pages findable via the statement data contained in the search index to Make keyword to match Wikibase statement data contained in the search index. Apr 28 2018, 7:53 AM

Change 427682 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add haswbstatement CirrusSearch query feature

https://gerrit.wikimedia.org/r/427682

Change 430373 had a related patch set uploaded (by Cparle; owner: Cparle):
[mediawiki/extensions/Wikibase@master] Skip tests if cirrus search not present

https://gerrit.wikimedia.org/r/430373

Change 430373 abandoned by Cparle:
Skip tests if cirrus search not present

Reason:
Submitted with wrong bug id, abandoning

https://gerrit.wikimedia.org/r/430373

Smalyshev moved this task from in progress to Done on the Discovery-Search (Current work) board.
Smalyshev updated the task description.
Smalyshev closed this task as Resolved. May 7 2018, 7:33 PM