
Implement searching of 'depicts' on commons with the 'inscription' qualifier
Open, Low, Public

Description

The inscription property (P1682) refers to 'inscriptions, markings and signatures on an object' and is of type 'monolingual text'.

When used as a qualifier with 'depicts', it refers to markings on the thing that is depicted. Example from Wikidata: https://www.wikidata.org/wiki/Q1136099 (see depicts > Province of New York > inscription, etc.)

The example I've been using to model this conceptually is band t-shirts: an image could depict a t-shirt with 'The Rolling Stones' written on it, and a user might want to find all images containing pictures of Rolling Stones t-shirts.

I can think of three different ways of implementing this, each with its own drawbacks and trade-offs:

Option 1
We can store the qualifier in the normal way like this P180=Q131151[P1682=The Rolling Stones] (see T193407), in which case we would only be able to find exact matches. For example searching for haswbstatement:P180=Q131151[P1682=Stones] won't match P180=Q131151[P1682=The Rolling Stones].
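A minimal sketch of why Option 1 only supports exact matches, assuming the qualified statement is indexed as a single keyword string in the notation used above. `statement_keyword` and `haswbstatement_exact` are hypothetical helpers for illustration, not CirrusSearch code, and the real `statement_keywords` format may differ.

```python
# Sketch only: the "P180=Q131151[P1682=...]" format follows this task's notation;
# the real statement_keywords format may differ.

def statement_keyword(prop, value, qual_prop, qual_value):
    """Render a statement with one qualifier as a single keyword string."""
    return f"{prop}={value}[{qual_prop}={qual_value}]"


def haswbstatement_exact(indexed, query):
    """A keyword field only matches whole values, which amounts to set membership."""
    return query in indexed


# One hypothetical image depicting a t-shirt (Q131151) inscribed "The Rolling Stones".
indexed = {statement_keyword("P180", "Q131151", "P1682", "The Rolling Stones")}

print(haswbstatement_exact(indexed, "P180=Q131151[P1682=The Rolling Stones]"))  # True
print(haswbstatement_exact(indexed, "P180=Q131151[P1682=Stones]"))  # False
```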

Option 1a
Perhaps we could pass a regex to the haswbstatement keyword? This would require changes to the mapping of the statement_keywords field.
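A rough sketch of how a regex match against the same keyword strings might behave, assuming the notation above. `haswbstatement_regex` is hypothetical. It uses Python's `re.fullmatch` because Elasticsearch regexp queries must match the whole term, but Lucene's regex syntax is not identical to Python's, and matching here is case-sensitive.

```python
import re

# Hypothetical keyword strings, in the notation used in this task.
indexed = [
    "P180=Q131151[P1682=The Rolling Stones]",
    "P180=Q131151[P1682=Led Zeppelin]",
]


def haswbstatement_regex(keywords, pattern):
    """Return the keywords that the pattern matches in full.

    fullmatch() mimics the whole-term anchoring of Elasticsearch regexp queries;
    Python regex syntax is only an approximation of Lucene's.
    """
    rx = re.compile(pattern)
    return [kw for kw in keywords if rx.fullmatch(kw)]


# The user has to escape the brackets and add the wildcards themselves.
print(haswbstatement_regex(indexed, r"P180=Q131151\[P1682=.*Stones.*\]"))
```

The need to escape brackets and add wildcards by hand is part of why exposing this to users in a friendly way would be hard.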

Option 2
Implement an Elasticsearch solution specific to this qualifier. For example, we could store the inscription in a full-text field, so that partial matches would work. This would be tricky, because we'd need to treat one qualifier differently from all the others, both when indexing and when searching. Also, if we did it this way, I'm not sure how (or even whether it's possible) to record that the inscription relates to a particular 'depicts' statement. So someone searching for pictures containing Rolling Stones t-shirts might get, among their results, images of blank t-shirts alongside some other object with 'The Rolling Stones' inscribed on it.
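A toy model of Option 2 and its drawback, assuming depicts values go into a keyword field and all inscriptions on a file are flattened into one full-text field. The class, the functions and the placeholder ID `Q_OTHER` are hypothetical, and the whitespace tokenizer is a crude stand-in for an Elasticsearch analyzer.

```python
from dataclasses import dataclass, field


@dataclass
class MediaDoc:
    title: str
    depicts: set = field(default_factory=set)             # keyword field: P180 values
    inscription_tokens: set = field(default_factory=set)  # stand-in full-text field


def index_doc(title, statements):
    """statements: list of (depicted item ID, inscription text or None)."""
    doc = MediaDoc(title)
    for qid, inscription in statements:
        doc.depicts.add(qid)
        if inscription:
            # Crude lowercase whitespace tokenizer instead of a real analyzer.
            doc.inscription_tokens.update(inscription.lower().split())
    return doc


def search(docs, depicted_qid, text):
    """Match files that depict the item AND have all query terms in any inscription."""
    terms = set(text.lower().split())
    return [d.title for d in docs
            if depicted_qid in d.depicts and terms <= d.inscription_tokens]


T_SHIRT = "Q131151"
OTHER = "Q_OTHER"  # hypothetical placeholder for some other depicted object

docs = [
    index_doc("stones_shirt.jpg", [(T_SHIRT, "The Rolling Stones")]),
    index_doc("blank_shirt_and_other.jpg", [(T_SHIRT, None), (OTHER, "The Rolling Stones")]),
    index_doc("blank_shirt.jpg", [(T_SHIRT, None)]),
]

print(search(docs, T_SHIRT, "stones"))
```

Partial matching works ("stones" finds "The Rolling Stones"), but because the inscription tokens are no longer tied to the statement they came from, `blank_shirt_and_other.jpg` is returned too even though its t-shirt is blank. Elasticsearch's `nested` field type is one possible way to keep each depicts value paired with its qualifiers, at the cost of more complex indexing and queries; this sketch does not model it.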

Option 3
Another possible approach is to use the Wikidata Query Service (WDQS) to run a SPARQL query and then use the resulting IDs as a filter for an Elasticsearch query. Basically, we'd ask WDQS for all pictures depicting a t-shirt inscribed with 'The Rolling Stones', take the resulting IDs (at most 1000), and then search Elasticsearch for anything else we wanted, restricted to those IDs.

Note that this option depends on T194401
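A sketch of the two halves of Option 3, under stated assumptions: the SPARQL uses the Wikidata-style prefixes (`p:`, `ps:`, `pq:`, `wd:`) predefined on query.wikidata.org, and whether Commons data would be queryable the same way is an assumption here. The Elasticsearch body uses the standard `bool`, `match` and `ids` queries; the function names and the `text` field name are hypothetical.

```python
MAX_IDS = 1000  # cap on IDs passed from WDQS to Elasticsearch, per this task


def inscription_sparql(depicted_qid, text):
    """SPARQL for items whose depicts (P180) statement has an inscription (P1682)
    qualifier containing `text`, compared case-insensitively."""
    # Escape backslashes and double quotes for a SPARQL string literal.
    needle = text.lower().replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT DISTINCT ?item WHERE {\n"
        "  ?item p:P180 ?st .\n"
        f"  ?st ps:P180 wd:{depicted_qid} ;\n"
        "      pq:P1682 ?inscription .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?inscription)), "{needle}"))\n'
        "}\n"
        f"LIMIT {MAX_IDS}"
    )


def filtered_es_query(doc_ids, user_query, text_field="text"):
    """Score on the user's text, but only among the IDs returned by WDQS.
    `text_field` is a hypothetical field name."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {text_field: user_query}}],
                "filter": [{"ids": {"values": list(doc_ids)[:MAX_IDS]}}],
            }
        }
    }


print(inscription_sparql("Q131151", "Rolling Stones"))
print(filtered_es_query(["101", "102"], "concert"))
```

The truncation to `MAX_IDS` is where the edge case comes from: if WDQS has more than 1000 matches and the Elasticsearch part of the query would only match files outside the first 1000, the combined search returns nothing even though suitable files exist.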


Option 1 is the easiest, but only does exact matches unless the regex idea (option 1a) works, and that might be difficult to implement on the frontend in a user-friendly way.
Option 2 is tricky to implement and may return some incorrect results, but would probably perform better than option 3.
Option 3 is in between, implementation-wise. It is probably the slowest to run, and its results will be more accurate than option 2's, but because of the limits on passing data between WDQS and Elasticsearch there will be edge cases where no results are returned even though appropriate results exist. This option depends on T194401.


Wikidata currently contains 7 items with depicts statements that have inscription qualifiers out of a total of ~70k items with depicts statements (~0.01%)
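A quick check of the arithmetic behind the percentage, using the figures quoted above (the denominator is approximate, so the result is too):

```python
with_inscription = 7
with_depicts = 70_000  # "~70k", so the share is approximate
share_percent = with_inscription / with_depicts * 100
print(f"{share_percent:.2f}%")  # 0.01%
```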


Event Timeline

Cparle triaged this task as Medium priority. (May 8 2018, 5:05 PM)
Cparle created this task.
Cparle updated the task description.
Vvjjkkii renamed this task from Implement searching of 'depicts' on commons with the 'inscription' qualifier to addaaaaaaa. (Jul 1 2018, 1:11 AM)
Vvjjkkii removed Cparle as the assignee of this task.
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot renamed this task from addaaaaaaa to Implement searching of 'depicts' on commons with the 'inscription' qualifier. (Jul 2 2018, 6:14 AM)
CommunityTechBot assigned this task to Cparle.
CommunityTechBot lowered the priority of this task from High to Medium.
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: Aklapper.
Cparle lowered the priority of this task from Medium to Low. (Nov 15 2018, 5:17 PM)
MPhamWMF subscribed.

Closing out low- and lowest-priority tasks more than 6 months old with no activity within the last 6 months, in order to clear the backlog of tickets we will not be addressing in the near term. Please feel free to reopen if you think a ticket is important, but bear in mind that, given current priorities and resourcing, it is unlikely that the Search team will pick up these tasks for the indefinite future. We hope that the requested changes have either been addressed by, or made irrelevant by, work the team has done or is doing (e.g. upgrading Elasticsearch to a newer version will solve various ES-related problems), or will be subsumed by future work in a more generalized way.

RhinosF1 removed a project: Discovery-Search.
RhinosF1 subscribed.

Reopening tasks and removing them from the team workboard per IRC feedback given yesterday and discussion with MPham.