Make PropertyLabelResolver that uses ElasticSearch
Closed, Declined (Public)

Description

Currently the only PropertyLabelResolver implementation is TermPropertyLabelResolver, which uses the SQL database to look up properties. We should have a PropertyLabelResolver implementation that uses CirrusSearch and ElasticSearch for searching/selecting properties, if available.

Note: This service is used from Lua/parser functions, so fast response times are needed. If we can't move this to Cirrus, we can surely find another solution.

Event Timeline

I am not sure TermPropertyLabelResolver per se should use Cirrus (since it gets a TermIndex as a parameter), but we can certainly have a PropertyLabelResolver that uses Cirrus. Unless, of course, we make a TermIndex implementation that uses Cirrus? That may be possible too, I imagine.

See, however, T177453: Add wikibase client support for searching wikidata items. Right now we might have an issue using Cirrus for Wikibase from WikibaseClient. I think if we want to use this for Lua, we have to implement that task?

Vvjjkkii renamed this task from Make "TermPropertyLabelResolver" use Cirrus to gedaaaaaaa. Jul 1 2018, 1:11 AM
Vvjjkkii triaged this task as High priority.
Vvjjkkii updated the task description.
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot renamed this task from gedaaaaaaa to Make "TermPropertyLabelResolver" use Cirrus. Jul 1 2018, 3:39 PM
CommunityTechBot raised the priority of this task from High to Needs Triage.
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: Aklapper.
Addshore subscribed.

The campsite isn't going to work on this immediately; this should probably be decided by Wikidata-Ugly-Cat-Trailblaze (wb_terms trail blazing).

Smalyshev renamed this task from Make "TermPropertyLabelResolver" use Cirrus to Make PropertyLabelResolver that uses ElasticSearch. Aug 15 2018, 12:35 AM
Smalyshev updated the task description.

Looking at TermPropertyLabelResolver, I see that it loads all properties into memory and caches them (for all languages). Should we keep doing this for ElasticSearch? Should we have separate caches for each language? Note that right now we cache a full PropertyId object for every label in every language, which may not be the most efficient way of doing it. OTOH, since there are only about 5000 properties now, maybe it doesn't matter too much.
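
To make the per-language question concrete, here is a minimal sketch of one possible layout. It is written in Python purely for illustration (Wikibase itself is PHP), and the names `PerLanguageLabelCache`, `fetch_labels`, and `fake_fetch` are hypothetical, not part of Wikibase. It assumes one label map per language, loaded lazily, with plain ID strings stored instead of full PropertyId objects.

```python
from typing import Callable, Dict, Optional

# Hypothetical loader type: given a language code, return {label: property ID}.
LabelLoader = Callable[[str], Dict[str, str]]


class PerLanguageLabelCache:
    """Illustrative cache: one label map per language, loaded on first use."""

    def __init__(self, fetch_labels: LabelLoader) -> None:
        self._fetch_labels = fetch_labels
        self._by_language: Dict[str, Dict[str, str]] = {}

    def get_property_id(self, language: str, label: str) -> Optional[str]:
        # Load the map for this language only when it is first requested.
        if language not in self._by_language:
            self._by_language[language] = self._fetch_labels(language)
        # Exact match only; returns None when the label is unknown.
        return self._by_language[language].get(label)


def fake_fetch(language: str) -> Dict[str, str]:
    # Stand-in for a real backend (SQL, ElasticSearch, ...).
    data = {
        "en": {"instance of": "P31"},
        "es": {"instancia de": "P31"},
    }
    return data.get(language, {})


cache = PerLanguageLabelCache(fake_fetch)
print(cache.get_property_id("en", "instance of"))  # P31
print(cache.get_property_id("es", "instance of"))  # None
```

With this layout a wiki only pays for the languages it actually asks about, at the cost of one backend round trip per language on first use.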

Also, do we want:

  1. To keep languages segregated? I.e. eswiki won't match "instance of" because it's "instancia de" in Spanish, and vice versa?
  2. Only exact matches, or some leeway in how the label is matched (e.g. case folding, normalization, etc.)? This may make caching much harder though.
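
The two options above can be sketched as follows, again in Python for illustration only. The helpers `normalize_label`, `build_index`, and `resolve` are hypothetical names, and the normalization chosen here (Unicode NFC, whitespace collapsing, case folding) is just one assumption about what "leeway" could mean.

```python
import unicodedata
from typing import Dict, Optional


def normalize_label(label: str) -> str:
    """Illustrative normalization: NFC, collapse whitespace, case-fold."""
    text = unicodedata.normalize("NFC", label)
    text = " ".join(text.split())
    return text.casefold()


def build_index(labels: Dict[str, str]) -> Dict[str, str]:
    # Key the map by normalized label. If two labels normalize to the same
    # key, the later one silently wins; a real implementation would need a rule.
    return {normalize_label(label): prop_id for label, prop_id in labels.items()}


def resolve(
    index_by_language: Dict[str, Dict[str, str]], language: str, label: str
) -> Optional[str]:
    # Languages stay segregated: only the requested language is consulted.
    return index_by_language.get(language, {}).get(normalize_label(label))


index = {
    "en": build_index({"instance of": "P31"}),
    "es": build_index({"instancia de": "P31"}),
}
print(resolve(index, "en", "  Instance  OF "))  # P31
print(resolve(index, "es", "instance of"))      # None
```

As long as the same deterministic normalization is applied when building the index and when looking up, the cache stays a plain key lookup. Fuzzier matching (stemming, typo tolerance) would not reduce to a key this way, which is where caching gets harder.
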
WMDE-leszek subscribed.

We do not need this at the moment. I guess that means declined. We might re-open, or create a new ticket, when we need an implementation like this.