
Add Wikibase Client support for searching Wikidata items
Open, Stalled, Low, Public

Description

Right now, Wikibase Client provides access to searching Wikidata items, e.g. via newTermSearchInteractor(). It is used in ArticlePlaceholder, and Lua clients probably use some version of it too. It may make sense to make ElasticSearch-based searching available via this API too, possibly by implementing a TermSearchInteractor that uses ElasticSearch.
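The idea of "implementing TermSearchInteractor that uses ElasticSearch" is essentially swapping the implementation behind an interface that callers already use. The sketch below is a hypothetical Python analogue, not the real PHP API: `TermSearchResult`, `search()`, the in-memory term table and the `ElasticBackend` callable are all illustrative assumptions. It only shows that code like ArticlePlaceholder could stay unchanged while the backend changes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Protocol, Tuple


@dataclass(frozen=True)
class TermSearchResult:
    # Hypothetical stand-in for Wikibase's (PHP) TermSearchResult.
    entity_id: str
    matched_text: str
    language: str


class TermSearchInteractor(Protocol):
    # Hypothetical analogue of the interface that callers such as
    # ArticlePlaceholder depend on; the real PHP signature differs.
    def search(self, text: str, language: str, entity_type: str) -> List[TermSearchResult]:
        ...


class TermTableSearchInteractor:
    """Exact-label lookup against an in-memory term table (stand-in for a DB table)."""

    def __init__(self, terms: Dict[Tuple[str, str, str], str]) -> None:
        # Maps (entity_type, language, label) -> entity ID.
        self._terms = terms

    def search(self, text: str, language: str, entity_type: str) -> List[TermSearchResult]:
        entity_id = self._terms.get((entity_type, language, text))
        if entity_id is None:
            return []
        return [TermSearchResult(entity_id, text, language)]


# Hypothetical backend: wraps an ElasticSearch query and returns
# (entity_id, matched_text) pairs, best match first.
ElasticBackend = Callable[[str, str, str], List[Tuple[str, str]]]


class ElasticTermSearchInteractor:
    """Same interface, but results come from a search backend."""

    def __init__(self, backend: ElasticBackend) -> None:
        self._backend = backend

    def search(self, text: str, language: str, entity_type: str) -> List[TermSearchResult]:
        return [
            TermSearchResult(entity_id, matched, language)
            for entity_id, matched in self._backend(text, language, entity_type)
        ]


def find_first_item(interactor: TermSearchInteractor, text: str, language: str) -> Optional[str]:
    # Caller code only sees the interface, so either implementation can be injected.
    results = interactor.search(text, language, "item")
    return results[0].entity_id if results else None
```

With a fake backend that matches case-insensitively, `find_first_item` finds an item for the query "berlin" that the exact-match table version misses, without any change to the calling code.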

Event Timeline

Restricted Application added a project: Discovery. · Oct 5 2017, 12:33 AM
Restricted Application added a subscriber: Aklapper.
Smalyshev moved this task from Backlog to Next on the User-Smalyshev board. · Oct 13 2017, 10:23 PM
Smalyshev moved this task from Next to Backlog on the User-Smalyshev board. · Dec 5 2017, 7:58 PM

The problematic part here seems to be that WikibaseClient has a separate configuration from WikibaseRepo, so we cannot access the search profiles, and without those we can't run the search. I'm not sure what the right way to handle it is.

Lydia_Pintscher moved this task from incoming to monitoring on the Wikidata board. · Dec 18 2017, 2:58 PM
Smalyshev changed the task status from Open to Stalled. · Jan 13 2018, 1:23 AM
Smalyshev added a comment (edited). · Feb 12 2018, 7:55 AM

One possible way may be to use the cirrus-config-dump API (or make Cirrus somehow do it for us). But it's kinda heavyweight... OTOH, cross-wiki search uses it, so maybe it's OK.
@dcausse - what do you think? Would it be possible to give profile management a "remote wiki" mode that would load configs from another wiki for cases like this? We already do it in sister search, but probably not in a way that is reusable?

Smalyshev triaged this task as Low priority. · Feb 12 2018, 8:07 AM
dcausse added a comment (edited). · Feb 12 2018, 9:41 AM

This is possible in theory, but the problem is that some profiles refer to class implementations that may not be available on the host wiki.
So yes, we could use the sister search logic with some adaptation, but the main blocker will be that the builder implementation won't be available if the Wikibase extension is not loaded on the host wiki.
We will have similar problems with SDoC search sooner or later, since search on Commons is available from all wikis. If SDoC search provides custom implementations, the code will have to be available on the host wiki.
In short, if WikibaseClient imports the builder classes, it's probably fine; if not, I think it'll be hard to do.

EDIT: another problem is that cirrus-config-dump exports the Cirrus config, but not all the profile information is stored in wgCirrusSearch* vars, so we would have to adapt cirrus-config-dump so that it can export the state of the search profiles in a form that can be reloaded on the host wiki.

hoo added a comment. · Apr 11 2018, 1:49 PM

@Smalyshev What's the status here? Say we want to get rid of the wb_terms table…

@hoo I am still not sure what would be a good way to get the search configs to the client... Maybe an extension to the cirrus-config-dump API is needed. It is also complicated by the fact that Cirrus and Wikibase use different configs (which normally feed from the same globals, but do not cooperate in any way AFAIK), which makes it kind of hard to inject stuff. How urgent is this? If it's important in the near term, I can allocate specific time to work on it closely and find a solution; otherwise I'll think about it and get to it a bit later.

Smalyshev moved this task from Backlog to Waiting/Blocked on the User-Smalyshev board.
Smalyshev updated the task description. · May 24 2018, 8:13 PM

So I thought about it a bit more, and it looks like we don't really need to bring search configs over from the repo: we can have a fixed set of configs that is enough for a simple, straightforward match on the client, bake them into the client, and use those instead of the repo ones.
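A rough illustration of what "fixed configs baked into the client" could look like: a small constant profile that the client turns directly into an ElasticSearch query body, with no dependency on the repo's search profiles. The profile keys, the field names (`labels.<lang>.keyword`, `labels.<lang>.prefix`, `entity_type`) and the boosts are hypothetical assumptions, not Wikibase's real index mapping; only the query DSL shapes (`bool` with `filter`/`should`/`minimum_should_match`, `term`, `match`) are standard ElasticSearch.

```python
from typing import Any, Dict

# Hypothetical profile baked into the client instead of being loaded from
# the repo's CirrusSearch profiles. Field names and boosts are illustrative.
CLIENT_SIMPLE_MATCH_PROFILE: Dict[str, Any] = {
    "exact_field": "labels.{lang}.keyword",
    "prefix_field": "labels.{lang}.prefix",
    "exact_boost": 2.0,
    "prefix_boost": 1.0,
    "limit": 10,
}


def build_client_query(
    text: str,
    language: str,
    entity_type: str,
    profile: Dict[str, Any] = CLIENT_SIMPLE_MATCH_PROFILE,
) -> Dict[str, Any]:
    """Build an ElasticSearch query body from the fixed client profile."""
    exact_field = profile["exact_field"].format(lang=language)
    prefix_field = profile["prefix_field"].format(lang=language)
    return {
        "size": profile["limit"],
        "query": {
            "bool": {
                # Restrict to the requested entity type (hypothetical field).
                "filter": [{"term": {"entity_type": entity_type}}],
                # Exact label hits score higher than prefix/analyzed hits.
                "should": [
                    {"term": {exact_field: {"value": text, "boost": profile["exact_boost"]}}},
                    {"match": {prefix_field: {"query": text, "boost": profile["prefix_boost"]}}},
                ],
                "minimum_should_match": 1,
            }
        },
    }
```

The design choice here is deliberately narrow: one profile, no rescoring or per-wiki tuning, which is why it could live in the client without pulling in the repo's builder classes.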

Does it mean that we would make WikibaseClient dependent on CirrusSearch and create all the necessary query builders in this client?
Have we considered the possibility of running an actual API call to wbsearchentities@wikidata.org?
I have no clue if the current API output would allow us to rebuild TermSearchResult, nor whether there are perf considerations that make this solution impossible.
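To make the question about rebuilding TermSearchResult concrete, here is a sketch that builds a wbsearchentities request URL and maps each hit to a result record. It assumes the response's `search` list carries `id`, `label` and a `match` object with `type`, `language` and `text`, as we understand the API; this should be checked against the live API. The Python `TermSearchResult` is a hypothetical stand-in, and the sample response in the test is hand-written, not captured from Wikidata. No network call is made.

```python
import json
from dataclasses import dataclass
from typing import List
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


@dataclass(frozen=True)
class TermSearchResult:
    # Hypothetical Python stand-in for Wikibase's TermSearchResult.
    entity_id: str
    matched_text: str
    matched_language: str
    matched_term_type: str  # e.g. "label" or "alias"
    display_label: str


def build_search_url(text: str, language: str, entity_type: str = "item", limit: int = 10) -> str:
    # Standard wbsearchentities parameters; dict order is preserved by urlencode.
    params = {
        "action": "wbsearchentities",
        "search": text,
        "language": language,
        "type": entity_type,
        "limit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def parse_search_response(raw: str) -> List[TermSearchResult]:
    # Map each hit to a result record; missing fields fall back to "".
    data = json.loads(raw)
    results = []
    for hit in data.get("search", []):
        match = hit.get("match", {})
        results.append(TermSearchResult(
            entity_id=hit["id"],
            matched_text=match.get("text", ""),
            matched_language=match.get("language", ""),
            matched_term_type=match.get("type", ""),
            display_label=hit.get("label", ""),
        ))
    return results
```

This addresses only whether the fields appear to be there; it says nothing about the per-request overhead of making such calls from the client.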

Does it mean that we would make WikibaseClient dependent on CirrusSearch

Well, ideally, after T190022: Separate the CirrusSearch/Elastic-specific code from Wikibase code base, it will all be in the WikibaseCirrusSearch extension, I presume.

and create all the necessary query builders in this client?

Yes, that's the idea.

Have we considered the possibility of running an actual API call to wbsearchentities@wikidata.org

I thought about it, but it looks like a rather serious performance hit (going back through the whole caching infrastructure, incurring all the request init overhead again, and then parsing the results). And as I understand it, the main motivation here is performance. If we have a page with Lua that requests 20 lookups, 20 sub-requests may be a bit too much.

It also feels a bit wrong to make the whole round trip when we have most of the classes and configs sitting right here.

I have no clue if the current API output would allow us to rebuild TermSearchResult

Probably, but I am not convinced we should do it. Right now I am leaning toward not doing it.

This ticket conflates two very different things, which makes it difficult to discuss tradeoffs:
#1 looking up properties by label (PropertyIdResolver)
#2 interactively searching for items based on some search input

For #1, performance is an issue, and API calls are a no-go, since they would have to happen during parsing, and we may be doing dozens or even hundreds of them per page.
For #2, API calls would be fine: we have much more time, and there is only ever one search per request.

The two use cases also need very different search profiles. I suggest discussing them in separate tickets.
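For use case #1, the cost that matters is per-lookup work during parsing. One common way to keep it bounded (shown here as an assumption, not as what the thread decided) is a request-scoped cache that fetches the label-to-ID table for a language once and answers every later lookup from memory. `CachedPropertyLabelResolver` and its `fetch_labels` callable are hypothetical; `fetch_labels` could be backed by a database query or a single bulk search request.

```python
from typing import Callable, Dict, Iterable, Optional


class CachedPropertyLabelResolver:
    """Hypothetical sketch: at most one backend fetch per language per request,
    instead of one remote call per property-label lookup."""

    def __init__(self, fetch_labels: Callable[[str], Dict[str, str]]) -> None:
        # fetch_labels(language) -> {label: property_id}
        self._fetch_labels = fetch_labels
        self._cache: Dict[str, Dict[str, str]] = {}
        self.backend_calls = 0  # exposed so the cost can be observed

    def _labels_for(self, language: str) -> Dict[str, str]:
        # Populate the cache lazily, once per language.
        if language not in self._cache:
            self.backend_calls += 1
            self._cache[language] = self._fetch_labels(language)
        return self._cache[language]

    def get_property_id(self, label: str, language: str) -> Optional[str]:
        return self._labels_for(language).get(label)

    def get_property_ids(self, labels: Iterable[str], language: str) -> Dict[str, str]:
        # Batch variant: unknown labels are simply omitted.
        table = self._labels_for(language)
        return {label: table[label] for label in labels if label in table}
```

With this shape, a page that resolves 50 property labels in one language costs a single backend fetch, which is the property an API-call-per-lookup approach lacks. It does not help use case #2, where a free-text search over items cannot be precomputed like this.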

I am not sure how looking up properties by label is different from looking up items by label. Am I missing something here? Are only properties but not items allowed to be looked up by label? I feel like I am missing some context here.

Smalyshev moved this task from Waiting/Blocked to Next on the User-Smalyshev board. · Jun 1 2018, 7:18 PM
Smalyshev moved this task from Next to Backlog on the User-Smalyshev board. · Jun 15 2018, 11:05 PM
Smalyshev moved this task from Backlog to Next on the User-Smalyshev board. · Jul 27 2018, 7:28 PM
hoo added a subscriber: Lucie. · Aug 14 2018, 5:38 PM
Smalyshev moved this task from Next to Doing on the User-Smalyshev board. · Aug 14 2018, 8:09 PM

OK, so #1 is basically T194143: Make PropertyLabelResolver that uses ElasticSearch. So I think it should be discussed there.

Which leaves us with #2: implementing a TermSearchInteractor that can use ElasticSearch. For this, we need to identify its use cases. I'll look for them and update the task description accordingly.

Smalyshev moved this task from Doing to Next on the User-Smalyshev board. · Aug 20 2018, 5:00 PM

@Smalyshev this is not the priority right now. We'll try some other approach for ArticlePlaceholder later in the fall. For Lua, we'll also be considering options.
So this remains stalled. We'll get back to you guys if we decide to pursue ElasticSearch, but this is certainly not going to happen in the next few weeks.

Smalyshev moved this task from Next to Backlog on the User-Smalyshev board. · Sep 10 2018, 8:57 PM