Add wikibase client support for searching wikidata items
Closed, Invalid · Public

Assigned To

None

Authored By

	Smalyshev
	Oct 5 2017, 12:33 AM

Description

Right now, Wikibase Client provides access to searching Wikidata items, e.g. via newTermSearchInteractor(). It is used in ArticlePlaceholder and Lua clients probably use some version of it too. It may make sense to make ElasticSearch searching available via this API too, possibly by implementing TermSearchInteractor that uses ElasticSearch.

Related Objects

Mentioned In: T194143: Make PropertyLabelResolver that uses ElasticSearch
T86530: Replace wb_terms table with more specialized mechanisms for terms (tracking)
Mentioned Here: T194143: Make PropertyLabelResolver that uses ElasticSearch
T190022: Separate the CirrusSearch/Elastic-specific code from Wikibase code base

Event Timeline

Smalyshev created this task. · Oct 5 2017, 12:33 AM

Restricted Application added a project: Discovery-ARCHIVED. · Oct 5 2017, 12:33 AM

Restricted Application added a subscriber: Aklapper.

Smalyshev added subscribers: daniel, Addshore, hoo. · Oct 5 2017, 12:35 AM

Smalyshev added a project: User-Smalyshev. · Oct 10 2017, 9:12 PM

Smalyshev moved this task from needs triage to This Quarter on the Discovery-Search board. · Oct 12 2017, 5:45 PM

Smalyshev moved this task from Backlog to Next on the User-Smalyshev board. · Oct 13 2017, 10:23 PM

Smalyshev moved this task from Next to Backlog on the User-Smalyshev board. · Dec 5 2017, 7:58 PM

The problematic part here seems to be that WikibaseClient has a separate configuration from WikibaseRepo, so we cannot access the search profiles, and without those we can't run the search. I'm not sure what the right way to handle it is.

Lydia_Pintscher moved this task from incoming to monitoring on the Wikidata board. · Dec 18 2017, 2:58 PM

Smalyshev changed the task status from Open to Stalled. · Jan 13 2018, 1:23 AM

Smalyshev removed a project: User-Smalyshev. · Jan 22 2018, 6:49 PM

One possible approach would be to use the cirrus-config-dump API (or to have Cirrus do it for us somehow). But it's kinda heavyweight... OTOH, cross-wiki search uses it, so maybe it's OK.
@dcausse - what do you think? Would it be possible to give profile management a "remote wiki" mode that loads configs from another wiki for cases like this? We already do something like this in sister search, but probably not in a reusable way?

Smalyshev added a subscriber: dcausse. · Feb 12 2018, 7:57 AM

Smalyshev triaged this task as Low priority. · Feb 12 2018, 8:07 AM

This is possible in theory, but the problem is that some profiles refer to class implementations that may not be available on the host wiki.
So yes, we could use the sister search logic with some adaptation, but the main blocker will be that the builder implementations won't be available if the Wikibase extension is not loaded on the host wiki.
We will have similar problems with SDoC search sooner or later, since search on Commons is available from all wikis. If SDoC search provides custom implementations, that code will have to be available on the host wiki.
In short, if WikibaseClient imports the builder classes, it's probably fine; if not, I think it'll be hard to do.

EDIT: another problem is that cirrus-config-dump exports the Cirrus config, but not all the profile information is stored in wgCirrusSearch* vars. We would have to adapt cirrus-config-dump so that it can export the state of the search profiles in a form that can be reloaded on the host wiki.

@Smalyshev What's the status here? Say we want to get rid of the wb_terms table…

@hoo I am still not sure what a good way to get the search configs to the client would be... Maybe an extension to the cirrus-config-dump API is needed. It is also complicated by the fact that Cirrus and Wikibase use different configs (which are normally fed from the same globals, but do not cooperate in any way AFAIK), which makes it hard to inject things. How urgent is this? If it's important in the near term, I can allocate specific time to work on it closely and find a solution; otherwise I'll think about it and get to it a bit later.

hoo mentioned this in T86530: Replace wb_terms table with more specialized mechanisms for terms (tracking). · May 8 2018, 1:42 PM

Smalyshev added a project: User-Smalyshev. · May 24 2018, 8:06 PM

Smalyshev moved this task from Backlog to Waiting/Blocked on the User-Smalyshev board.

Smalyshev updated the task description. · May 24 2018, 8:13 PM

Smalyshev mentioned this in T194143: Make PropertyLabelResolver that uses ElasticSearch. · May 24 2018, 8:16 PM

So I thought about it a bit more, and it looks like we don't really need to bring the search configs over from the repo. We can have a fixed set of configs that is enough for a simple, straightforward match on the client, bake it into the client, and use it instead of the repo ones.
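A minimal sketch of what such a baked-in profile could look like follows. It is written in Python for illustration; the real code would be PHP in Wikibase/WikibaseCirrusSearch. It builds an Elasticsearch `bool` query body from a fixed, hard-coded profile rather than from the repo's search profiles. The field names (`labels.<lang>.near_match`, `labels_all.near_match`, `content_model`), the boosts, and the name `build_label_match_query` are all assumptions for illustration. They are not the actual WikibaseCirrusSearch mapping or API.

```python
from typing import Any

# Hypothetical fixed profile baked into the client. Field names and boosts
# are illustrative assumptions, not the real WikibaseCirrusSearch config.
FIXED_LABEL_MATCH_PROFILE: dict[str, Any] = {
    "language_field": "labels.{lang}.near_match",
    "fallback_field": "labels_all.near_match",
    "language_boost": 2.0,
    "fallback_boost": 1.0,
}


def build_label_match_query(
    text: str,
    lang: str,
    content_model: str = "wikibase-item",
    limit: int = 10,
    profile: dict[str, Any] = FIXED_LABEL_MATCH_PROFILE,
) -> dict[str, Any]:
    """Build an Elasticsearch query body for a simple label match using only
    the fixed client-side profile (no repo search profiles involved)."""
    lang_field = profile["language_field"].format(lang=lang)
    return {
        "size": limit,
        "query": {
            "bool": {
                # Restrict hits to one entity type (assumed index field).
                "filter": [{"term": {"content_model": content_model}}],
                # Prefer a match in the requested language, but also accept
                # a label match in any language.
                "should": [
                    {"match": {lang_field: {
                        "query": text, "boost": profile["language_boost"]}}},
                    {"match": {profile["fallback_field"]: {
                        "query": text, "boost": profile["fallback_boost"]}}},
                ],
                "minimum_should_match": 1,
            }
        },
    }
```

Because the profile is a constant in client code, nothing has to be fetched from the repo at query time. The trade-off is that later tuning of the repo's profiles is not picked up automatically.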

Does it mean that we would make WikibaseClient dependent on CirrusSearch and add all the necessary query builders to this client?
Have we considered the possibility to run an actual API call to wbsearchentities@wikidata.org?
I have no clue if the current API output would allow to rebuild TermSearchResult nor if there are perf considerations that make this solution impossible.
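For comparison, here is a minimal sketch of the API-call alternative, in Python for illustration. It builds a `wbsearchentities` request URL for wikidata.org and maps the JSON `search` array to simple result records. The `SearchHit` record is a hypothetical, loose analogue of `TermSearchResult`. The response fields read here (`id`, `label`, `description`, `match.type`, `match.text`) are assumed from typical `wbsearchentities` output and should be checked against the API documentation. The HTTP request itself is left out.

```python
import json
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


@dataclass(frozen=True)
class SearchHit:
    # Hypothetical loose analogue of TermSearchResult; fields are assumptions.
    entity_id: str
    matched_text: str
    matched_type: str
    label: Optional[str]
    description: Optional[str]


def build_search_url(text: str, language: str,
                     entity_type: str = "item", limit: int = 10) -> str:
    """Build a wbsearchentities request URL (query string is URL-encoded)."""
    params = {
        "action": "wbsearchentities",
        "search": text,
        "language": language,
        "type": entity_type,
        "limit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def parse_search_response(body: str) -> list[SearchHit]:
    """Map the JSON 'search' array to SearchHit records (assumed shape)."""
    data = json.loads(body)
    hits = []
    for entry in data.get("search", []):
        match = entry.get("match", {})
        hits.append(SearchHit(
            entity_id=entry["id"],
            matched_text=match.get("text", ""),
            matched_type=match.get("type", ""),
            label=entry.get("label"),
            description=entry.get("description"),
        ))
    return hits
```

Every lookup done this way is a separate HTTP round trip to the repo, which is the cost weighed in the reply below.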

Does it mean that we would make WikibaseClient dependent on CirrusSearch

Well, ideally after T190022: Separate the CirrusSearch/Elastic-specific code from Wikibase code base it all will be in WikibaseCirrusSearch extension I presume.

and add all the necessary query builders to this client?

Yes, that's the idea.

Have we considered the possibility to run an actual API call to wbsearchentities@wikidata.org

I thought about it, but it looks like a rather serious performance hit (going back through the whole caching infrastructure, paying the request initialization overhead again, and then parsing the results). And as I understand it, the main motivation here is performance. If we have a page with Lua that requests 20 lookups, having 20 sub-requests may be a bit too much.

It also feels a bit wrong to go whole roundtrip when we have most classes and configs sitting right here.

I have no clue if the current API output would allow to rebuild TermSearchResult

Probably, but I am not convinced we should do it. Right now I am leaning towards not doing it.

This ticket conflates two very different things, which makes it difficult to discuss tradeoffs:
#1 looking up properties by label (PropertyIdResolver)
#2 interactively searching for items based on some search input

For #1 performance is an issue, and API calls are a no-go, since they would have to happen during parsing, and we may be doing dozens or even hundreds of them per page.
For #2, API calls would be fine, we have much more time, and only ever one search per request.

The two use cases also need very different search profiles. I suggest discussing them in separate tickets.

I am not sure how looking up properties by label is different from looking up items by label. Am I missing something here? Are only properties but not items allowed to be looked up by label? I feel like I am missing some context here.

Smalyshev moved this task from Waiting/Blocked to Next on the User-Smalyshev board. · Jun 1 2018, 7:18 PM

Smalyshev moved this task from Next to Backlog on the User-Smalyshev board. · Jun 15 2018, 11:05 PM

Smalyshev moved this task from Backlog to Next on the User-Smalyshev board. · Jul 27 2018, 7:28 PM

hoo added a subscriber: Lucie. · Aug 14 2018, 5:38 PM

Smalyshev moved this task from This Quarter to Current work on the Discovery-Search board. · Aug 14 2018, 6:00 PM

Smalyshev edited projects, added Discovery-Search (Current work); removed Discovery-Search.

Smalyshev moved this task from Next to Doing on the User-Smalyshev board. · Aug 14 2018, 8:09 PM

OK, so #1 is basically T194143: Make PropertyLabelResolver that uses ElasticSearch, so I think it should be discussed there.

Which leaves us with #2, which is implementing TermSearchInteractor that can do ElasticSearch. For this, we need to identify the use cases for it. I'll look for them and update the task description accordingly.

Smalyshev moved this task from Doing to Next on the User-Smalyshev board. · Aug 20 2018, 5:00 PM

@Smalyshev this is not the priority right now. We'll try another approach for ArticlePlaceholder later in the fall. For Lua we'd also be considering options.
So this remains stalled. We'll get back to you if we decide to pursue ElasticSearch, but that is certainly not going to happen in the next few weeks.

Smalyshev moved this task from Next to Backlog on the User-Smalyshev board. · Sep 10 2018, 8:57 PM

Smalyshev edited projects, added Discovery-Search; removed Discovery-Search (Current work). · Sep 27 2018, 6:06 AM

Smalyshev moved this task from needs triage to search-icebox on the Discovery-Search board.

Smalyshev removed a project: User-Smalyshev. · Sep 27 2018, 6:11 AM

Smalyshev moved this task from search-icebox to Wikibase Search on the Discovery-Search board. · Jan 29 2019, 7:13 PM

Mholloway subscribed. · Apr 5 2019, 8:27 PM

The previous comments don't explain who or what (task?) exactly this task is stalled on ("If a report is waiting for further input (e.g. from its reporter or a third party) and can currently not be acted on"). Hence resetting task status, as tasks should not be stalled (and then potentially forgotten) for years for unclear reasons.

(Smallprint, as general orientation for task management:
If you wanted to express that nobody is currently working on this task, then the assignee should be removed and/or priority could be lowered instead.
If work on this task is blocked by another task, then that other task should be added via Edit Related Tasks... → Edit Subtasks.
If this task is stalled on an upstream project, then the Upstream tag should be added.
If this task requires info from the task reporter, then there should be instructions which info is needed.
If this task needs retesting, then the TestMe tag should be added.
If this task is out of scope and nobody should ever work on this, or nobody else managed to reproduce the situation described here, then it should have the "Declined" status.
If the task is valid but should not appear on some team's workboard, then the team project tag should be removed while the task has another active project tag.)

Closing as invalid, as I struggle to see the context and to figure out what is needed here and why now.
It should also be noted that many things in this area have changed since 2017.
