Page MenuHomePhabricator

Separate the CirrusSearch/Elastic-specific code from Wikibase code base
Closed, ResolvedPublic

Description

It does not seem desired to have knowledge on CirrusSearch/Elastic encoded in Wikibase. It definitely is valuable to allow Elastic/Cirrus-powered search in Wikibase, but Wikibase itself should not be depending on CirrusSearch, or Elastic in any way. Similarly, CirrusSearch extension should not know that Wikibase even exists, as one of its possible users, but still both CirrusSearch and Wikibase would enable a powerful application.

Having such separation would allow more efficient development of both Wikibase, and CirrusSearch. Having clear boundaries would also improve the work on Cirrus-powered searched for Wikibase/Wikidata, as in my opinion, it would provide the area within which this work can be done, without a need to adjust Wikibase and/or CirrusSearch for the need of this specific topic (I mean that both CirrusSearch and Wikibase do far more than just Cirrus-based search in Wikibase).

I suggest the following way to make the separation: the CirrusSearch/Elastcic-specific code would be moved to a separate extension (also meaning it would be moved to its dedicated code repository). This extension would be enabled on wikis that use Wikibase, and the Cirrus-based search (e.g. wikidata.org). The extension would be owned by WMF's Search Platform team.

In practice, the separation would mean:

  • moving code from the Wikibase\Repo\Search\Elastic namespace to the new extension
  • also moving the code handling Cirrus-specific hooks out of Wikibase
  • moving Cirrus-specific config files, and options out
  • abstracting the way different search engines could be used by Wikibase search API (e.g. turn the useCirrus option into some more generic hook that extension could use to override the engine used in search)
  • do not depend on/use FieldDefintions object in Wikibase's EntityHandlers etc. I have no immediate suggestion at hand yet, looking at this for brief two minutes I see several options to consider.

Migration plan:

  1. Transfer Cirrus-dependent code to WikibaseCirrusSearch extension, using Wikibase\Search\Elastic namespace (note removed Repo).
  2. Make a switch that turns on/off all search-related hooks & configs for both extensions. Default values on for Wikibase, off for WikibaseCirrusSearch
  3. This is the point where we should stop accepting any patches to Cirrus code in Wikibase, and re-sync all patches that happened since the start with WikibaseCirrusSearch.
  4. After testing, flip the config above to off for Wikibase, on for WikibaseCirrusSearch
  5. Drop the Cirrus hooks code from Wikibase
  6. Drop all Cirrus code from Wikibase

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Change 486708 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/Wikibase@master] Move basic fields definitions to Wikibase space

https://gerrit.wikimedia.org/r/486708

Change 486714 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/WikibaseLexeme@master] Move field definitions to new class name

https://gerrit.wikimedia.org/r/486714

Change 486717 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/WikibaseMediaInfo@master] Move field definitions to new class name

https://gerrit.wikimedia.org/r/486717

@Lydia_Pintscher @WMDE-leszek would be glad to hear your input on this and on the migration plan above.

Sorry for the super late reply. I have no preference among these and Leszek is on vacation.

OK, I am dealing with main Wikibase code so far, but would appreciate team's feedback as it would be time to move on to the extensions soon. Please tag whoever else on the team may be interested.
So far I understand the following use Elastic functionality:

  • WikibaseLexeme
  • WikibaseMediaInfo

Any others?

Change 486556 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Introduce a switch to turn off CirrusSearch functionality

https://gerrit.wikimedia.org/r/486556

We will also need to move options out. We could move it to set of WikibaseCirrusSearch options, or keep current option structure, in which case we'd need to enable extensions to define their own Wikibase options, which as I understand not the case now? Should we keep these options in the same place (BC) or create a new set of options for WikibaseCirrusSearch?

Change 486708 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Move basic fields definitions to Wikibase space

https://gerrit.wikimedia.org/r/486708

Change 447954 merged by jenkins-bot:
[mediawiki/extensions/WikibaseCirrusSearch@master] Initial code setup

https://gerrit.wikimedia.org/r/447954

Change 486717 merged by jenkins-bot:
[mediawiki/extensions/WikibaseMediaInfo@master] Move field definitions to new class name

https://gerrit.wikimedia.org/r/486717

Change 486714 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Move field definitions to new class name

https://gerrit.wikimedia.org/r/486714

Change 486707 abandoned by Smalyshev:
Introduce switch that enables/disables this extension functions

Reason:
already implemented

https://gerrit.wikimedia.org/r/486707

Change 488671 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/WikibaseCirrusSearch@master] Move search configs to this extension

https://gerrit.wikimedia.org/r/488671

Change 489362 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/tools/release@master] Add WikibaseCirrusSearch extension

https://gerrit.wikimedia.org/r/489362

Apologies @Smalyshev for super late reply. We've got snowed under with some urgent Structured Data on Commons work.

Regarding migration plan: I think having Wikibase ,WikibaseLexeme, WikibaseMediaInfo separated is indeed cleaner, and would prefer that path. That means more extensions indeed, but also would allow for better separation, and also possible easier development of separate use cases.
I believe starting with Wikibase first, and leaving WikibaseLexeme and WikibaseMediaInfo as they're at the moment is a decent strategy. It'd allow to focus on one task, reduce the overall scope and amount of effort. Migrating remaining extensions would there be also easier given the experience gather in the first step I reckon.

Regarding options: I'd believe moving options WikibaseCirrusSearch makes sense. Having B/C "duplicates" in Wikibase for the migration/transition period is of course absolutely fine.

Thanks for all the effort Stas! Could we be of any help, please let us know. I'll try to be more available now given the SDoC seems to be on almost on the finishing line.

Jony added a subscriber: Jony.Feb 15 2019, 5:08 PM

Change 488671 merged by jenkins-bot:
[mediawiki/extensions/WikibaseCirrusSearch@master] Move search configs to this extension

https://gerrit.wikimedia.org/r/488671

Change 495066 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/Wikibase@master] Allow inheriting some functions, for hit displayer in WikibaseCirrusSearch

https://gerrit.wikimedia.org/r/495066

Change 495047 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/Wikibase@master] Migrate field definitions to a new class

https://gerrit.wikimedia.org/r/495047

Change 495068 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/WikibaseCirrusSearch@master] Add implementation of search hit handlers

https://gerrit.wikimedia.org/r/495068

Change 495047 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Migrate field definitions to a new class

https://gerrit.wikimedia.org/r/495047

Change 495156 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/Wikibase@master] Disable ShowHit functionality when disableCirrus is active

https://gerrit.wikimedia.org/r/495156

Change 495163 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/WikibaseCirrusSearch@master] Port integration test from Wikibase

https://gerrit.wikimedia.org/r/495163

Change 495066 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Allow accessing some static functions, for hit displayer in WikibaseCirrusSearch

https://gerrit.wikimedia.org/r/495066

Change 495068 merged by jenkins-bot:
[mediawiki/extensions/WikibaseCirrusSearch@master] Add implementation of search hit handlers

https://gerrit.wikimedia.org/r/495068

Change 495156 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Disable ShowHit functionality when disableCirrus is active

https://gerrit.wikimedia.org/r/495156

Change 495163 merged by jenkins-bot:
[mediawiki/extensions/WikibaseCirrusSearch@master] Port integration test from Wikibase

https://gerrit.wikimedia.org/r/495163

Change 496073 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[mediawiki/extensions/Wikibase@master] Add interface that would allow to skip applying hooks to certain results

https://gerrit.wikimedia.org/r/496073

Change 496073 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add interface that would allow to skip applying hooks to certain results

https://gerrit.wikimedia.org/r/496073

Smalyshev moved this task from Doing to Done on the User-Smalyshev board.
debt closed this task as Resolved.Apr 19 2019, 9:03 PM