
[ES-M3] [TECH] Investigate feasibility of various solutions to enabling prefix search of EntitySchemas
Open, Needs Triage, Public

Description

Timeboxed at 16 hrs.

In T340181 (parent task), it was concluded that a more sensible solution for searching Entity Schemas would be to enable prefix search for them through the wbsearchentities endpoint. In order to maintain a clear degree of separation between the EntitySchema and Wikibase extensions, we should assess the feasibility of this solution as opposed to adding an additional endpoint.

Alternatives to consider:

  • Enabling prefix search for entity schema through wbsearchentities
  • Creating a REST API endpoint for this prefix search
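For orientation, the first alternative amounts to a request like the one sketched below. This is only an illustration: `action`, `search`, `language`, `type`, `limit` and `format` are existing wbsearchentities parameters, but the `entity-schema` type value and the helper name `build_search_url` are assumptions made up for this example.

```python
from urllib.parse import urlencode


def build_search_url(api_base: str, prefix: str, language: str = "en", limit: int = 7) -> str:
    """Build a wbsearchentities prefix-search URL for EntitySchemas (sketch)."""
    params = {
        "action": "wbsearchentities",
        "search": prefix,
        "language": language,
        # Hypothetical type value: wbsearchentities does not accept this today.
        "type": "entity-schema",
        "limit": limit,
        "format": "json",
    }
    return api_base + "?" + urlencode(params)


url = build_search_url("https://example.org/w/api.php", "hum")
```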

Acceptance Criteria:

  • Feasibility of each approach is reported
  • The approaches are compared with emphasis on our ranked quality attributes for the Entity Schema application (Modifiability, Analyzability, Testability, Reusability, in order of priority)

Event Timeline

NB: T327507 (Investigation: Display Search Suggestion for Lexemes, Entity Schemas and Properties) was about whether existing API endpoints are suitable for showing search suggestions for these Entity types. It concluded that the ones for Properties and Lexemes are, whereas an endpoint for EntitySchema entities does not exist and thus needs to be created.

That investigation did not look into various options for how to create such an endpoint, because that would have been out of scope of the topic of the investigation. It merely notes that such an EntitySchema endpoint needs to be created, because it does not exist yet.

Further, for background context, that investigation was made with the understanding that it had been decided (T327507#8648401) that EntitySchemas are explicitly not Entities at all. Only significantly later did it turn out that the scope of that decision was misrepresented, and that from the Product perspective/requirements EntitySchemas are in fact Entities like Items or Properties ("Semantic Entities"). That obviously changes what kind of API endpoints make the most sense.

So now claiming that that investigation actively suggested creating a new endpoint over extending wbsearchentities, given that it didn't even compare those options and that it was based on false information of whether EntitySchemas are (Semantic) Entities, is quite misleading.

I think there is some misunderstanding here about a few things, such as the scope of each category of entity, and what is expected of us in trying to assess the feasibility of this. Let's have a Google Meet session to clear this up.

ItamarWMDE renamed this task from [ES-M3] [TECH] Investigate feasibility of extending `wbsearchentities` from the EntitySchema extension to [ES-M3] [TECH] Investigate feasibility of various solutions to enabling prefix search of EntitySchemas. Jul 18 2023, 1:35 PM
ItamarWMDE updated the task description.

Task Review notes:

  • We debated whether exploring all options is actually necessary if we have a clear-cut overview from product on which would be the preferred way to go. This will be clarified further with @Arian_Bozorg

Task Triage Notes:

  • As we are moving towards the REST API, we will prioritize investigating that approach over an Action API module
  • This effort should be timeboxed to no longer than 8 hrs per individual

As we are moving towards the REST API, we will prioritize investigating that approach over an Action API module

Given that we want a consistent API for accessing Semantic Entities, this hypothetical EntitySchema Search REST API would need to depend on/align with/hook into a stable Wikibase Entity search REST API. However, as I understand it, such a search REST API is not even fully defined yet, let alone "built" or "stable/v1". But I see willingness to collaborate on it: https://mattermost.wikimedia.de/swe/pl/4wz94x44jj8bzka97b9d1ret9e

Story Writing Notes:

  • Make it clearer in the task that this involves creating a new REST API specifically for the EntitySchema extension, rather than relying on the Wikibase REST API for items or properties.
ItamarWMDE renamed this task from [ES-M3] [TECH] Investigate feasibility of various solutions to enabling prefix search of EntitySchemas to [SW] [ES-M3] [TECH] Investigate feasibility of various solutions to enabling prefix search of EntitySchemas. Jul 19 2023, 12:28 PM
ItamarWMDE renamed this task from [SW] [ES-M3] [TECH] Investigate feasibility of various solutions to enabling prefix search of EntitySchemas to [ES-M3] [TECH] Investigate feasibility of various solutions to enabling prefix search of EntitySchemas. Aug 1 2023, 8:48 AM

Change 949104 had a related patch set uploaded (by Hoo man; author: Hoo man):

[mediawiki/extensions/Wikibase@master] Add a "WikibaseSearchEntitiesExtraHandlers" hook

https://gerrit.wikimedia.org/r/949104

Change 949105 had a related patch set uploaded (by Hoo man; author: Hoo man):

[mediawiki/extensions/EntitySchema@master] Dummy "WikibaseSearchEntitiesExtraHandlers" handler

https://gerrit.wikimedia.org/r/949105

Feasibility for "Enabling prefix search for entity schema through wbsearchentities":

In order to achieve this, I created a new hook in Wikibase that can be used to add additional "search handlers" for additional (pseudo) entity types to wbsearchentities.

These "search handlers" then return their results as an array. Each entry in this result array is in turn an array containing both a TermSearchResult and another array with the result "entity" id, page title, page id, and (optionally) url. While this is a rather awkward format, it could be made a bit nicer by creating a value class to hold all of these.
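To illustrate what such a value class could look like, here is a minimal Python sketch (the real code would be PHP in Wikibase). All names are hypothetical: `TermMatch` is a simplified stand-in for Wikibase's TermSearchResult, and the array keys `id`, `title`, `pageid` and `url` are assumed for the example rather than taken from the patch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TermMatch:
    """Simplified stand-in for TermSearchResult (fields are assumptions)."""
    text: str
    match_type: str  # e.g. "label" or "alias"
    language: str


@dataclass(frozen=True)
class ExtraSearchResult:
    """One handler result: the term match plus the page-level metadata."""
    term_match: TermMatch
    entity_id: str
    page_title: str
    page_id: int
    url: Optional[str] = None  # optional, as in the array format

    @classmethod
    def from_array_entry(cls, entry: Tuple[TermMatch, dict]) -> "ExtraSearchResult":
        """Convert one (term match, metadata array) pair into a value object."""
        term_match, meta = entry
        return cls(
            term_match=term_match,
            entity_id=meta["id"],
            page_title=meta["title"],
            page_id=meta["pageid"],
            url=meta.get("url"),
        )


entry = (
    TermMatch(text="human", match_type="label", language="en"),
    {"id": "E10", "title": "EntitySchema:E10", "pageid": 42},
)
result = ExtraSearchResult.from_array_entry(entry)
```

The gain over nested arrays is that a missing or misspelled key fails at construction time instead of deep inside the API result formatting.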

The complexity on the EntitySchema side is rather limited, as it will only need to create an implementation of this new "search handler" which will use its (to be created) internal search functionality and return the data in Wikibase's format (see above). See https://gerrit.wikimedia.org/r/949105 (this implements the search handler class inline and doesn't wire it up with an actual search, but nevertheless should give an idea of how much is needed on the ES side).
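As a rough picture of how little is needed on the EntitySchema side, the following Python sketch registers a handler and runs a naive prefix match over in-memory labels. It is not the code from the patch: the handler signature, the `entity-schema` type key, the dictionary keys and the in-memory list standing in for the (to be created) internal search are all assumptions.

```python
from typing import Callable, Dict, List, Tuple

# One result entry: (term match data, page metadata), both as plain dicts.
SearchEntry = Tuple[dict, dict]
Handler = Callable[[str, str, int], List[SearchEntry]]


class EntitySchemaSearchHandler:
    """Hypothetical handler: case-insensitive prefix match on labels."""

    def __init__(self, schemas: List[dict]) -> None:
        # Stand-in for EntitySchema's internal search backend.
        self._schemas = schemas

    def __call__(self, text: str, language: str, limit: int) -> List[SearchEntry]:
        needle = text.casefold()
        results: List[SearchEntry] = []
        for schema in self._schemas:
            if len(results) >= limit:
                break
            label = schema["labels"].get(language)
            if label is None or not label.casefold().startswith(needle):
                continue
            match = {"type": "label", "language": language, "text": label}
            meta = {
                "id": schema["id"],
                "title": "EntitySchema:" + schema["id"],
                "pageid": schema["pageid"],
            }
            results.append((match, meta))
        return results


def register_entity_schema_handler(handlers: Dict[str, Handler], schemas: List[dict]) -> None:
    """Mimics a hook subscriber adding its handler under a type key."""
    handlers["entity-schema"] = EntitySchemaSearchHandler(schemas)


handlers: Dict[str, Handler] = {}
register_entity_schema_handler(handlers, [
    {"id": "E10", "pageid": 42, "labels": {"en": "Human"}},
    {"id": "E11", "pageid": 43, "labels": {"en": "Building"}},
])
hits = handlers["entity-schema"]("hu", "en", 5)
```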

The changes in Wikibase are more convoluted. There's a huge difference between Wikibase's native EntitySearchHelper and how its results are used as API search results compared to the new facility. I tried my best to split up the functionality between code that needs an actual entity (id) to work and code that doesn't, so that most of the API result formatting code can be shared (thus not having to re-create that in the new "search handlers"). See https://gerrit.wikimedia.org/r/949104 (as per the TODOs on that change, this could certainly be polished a bit more, but I think it should give a fairly accurate picture of how SearchEntities would work in this case).
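The split described above can be pictured roughly as follows. This Python sketch is not the SearchEntities code; the function names, the dictionary keys and the use of `concepturi` as the example of a field that needs a real entity id are assumptions chosen for illustration.

```python
def format_shared(match: dict, meta: dict) -> dict:
    """Formatting that needs no real entity: usable by native and extra results."""
    entry = {
        "id": meta["id"],
        "title": meta["title"],
        "pageid": meta["pageid"],
        "match": dict(match),
    }
    if meta.get("url"):
        entry["url"] = meta["url"]
    return entry


def format_native(entity_id: str, match: dict, meta: dict, concept_base_uri: str) -> dict:
    """Formatting for native entities: shared part plus entity-id-dependent fields."""
    entry = format_shared(match, meta)
    entry["concepturi"] = concept_base_uri + entity_id
    return entry


match = {"type": "label", "language": "en", "text": "Human"}
meta = {"id": "E10", "title": "EntitySchema:E10", "pageid": 42}
extra_entry = format_shared(match, meta)
native_entry = format_native("Q5", match, {**meta, "id": "Q5"}, "http://example.org/entity/")
```

Under this split, a new "search handler" only has to supply the inputs to the shared formatter, while the entity-specific step stays inside Wikibase.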

Testability: All of the proposed changes would be fairly straightforward to cover with automated tests.
Analyzability:
(In Wikibase) The added complexity in Wikibase, especially in SearchEntities, will be a hindrance.
(In EntitySchema) Also the new functionality in EntitySchema, while simple, can't be made sense of entirely without looking at Wikibase's internals (SearchEntities).
Modifiability:
(In Wikibase) The additional means to add data to wbsearchentities will make it much harder to do modifications there.

I will oppose reusing wbsearchentities. Since EntitySchema can be installed without depending on any part of the Wikibase software, using an endpoint defined in Wikibase is a bad idea. Instead, what I propose is (1) using a dedicated search API in EntitySchema and/or (2) making the core API pluggable with a custom-defined completion workflow (see T190454#4092077).

(stricken 2023-08-29; see T344609#9124706)
(reinstated 2023-10-18; see T344609#9260423)

hoo removed hoo as the assignee of this task. Aug 22 2023, 2:00 PM
hoo moved this task from Doing to Todo/Backlog on the Wikidata Dev Team (Sprint-∞) board.
hoo subscribed.

Change #949104 abandoned by Hoo man:

[mediawiki/extensions/Wikibase@master] Add a "WikibaseSearchEntitiesExtraHandlers" hook

Reason:

https://gerrit.wikimedia.org/r/949104