New Vector Search is not Wikidata aware
Closed, ResolvedPublic
Authored By

	Jdlrobson
	Feb 19 2021, 10:14 PM

Referenced Files

	F34644682: image.png
	Sep 16 2021, 9:24 AM

	F34644686: image.png
	Sep 16 2021, 9:24 AM

	F34644680: image.png
	Sep 16 2021, 9:24 AM

	F34191649: Bildschirmfoto von 2021-03-26 11-32-54.png
	Mar 26 2021, 10:39 AM

	F34172458: image.png
	Mar 19 2021, 7:33 AM

	F34166151: Screen Shot 2021-03-17 at 9.23.30 AM.png
	Mar 17 2021, 4:23 PM

	F34165916: image.png
	Mar 17 2021, 1:25 PM

	F34137037: image.png
	Mar 4 2021, 7:07 PM

Description

The WVUI search widget used in the new Vector skin does not yet provide Wikidata with the same functionality as the search box in the legacy Vector skin. Several functional requirements are not yet met:

Functional requirements

Related to the API queried for results:

1. The search must cover Labels and Aliases in any language, as well as Entity IDs

In the current search in the legacy Vector skin, the search box uses the action=wbsearchentities API endpoint. We need to write some new adapter code for the new Search box in the new Vector skin that does that as well.
The API used by the search box in the new Vector skin currently searches only page titles (Q-IDs in case of Wikidata) which is not helpful.
Note: Descriptions are intentionally not searched by the action=wbsearchentities API.
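
As a rough illustration of the adapter's request side, the sketch below builds a wbsearchentities URL for a typed query. The helper name buildEntitySearchUrl and its option names are hypothetical; the query parameters mirror the ones used elsewhere in this task, plus limit.

```javascript
// Hypothetical helper: builds a wbsearchentities request URL for a typeahead query.
// Neither the function name nor the option names come from Wikibase or Vector.
function buildEntitySearchUrl( apiBase, query, options ) {
	var opts = options || {};
	var language = opts.language || 'en';
	// URLSearchParams takes care of encoding the user's input.
	var params = new URLSearchParams( {
		action: 'wbsearchentities',
		format: 'json',
		search: query,
		language: language,
		uselang: language,
		type: opts.type || 'item',
		limit: String( opts.limit || 10 )
	} );
	return apiBase + '?' + params.toString();
}

var exampleUrl = buildEntitySearchUrl(
	'https://www.wikidata.org/w/api.php',
	'Douglas Adams',
	{ language: 'de' }
);
```

Keeping URL construction in one small function would make it easy to swap the endpoint later if a dedicated REST route is introduced.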

Related to how the results are displayed to the user:

1. Show matches outside the Label in the current language

On Wikidata, the search goes not only through the Entities' Labels in the current Language, but also through their Aliases, and all the Labels and Aliases in all other languages. If any of these match, then that must also be shown in the search result. Also, if one searches for an Entity-ID directly (e.g., "Q42"), then that matching Entity must be shown as well.

2. Handle multiple languages

The language of each "text object" (i.e. title, description, alias/search match) should be set explicitly in an HTML lang="" attribute. This is necessary because the language can differ due to language fallbacks, and setting it allows screen readers to function. We will also need to account for the possibility of different writing directions.
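
A minimal sketch of what this could look like when rendering a single text object. The helper names are hypothetical, and the short right-to-left list is an assumption made only for illustration; a real implementation would presumably take directionality from MediaWiki's language data instead.

```javascript
// Assumption: a deliberately incomplete list of right-to-left language codes.
var RTL_LANGUAGES = [ 'ar', 'fa', 'he', 'ur' ];

// Escape text before placing it into HTML.
function escapeHtml( text ) {
	return String( text )
		.replace( /&/g, '&amp;' )
		.replace( /</g, '&lt;' )
		.replace( />/g, '&gt;' )
		.replace( /"/g, '&quot;' );
}

// Hypothetical helper: derive lang and dir attributes from a language code.
function textAttributes( languageCode ) {
	var base = languageCode.split( '-' )[ 0 ].toLowerCase();
	return {
		lang: languageCode,
		dir: RTL_LANGUAGES.indexOf( base ) !== -1 ? 'rtl' : 'ltr'
	};
}

// Hypothetical helper: wrap one text object (label, description or match)
// in a span that carries its own language and direction.
function renderSpan( text, languageCode ) {
	var attrs = textAttributes( languageCode );
	return '<span lang="' + escapeHtml( attrs.lang ) + '" dir="' + attrs.dir + '">' +
		escapeHtml( text ) + '</span>';
}
```

Each text object gets its own span because, with fallbacks, the label, description and match of one result may be in three different languages.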

3. Allow for loading more results in an obvious way

The current WVUI TypeaheadSearch component seems to limit the number of results displayed in the menu to 10 (probably configurable). Wikidata returns a large number of matching results per search, and one often doesn't find what one is looking for in the first couple of suggestions.
Therefore, users must be able to (maybe implicitly) request further results. The solution that we find must be obvious to users, and the pattern should be the same across Wikidata (e.g. property lookup).

Possible solutions for 3.:

3.1. We provide users with an option (e.g., a “more”-pseudoresult) that they can click to load more results within the results menu.

3.2. We allow users to scroll within the dropdown results menu and load more results in the background on scroll
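
Both options could share the same loading logic and differ only in what triggers it (a click on the "more" entry, or a scroll event). The sketch below assumes an offset-based continuation; createResultLoader and the shape returned by fetchPage are hypothetical and not part of WVUI or Wikibase.

```javascript
// Hypothetical helper. fetchPage( query, offset ) is assumed to return a promise
// resolving to { results: Array, nextOffset: number|null }, where null means
// that there are no further results.
function createResultLoader( query, fetchPage ) {
	var results = [];
	var nextOffset = 0;
	return {
		// True while another batch can still be requested.
		hasMore: function () {
			return nextOffset !== null;
		},
		// Fetch the next batch and resolve to all results loaded so far.
		loadMore: function () {
			if ( nextOffset === null ) {
				return Promise.resolve( results );
			}
			return fetchPage( query, nextOffset ).then( function ( page ) {
				results = results.concat( page.results );
				nextOffset = page.nextOffset;
				return results;
			} );
		}
	};
}
```

For 3.1 the menu would call loadMore() when the pseudo-result is activated; for 3.2 it would call it when the user scrolls near the end of the list, as long as hasMore() returns true.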

Original task description:

The existing search API only works with queries containing "Q" and returns results without the correct display title
https://wikidata.org/w/rest.php/v1/search/title?q=Q3&limit=10

This means that Wikidata will become useless in the future with the rollout of the latest version of Vector, and it will stall further adoption of the Wikimedia WVUI library.

Details

	Subject	Repo	Branch	Lines +/-
	[Do not merge!][PoC] Make new Vector search Wikidata aware	mediawiki/extensions/Wikibase	master	+78 -0
	Search: Drop Wikidata specific behaviour	mediawiki/skins/Vector	master	+3 -5


Related Objects

Status	Subtype	Assigned	Task
Resolved		Jdlrobson	T195473 [GOAL] Invest in the MobileFrontend & MinervaNeue frontend architecture
Resolved		Jdlrobson	T195478 [EPIC] Speed up unit test execution and increase code coverage
Resolved		Jdlrobson	T195482 [EPIC] Review and refactor MobileFrontend components used by Minerva
Open		None	T281930 [EPIC] Migrate MobileFrontend's code to Vue.js and Codex
Declined		None	T212465 [EPIC] None of our View's should exhibit 2 levels of inheritance
Declined	BUG REPORT	Jdlrobson	T289084 MinervaNeue shows 2 search elements when JS is disabled
Open		None	T282473 [GOAL] Use the Codex search widget inside the mobile site
Resolved		Lucas_Werkmeister_WMDE	T281318 [EPIC] Enable new Codex Typeahead search on Wikidata.org
Resolved		Jdlrobson	T275251 New Vector Search is not Wikidata aware
Resolved		Lucas_Werkmeister_WMDE	T316093 Make new Vector search use wbsearchentities on Wikidata
Duplicate		None	T317681 Make new Vector search navigate to item search results on Wikidata
Resolved		Lucas_Werkmeister_WMDE	T317682 Make new Vector search navigate to search result URL when selecting search result using keyboard
Resolved		Michael	T322333 Modify new Vector Search to allow loading more results on Wikidata
Resolved		Michael	T326633 Monitor the deployment of the new Search on the 2022 version of the Vector skin

Event Timeline

There are a very large number of changes, so older changes are hidden.

Jdlrobson added a parent task: T282473: [GOAL] Use the Codex search widget inside the mobile site.May 10 2021, 5:59 PM

despens subscribed.May 26 2021, 9:02 AM

despens unsubscribed.

despens subscribed.

R4356th subscribed.Jun 30 2021, 10:00 AM

Addshore added a project: Wikibase Release Strategy.Jun 30 2021, 3:02 PM

Addshore moved this task from Research to Investigate & Discuss on the [DEPRECATED] wdwb-tech board.Jul 16 2021, 8:39 AM

Jdlrobson mentioned this in T287215: Enable WVUI search on commons .Jul 22 2021, 9:54 PM

Jdlrobson mentioned this in T285223: Show item description in the Wikidata search results in MinervaNeue skin.Jul 28 2021, 9:09 PM

Has any progress been made with this API?

We (the web team) will be beginning the process of porting the mobile site to this component, and without this API we will need to consider one of two options, neither of which is great:

Disabling the JavaScript enhancement for search on Wikidata.

Moving the frontend code to a Wikidata extension as technical debt to be tackled at a later date.

ovasileva moved this task from Incoming to Search on the Desktop Improvements (Vector 2022) board.Aug 31 2021, 8:39 PM

Regarding the ranking discussion, I would've thought you're more likely to want a Lexeme if you're on a page about a Lexeme already, and so it should be prioritized in the results list (but not to the extent of excluding other results).

That should perhaps be split off into a separate task, though. The priority here seems to be "make a hook that can return a search engine for results".

From a user perspective, I think many consumers would actually want Wikidata's "identifier matched somewhere" as a result. But I appreciate that it might break things if results are returned that do not correspond to title prefix matches.

If the search title contract is not to be altered, maybe the change needed is a hook for the endpoint used to make queries, that the UI code uses, as well as an implementation of such an endpoint? Maybe Wikibase could hook the search content API and add data to it, and the UI component could be directed to use that, rather than creating a whole new endpoint? Conceptually the content being matched is similar to page content.

@Jdlrobson Hello! new EM for wd here 👋

By when would you need a response on what to do in order to not jeopardize the porting process?

Hi @karapayneWMDE later today, I'm making a config change which will mean Wikidata is the only Wikimedia project that does not use the Vue search.

I think the timeline is dependent on you.

Ideally, I would like to drop support for the old search now, which would mean that Wikidata for the modern Vector skin would have no autocomplete functionality. This might be okay, as Wikidata currently doesn't have the new Vector set as the default skin, but I want to double-check that with you. If you want more time, I can offer 3 months maximum at this point, after which I'd feel uncomfortable maintaining the old search with the new skin.

According to https://www.mediawiki.org/wiki/Reading/Web/Desktop_Improvements, we are looking to make the new skin the default on all wikis by the end of the year, so ideally we need to make the search compatible with Wikidata before the end of the year.

The main blocker for this is having a functioning search API that returns results for queries such as https://wikidata.org/w/rest.php/v1/search/title?q=Q3&limit=10. It's also acceptable for Wikidata to provide its own API if that is preferred - the search API is configurable and we can point to any service you want, provided the response is consistent with that API's specification.

Let me know if you want to chat through this in a video call with a WMDE Engineer

• Manuel subscribed.Sep 2 2021, 8:22 AM

Jdlrobson mentioned this in T289724: Sticky header: add search to sticky header.Sep 9 2021, 7:03 PM

@karapayneWMDE I've set up T290688 with the proposed next step in case you want to test the implications on Wikidata for the modern Vector skin. I'd like to either merge this within the next 2 weeks, or at the latest on December 1st, if 3 months is a reasonable timeline to address this issue. Please let me know your preference.

Hi @Jdlrobson

So initially highlighting what is currently displayed in our results

Label of matched entity with fallback in user language
Matched string, which can include fallback, would be one of label, alias, or Qid (item ID)
Description with fallback of the entity

The main blocker for this is having a functioning search API that returns results for queries such as https://wikidata.org/w/rest.php/v1/search/title?q=Q3&limit=10. It's also acceptable for Wikidata to provide its own API if that is preferred - the search API is configurable and we can point to any service you want, provided the response is consistent with that API's specification.

So we do not want to confuse things by adding entity search to rest.php/v1/search/title that is specifically meant to search for titles. The functionality there should remain the same and just be provided by MediaWiki; as daniel also expressed in T275251#6944581, changing it would break the contract.
So we would want to provide a separate API (which you say is fine).
However, we don't really want to provide an API response that is again confusing if people were ever to look at it (with title etc.).
Is there some way we could provide some sort of API adapter or a second format that this JS code can deal with?

Taking inspiration from https://www.mediawiki.org/wiki/API:REST_API/Reference#Search_result_object
We would want to return:

Page title, or just URL to link to for the result. Thinking ahead for possible future cases we may want to consider, URL would be preferred
Display line 1 - Which in our case would be something like LABEL (MATCH) where LABEL is a fall back enabled label for user display and MATCH is the string that actually matched, if different from that label
Display line 2 - This would be our description with language fallback
Image URL - We wouldn't provide this initially, but would want to in the future, and would ideally like to hide the "image doesn't exist" part of the result

But we don't want to introduce a confusing situation by misusing the API or format.
Perhaps this is even something that should change in the main REST spec?
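
To make the proposed display lines concrete, here is a small sketch of the mapping. The input field names (id, label, description, url, match.text) are assumptions modelled on what wbsearchentities returns, and toDisplayLines is a hypothetical name, not an existing function.

```javascript
// Hypothetical mapping from one entity search result to the generic display
// format proposed above (URL, display line 1, display line 2).
function toDisplayLines( result ) {
	// Fall back to the entity ID when no label is available in any fallback language.
	var label = result.label || result.id;
	var matchText = result.match && result.match.text;
	// Only show the match in parentheses if it differs from the label.
	var line1 = matchText && matchText !== label ?
		label + ' (' + matchText + ')' :
		label;
	return {
		url: result.url,
		line1: line1,
		line2: result.description || ''
	};
}

// Illustrative input only; the values are made up for this example.
var exampleLines = toDisplayLines( {
	id: 'Q42',
	label: 'Douglas Adams',
	description: 'example description',
	url: '//www.wikidata.org/wiki/Q42',
	match: { type: 'alias', text: 'Douglas Noel Adams' }
} );
```

A format like this carries no MediaWiki-specific notion of "title", which is the point of the question about a more generic format.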

The other key part of the experience that we don't want to lose is the More bar.

Clicking this expands the existing results without navigating to another page

So some concrete questions:

Can we provide a different API format that is more generic for a search use case, rather than one that is tightly bound to MediaWiki concepts? (This would probably need some changes in JS, but probably small ones?)
Is there a way to add functionality to the search to allow this More expansion bar?

So we would want to provide a separate API (which you say is fine)

Yep, a new API is fine and the path of least resistance.

So we do not want to confuse things by adding entity search to rest.php/v1/search/title that is specifically meant to search for titles

We'd need to make some tweaks to the WVUI and Vector (legacy skin) to allow configuration of the API path. We have wgVectorSearchHost for changing the host, so I'd take care of things that side making sure this configuration evolves to include path.

Can we provide a different API format that is more generic for a search use case, rather than one that is tightly bound to MediaWiki concepts? (This would probably need some changes in JS, but probably small ones?)

One of the problems with the existing search is we had lots of client code JavaScript specific to Wikibase or configuration code to support Wikibase. For example this code in MobileFrontend: https://github.com/wikimedia/mediawiki-extensions-MobileFrontend/blob/master/src/mobile.startup/extendSearchParams.js#L28 I really want us to get away from that in the new search by handling this stuff on the server side.

I think it would be okay to expand the existing format but I'd rather not deviate too much from it e.g. we should keep the "pages" entry in the response, "title" to mean the search title we display in suggestions, description for description and thumbnail to mean thumbnail. I think it's okay to add new fields though if that's what you mean?

WVUI is in control of rendering, so we could expand WVUI to render those additional properties in some way. You'd need to work with the WVUI team (I'd suggest @Volker_E) to incorporate those UI changes but I don't see why not.

{
"id":3012251,
"key":"Q3153277",
// label
"title":"International Journal of Molecular Sciences",
// Description with fallback of the entity
"description":"peer-reviewed scientific journal",
"thumbnail":null,

// Wikidata specific entry
// New item in the API entry for the matched string, which can include fallback; it would be one of label, alias, or Qid (item ID)
"fallback": "<matched string>"
},

Is there a way to add functionality to the search to allow this More expansion bar?

There isn't right now. It sounds like this would be a request for a 2nd page of results? The API could be expanded to have a "continue" parameter, and you'd need to talk to @Volker_E and a designer about how that would be incorporated in the UI.

My suggestion would be:

1) current UI searching and displaying labels rather than Q codes
2) Expand API to include new parameters
3) Expand WVUI/Vector to support rendering the new optional API data
4) Add "continue" functionality to API
5) Expand WVUI to support more functionality.

Note new Vector is currently planned for Wikipedias in 2022, but the timeline for Wikidata.org doesn't need to be that. We can make it default when all the above is done.

So it looks like the only thing we would need to implement to enable a different API format to work would be https://github.com/wikimedia/wvui/blob/d77d02ac54ca2ba9a22e93ffe20debf36fc2e37b/src/components/typeahead-search/http/restSearchClient.ts#L11
This seems to be the place where the API's definition of a Search Result Object is converted into a RestResult in WVUI.
It looks like this could also be the place to tidy up some terminology if this is meant to be a generic search, such as fetchByTitle etc.
As mentioned in the directory structure, this should probably be a generic typeahead-search not tied to MediaWiki or MediaWiki concepts.

I would hope then that swapping out which client should be used based on if Wikibase is loaded or not would be fairly trivial? And that should either live in WVUI or in Wikibase as some kind of override?

If that is the case I see no reason that when we have some free resources we couldn't quickly implement this TS client for the slightly different format.
And then also implement the REST api.
I'd advocate for an API in core that uses a less mediawikiy, more generic type ahead focused format too.
That would also open up this #WVUI component for use in other contexts!

I'm happy to help with code review etc.

I would hope then that swapping out which client should be used based on if Wikibase is loaded or not would be fairly trivial? And that should either live in WVUI or in Wikibase as some kind of override?

My expectation is that ideally this would be a config-only change.
Ideally Wikidata would set wgVectorSearchApi = '/w/rest.php/v1/wikidata-search/title?q=$1&limit=$2' and it would just work.
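
For illustration only, this is roughly how such a template could be expanded on the client. Both wgVectorSearchApi and the $1/$2 placeholder convention are the hypothetical ones from the comment above, and expandSearchTemplate is an invented helper name.

```javascript
// Hypothetical helper: fill a configured search URL template, where $1 stands
// for the query and $2 for the result limit.
function expandSearchTemplate( template, query, limit ) {
	return template
		// encodeURIComponent never emits "$", so the second replace cannot be
		// confused by the user's input.
		.replace( '$1', encodeURIComponent( query ) )
		.replace( '$2', String( limit ) );
}

var exampleSearchUrl = expandSearchTemplate(
	'/w/rest.php/v1/wikidata-search/title?q=$1&limit=$2',
	'Q 3',
	10
);
```

With something like this in place, pointing the search box at a different endpoint would not require any Wikidata-specific JavaScript.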

However, here's an approach that can be used right now to demonstrate a potential short-term client that could live inside Wikibase using the old API. Paste this code into your JS console on https://en.wikipedia.org/?useskinversion=2&useskin=vector to see it in action.

mw.config.set( 'wgVectorSearchClient', {
	fetchByTitle: function ( query, domain, limit = 10 ) {
		// Aborting an in-flight request is not implemented in this demo.
		var abort = function () {};
		var fetched = fetch(
			'https://www.wikidata.org/w/api.php?origin=*&action=wbsearchentities&format=json' +
			'&errorformat=plaintext&language=en&uselang=en&type=item' +
			'&limit=' + limit +
			// Encode the query so that characters such as "&" or "#" do not break the URL.
			'&search=' + encodeURIComponent( query )
		)
			.then( function ( res ) {
				return res.json();
			} ).then( function ( data ) {
				// Map the wbsearchentities results to the shape the search widget expects.
				var result = {
					query: query,
					results: data.search.map( function ( search ) {
						return {
							id: search.pageid,
							key: search.id,
							// Label followed by the page title (the Q-ID) in parentheses.
							title: search.label + ' (' + search.title + ')',
							description: search.description
							// thumbnail: TODO
						};
					} )
				};
				return result;
			} );

		return {
			abort: abort,
			fetch: fetched
		};
	}
} );
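
The abort function in the snippet above is left unimplemented. One possible way to fill it in is sketched here with the standard AbortController; createAbortableSearch is a hypothetical helper, and the injectable fetchImpl parameter is only there so the sketch can be exercised without network access.

```javascript
// Hypothetical helper: start a request and return the { abort, fetch } pair
// that the search client above is expected to provide.
function createAbortableSearch( url, fetchImpl ) {
	var controller = new AbortController();
	// Use the injected implementation if given, otherwise the global fetch.
	var doFetch = fetchImpl || fetch;
	var fetched = doFetch( url, { signal: controller.signal } )
		.then( function ( res ) {
			return res.json();
		} );
	return {
		// Cancels the request; with the real fetch, the pending promise then
		// rejects with an AbortError.
		abort: function () {
			controller.abort();
		},
		fetch: fetched
	};
}
```

Cancelling outdated requests matters for typeahead search because every keystroke can start a new request, and a slow earlier response should not overwrite the results of a later one.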

Addshore moved this task from Investigate & Discuss to To Prioritize on the [DEPRECATED] wdwb-tech board.Sep 21 2021, 8:19 AM

We put this through our tech track prioritization session today and realized that, in order for us to be able to tackle it via this track (which means we have to keep product happy and not negatively change the user experience), we would need to fix the WVUI part mentioned in T275251#7359339:

Add "continue" functionality to API

Expand WVUI to support more functionality.

The feature is feature flagged, so I'm assuming you could do the work that isn't blocked by product so we can at least make a little bit of progress here?

As noted in T275251#7325272 my team cannot guarantee support for the existing code in the new opt-in Vector skin beyond November.

Jdlrobson moved this task from Untriaged to Move to Backlog on the Web-Team-Backlog (Tracking) board.Oct 7 2021, 4:57 PM

Jdlrobson moved this task from Move to Backlog to Discuss further on the Web-Team-Backlog (Tracking) board.

Jdlrobson moved this task from Discuss further to Move to Backlog on the Web-Team-Backlog (Tracking) board.Oct 7 2021, 8:36 PM

LGoto edited projects, added Web-Team-Backlog; removed Web-Team-Backlog (Tracking).Oct 7 2021, 9:10 PM

LGoto moved this task from Incoming to Tracking on the Web-Team-Backlog board.Oct 7 2021, 9:11 PM

Lucas_Werkmeister_WMDE subscribed.Oct 27 2021, 11:13 AM

Michael subscribed.Oct 28 2021, 10:19 AM

@Jdlrobson, apologies for the delay in response! We now have capacity in the team and this task is top of our list. After reviewing, our proposal would be this:

Update the search box, adding in:

language support
the matched alias as a new element

To this end there are three implementation options:

1. Build on top of or modify the existing WVUI typeahead search component
2. Create a WB variant of the existing WVUI typeahead component
3. Create the Codex (Vue 3) typeahead component

As our proposal involves working on topics normally outside of WMDE's scope, please let me know if the proposal is fine and which of the implementation options would work best for y'all. We can also arrange a call if you'd like to discuss it all in more detail.

Awesome!

Regarding the UI, 3 sounds like the best approach if you have capacity. We eventually need to port this to Codex anyway, so any work you do towards this would be super helpful. Rethinking this component from the Wikimedia DE perspective would also be an invaluable exercise!

In terms of integrating it into MediaWiki, inside Vector, the configuration $wgVectorWvuiSearchOption can be used to turn on any Wikidata-specific behaviours, e.g. match alias/language support.

If we end up with a lot of code that's Wikidata-specific, you might want to consider allowing Vector to disable the search widget altogether so that Wikidata can provide its own variant/setup code. Note that in future we'll be using this same component in the mobile site, so that's worth considering when thinking about how best to architect this right now.

karapayneWMDE edited projects, added Wikidata-Campsite (Team A Hearth 🏰🔥); removed Wikidata-Campsite.Nov 3 2021, 5:18 PM

Great! Added this to our task board. We'll review the level of effort for the codex implementation tomorrow and, assuming the codex element doesn't bloat the scope to an absurd degree, let the DS team know this is happening

• Manuel moved this task from Incoming to Prioritized Backlog on the Wikidata-Campsite (Team A Hearth 🏰🔥) board.Nov 5 2021, 11:10 AM

Michael mentioned this in T291526: Make a plan for TypeaheadSearch's inclusion in Codex.Nov 5 2021, 2:59 PM

Bugreporter mentioned this in T262088: Have the Search widget on the site use the new Search API.Nov 8 2021, 2:24 PM

• nray subscribed.Nov 8 2021, 8:02 PM

This is currently waiting on T291526: Make a plan for TypeaheadSearch's inclusion in Codex

@Jdlrobson, there's movement for option 3, but will the timeline for it match the requirement? Above you mentioned that you may be unable to support the current version past November. Do we need to decide on an in-between step or are we confident that we'll be able to get this into codex before our current version can no longer be supported?

In T275251#7372590, @Jdlrobson wrote:

... As noted in T275251#7325272 my team cannot guarantee support for the existing code in the new opt-in Vector skin beyond November.

@karapayneWMDE jfyi @Jdlrobson is on vacation this and next week. Defer to @SCherukuwada and @nray for possible feedback here in the meantime.

Michael mentioned this in T295992: Consider making use of thumbnails in new Vector Search suggestions on Wikidata.Nov 18 2021, 3:17 PM

Michael renamed this task from Rest Search API is not wikidata aware (only accepts queries beginning with Q) to New Vector Search is not Wikidata aware.Nov 18 2021, 4:27 PM

Michael updated the task description.

Option 3 sounds great, @karapayneWMDE. I think we can continue supporting this a little longer. How important is the modern Vector skin on wikidata.org? How many users are using it? (Note the autocomplete code will still be used on the normal Vector skin for now.)

Not very many users are using it, as far as I can tell. I won’t share the exact result numbers, but sharing the queries in case anyone wants to check I didn’t do something stupid:

MariaDB [wikidatawiki]> SELECT up_value, COUNT(*) FROM user_properties WHERE up_property = 'VectorSkinVersion' GROUP BY up_value;

Between 500 and 600 users, total, have VectorSkinVersion set to 2, compared to over 900k having it set to 1. (A handful have it set to 0, which confuses me, and if I understand correctly, for users who never touched the preference it wouldn’t be set at all? But it seems unlikely that almost a million users would have manually set it to 1…)

MariaDB [wikidatawiki]> SELECT up_value, COUNT(*) FROM user_properties WHERE up_property = 'VectorSkinVersion' AND EXISTS (SELECT * FROM recentchanges WHERE rc_actor = (SELECT actor_id FROM actor WHERE actor_user = up_user)) GROUP BY up_value;

Just under 300 users who made at least one edit in the past 30 days have the preference set to 2, compared to just under 12k having it set to 1.

MariaDB [wikidatawiki]> SELECT up_value, COUNT(*) FROM user_properties WHERE up_property = 'VectorSkinVersion' AND (SELECT COUNT(*) FROM recentchanges WHERE rc_actor = (SELECT actor_id FROM actor WHERE actor_user = up_user)) >= 100 GROUP BY up_value;

A bit over 100 very active editors (≥100 edits in the past 30 days) have it set to 2, compared to over 1300 very active editors having it set to 1.

Jdlrobson mentioned this in T290688: Drop support for legacy search in modern Vector.Dec 7 2021, 10:32 PM

• Manuel moved this task from Prioritized Backlog to Incoming on the Wikidata-Campsite (Team A Hearth 🏰🔥) board.Dec 9 2021, 9:34 AM

• Manuel moved this task from Incoming to Prioritized Backlog on the Wikidata-Campsite (Team A Hearth 🏰🔥) board.Dec 9 2021, 1:30 PM

Lectrician1 subscribed.Dec 29 2021, 9:47 PM

Lens0021 subscribed.Jan 14 2022, 4:21 AM

AnneT mentioned this in T302723: TypeaheadSearch in Codex (Wikidata version).Mar 9 2022, 7:27 PM

Jdlrobson mentioned this in T300182: Wikidata.org responsive behaviour conflicts with Vector Max width.Apr 14 2022, 3:22 PM

Addshore moved this task from Blocked to Sorted Team A on the [DEPRECATED] wdwb-tech board.Apr 22 2022, 8:02 AM

Addshore added a project: wmde-wikidata-tech.Apr 22 2022, 8:03 AM

Jdlrobson moved this task from Search to Work with other teams on the Desktop Improvements (Vector 2022) board.Jun 8 2022, 3:56 PM

This now mainly waits on T297025 and more specifically T303558: Typeahead search deployment: Use CdxTypeaheadSearch in Vector

After that is done, it is hopefully enough to override "wgVectorSearchClient", and/or maybe some of the config options added in 758961: Search: Use Codex and Vue 3 instead of WVUI and Vue 2 (not yet merged).

Maybe we can even get rid of all the hacks that make it currently work.

I believe these tasks are taken care of (as of next train Codex will be used in Vector). Is this unblocked now?

At the very least we should have another closer look now. Thank you for keeping an eye on this.

Change 817312 had a related patch set uploaded (by Michael Große; author: Michael Große):

[mediawiki/extensions/Wikibase@master] [Do not merge!][PoC] Make new Vector search Wikidata aware

https://gerrit.wikimedia.org/r/817312

gerritbot added a project: Patch-For-Review.Jul 26 2022, 5:12 PM

ItamarWMDE subscribed.Jul 27 2022, 1:23 PM

After the merge of 758961: Search: Use Codex and Vue 3 instead of WVUI and Vue 2 and the train running, we were expecting the search on Wikidata.org with the new skin to be broken. However, this is not the case; we are still seeing the Wikibase workaround/replacement. Can you confirm that this was done intentionally in order to prevent Wikidata.org from being broken with Vector-2022?

If that was done intentionally, should we inform you of anything after we have made the fix on our side?

• Manuel edited projects, added Wikidata Dev Team; removed Wikidata-Campsite (Team A Hearth 🏰🔥).Aug 3 2022, 1:40 PM