New Vector Search is not Wikidata aware
Closed, Resolved, Public

Description

The WVUI search widget used in the new Vector skin does not yet provide Wikidata with the same functionality as the search box in the legacy Vector skin. Several functional requirements are not yet met:

Functional requirements

Related to the API queried for results:
1. The search must cover Labels and Aliases in any language, as well as Entity IDs

In the current search in the legacy Vector skin, the search box uses the action=wbsearchentities API endpoint. We need to write some new adapter code for the new Search box in the new Vector skin that does that as well.
The API used by the search box in the new Vector skin currently searches only page titles (Q-IDs in case of Wikidata) which is not helpful.
Note: Descriptions are intentionally not searched by the action=wbsearchentities API.

Related to how the results are displayed to the user:
1.: Show matches outside the Label in the current language

On Wikidata, the search goes not only through the Entities' Labels in the current Language, but also through their Aliases, and all the Labels and Aliases in all other languages. If any of these match, then that must also be shown in the search result. Also, if one searches for an Entity-ID directly (e.g., "Q42"), then that matching Entity must be shown as well.

2.: Handle multiple languages

The language of each "text object" (i.e. title, description, alias/search match) should be explicitly set in an HTML lang="" attribute. This is because the language can differ from the user's language due to language fallbacks, and setting it explicitly allows screen readers to function correctly. We will also need to account for the possibility of different writing directions.
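
A minimal sketch of this requirement, assuming each text object arrives together with its own language code. The helper names (escapeHtml, renderTextObject) and the short list of right-to-left language codes are illustrative assumptions, not part of WVUI or Wikibase:

```javascript
// Hypothetical helper: escape text before placing it into markup.
function escapeHtml( text ) {
	return String( text )
		.replace( /&/g, '&amp;' )
		.replace( /</g, '&lt;' )
		.replace( />/g, '&gt;' )
		.replace( /"/g, '&quot;' );
}

// Illustrative and deliberately incomplete list of right-to-left language codes.
var RTL_LANGUAGES = [ 'ar', 'fa', 'he', 'ur' ];

// Wrap one text object in a span that carries its own lang and dir attributes,
// so a fallback-language label is announced and laid out correctly.
function renderTextObject( text, language ) {
	var dir = RTL_LANGUAGES.indexOf( language ) !== -1 ? 'rtl' : 'ltr';
	return '<span lang="' + escapeHtml( language ) + '" dir="' + dir + '">' +
		escapeHtml( text ) + '</span>';
}
```

For example, a label shown in English and a description that fell back to Hebrew would each be rendered through renderTextObject with their own language code, rather than inheriting the page language.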

3.: Allow for loading more results in an obvious way

The current WVUI TypeaheadSearch component seems to limit the number of results being displayed in the menu to 10 (probably configurable). Wikidata provides a large number of matching results per search, and one often doesn't find what one is looking for in the first couple of suggestions.
Therefore, we require users to be able to (maybe implicitly) request further results. The solution that we find must be obvious to users and the pattern should be the same across Wikidata (e.g. property lookup).

Possible solutions for 3.:

3.1. We provide users with an option (e.g., a “more”-pseudoresult) that they can click to load more results within the results menu.

3.2. We allow users to scroll within the dropdown results menu and load more results in the background on scroll
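
Both options come down to the same bookkeeping: remember which results are already shown and where the next page starts. A minimal sketch of that state handling follows. The names createSearchState, hasMore and applyPage, and the { results, nextOffset } page shape, are hypothetical and chosen for illustration only; they are not an existing API:

```javascript
// Start with no results and an offset of 0 for the first page.
function createSearchState( query ) {
	return { query: query, results: [], nextOffset: 0 };
}

// A null offset means the backend reported that no further pages exist.
function hasMore( state ) {
	return state.nextOffset !== null;
}

// Append one fetched page to the results already shown. Called after the
// user clicks "more" (3.1) or scrolls to the end of the menu (3.2).
function applyPage( state, page ) {
	return {
		query: state.query,
		results: state.results.concat( page.results ),
		nextOffset: page.nextOffset
	};
}
```

The menu would show the "more" pseudo-result, or keep listening for scroll events, only while hasMore returns true.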


Original task description:

The existing search API only works with queries containing "Q" and returns results without the correct display title
https://wikidata.org/w/rest.php/v1/search/title?q=Q3&limit=10

This means that in future Wikidata will become useless with the rollout of the latest version of Vector, which will stall further adoption efforts of the Wikimedia WVUI library.

Event Timeline

Has any progress been made with this API?

We (the web team) will be beginning the process for porting the mobile site to this component and without this API we will need to consider one of two options, neither of which is great:

  1. Disabling the JavaScript enhancement for search on Wikidata.

OR

  2. Moving the frontend code to a Wikidata extension as technical debt to be tackled at a later date.

Regarding the ranking discussion, I would've thought you're more likely to want a Lexeme if you're on a page about a Lexeme already, and so it should be prioritized in the results list (but not to the extent of excluding other results).

That should perhaps be split off into a separate task, though. The priority here seems to be "make a hook that can return a search engine for results".

From a user perspective, I think many consumers would actually want Wikidata's "identifier matched somewhere" as a result. But I appreciate that it might break things if results are returned that do not correspond to title prefix matches.

If the search title contract is not to be altered, maybe the change needed is a hook for the endpoint used to make queries, that the UI code uses, as well as an implementation of such an endpoint? Maybe Wikibase could hook the search content API and add data to it, and the UI component could be directed to use that, rather than creating a whole new endpoint? Conceptually the content being matched is similar to page content.

@Jdlrobson Hello! new EM for wd here 👋

By when would you need a response on what to do in order to not jeopardize the porting process?

Hi @karapayneWMDE, later today I'm making a config change which will mean Wikidata is the only Wikimedia project that does not use the Vue search.

I think the timeline is dependent on you.

Ideally, I would like to drop support for the old search now, which would mean that Wikidata for the modern Vector skin would have no autocomplete functionality. This might be okay, as Wikidata currently doesn't have the new Vector set as the default skin, but I want to double-check that with you. If you want more time, I can offer 3 months maximum at this point, after which I'd feel uncomfortable maintaining the old search with the new skin.

According to https://www.mediawiki.org/wiki/Reading/Web/Desktop_Improvements, we are looking to make the new skin the default on all wikis by the end of the year, so ideally we need to make the search compatible with Wikidata before the end of the year.

The main blocker for this is having a functioning search API that returns results for queries such as https://wikidata.org/w/rest.php/v1/search/title?q=Q3&limit=10. It's also acceptable for Wikidata to provide its own API if that is preferred - the search API is configurable and we can point to any service you want, provided the response is consistent with that API's specification.

Let me know if you want to chat through this in a video call with a WMDE engineer.

@karapayneWMDE I've set up T290688 with the proposed next step in case you want to test the implications on Wikidata for the modern Vector skin. I'd like to either merge this within the next 2 weeks, or at the latest December 1st, if 3 months is a reasonable timeline to address this issue. Please let me know your preference.

Hi @Jdlrobson

To start, here is what is currently displayed in our results:

  1. Label of matched entity with fallback in user language
  2. Matched string, which can include fallback, would be one of label, alias, or Qid (item ID)
  3. Description with fallback of the entity

image.png (392×866 px, 51 KB)

The main blocker for this is having a functioning search API that returns results for queries such as https://wikidata.org/w/rest.php/v1/search/title?q=Q3&limit=10. It's also acceptable for Wikidata to provide its own API if that is preferred - the search API is configurable and we can point to any service you want, provided the response is consistent with that API's specification.

So we do not want to confuse things by adding entity search to rest.php/v1/search/title that is specifically meant to search for titles. The functionality there should remain the same and just be provided by MediaWiki; as daniel also expressed in T275251#6944581, changing it would break the contract.
So we would want to provide a separate API (which you say is fine)
However, we don't really want to provide an API response that is again confusing if people were to ever look at it (with title etc.).
Is there some way we could provide some sort of API adapter or a second format that this JS code can deal with?

Taking inspiration from https://www.mediawiki.org/wiki/API:REST_API/Reference#Search_result_object
We would want to return:

  • Page title, or just a URL to link to for the result. Thinking ahead to possible future cases we may want to consider, a URL would be preferred
  • Display line 1 - Which in our case would be something like LABEL (MATCH), where LABEL is a fallback-enabled label for user display and MATCH is the string that actually matched, if different from that label
  • Display line 2 - This would be our description with language fallback
  • Image URL - We wouldn't provide this initially, but would want to in the future, and would ideally like to hide the "image doesn't exist" part of the result

But we don't want to introduce a confusing situation by misusing the API or its format.
Perhaps this is even something that should change in the main REST spec?
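
A rough sketch of how the proposed result object could be assembled from one search hit. The input shape (label, match, description, url, imageUrl) and the function name toDisplayResult are assumptions made for illustration; they are loosely modelled on what an entity search returns and are not an agreed format:

```javascript
// Build the proposed display fields from a single search hit.
// "match" is the string that actually matched (label, alias or entity ID).
function toDisplayResult( hit ) {
	// Fall back to the matched string if the entity has no label at all.
	var line1 = hit.label || hit.match || '';
	// Only append the match when it adds information beyond the label.
	if ( hit.match && hit.match !== line1 ) {
		line1 += ' (' + hit.match + ')';
	}
	return {
		url: hit.url,
		line1: line1,
		line2: hit.description || '',
		// null signals "no image", so the UI can hide the placeholder.
		imageUrl: hit.imageUrl || null
	};
}
```

With this shape, a search for "Q42" would show "Douglas Adams (Q42)" on the first line, while a search that matched the label itself would show only "Douglas Adams".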

The other key part of the experience that we don't want to lose is the More bar.

image.png (651×865 px, 70 KB)

Clicking this expands the existing results without navigating to another page

image.png (714×848 px, 76 KB)

So some concrete questions:

  1. Can we provide a different API format that is more generic for a search use case, rather than one that is tightly bound to MediaWiki concepts? (This would probably need some changes in JS, but probably small ones?)
  2. Is there a way to add functionality to the search to allow this More expansion bar?

So we would want to provide a separate API (which you say is fine)

Yep, a new API is fine and the path of least resistance.

So we do not want to confuse things by adding entity search to rest.php/v1/search/title that is specifically meant to search for titles

We'd need to make some tweaks to WVUI and Vector (legacy skin) to allow configuration of the API path. We have wgVectorSearchHost for changing the host, so I'd take care of things on that side, making sure this configuration evolves to include the path.

Can we provide a different API format that is more generic for a search use case, rather than one that is tightly bound to MediaWiki concepts? (This would probably need some changes in JS, but probably small ones?)

One of the problems with the existing search is that we had lots of client-side JavaScript specific to Wikibase, or configuration code to support Wikibase. For example, this code in MobileFrontend: https://github.com/wikimedia/mediawiki-extensions-MobileFrontend/blob/master/src/mobile.startup/extendSearchParams.js#L28. I really want us to get away from that in the new search by handling this stuff on the server side.

I think it would be okay to expand the existing format but I'd rather not deviate too much from it e.g. we should keep the "pages" entry in the response, "title" to mean the search title we display in suggestions, description for description and thumbnail to mean thumbnail. I think it's okay to add new fields though if that's what you mean?

WVUI is in control of rendering, so we could expand WVUI to render those additional properties in some way. You'd need to work with the WVUI team (I'd suggest @Volker_E) to incorporate those UI changes but I don't see why not.

{
"id":3012251,
"key":"Q3153277",
// label
"title":"International Journal of Molecular Sciences",
// Description with fallback of the entity
"description":"peer-reviewed scientific journal",
"thumbnail":null,

// Wikidata specific entry
// Add new item to API entry for matched string, which can include fallback, would be one of label, alias, or Qid (item ID)
"fallback":
},

Is there a way to add functionality to the search to allow this More expansion bar?

There isn't right now. It sounds like this would be a request for a 2nd page of results? The API could be expanded to have a "continue" parameter, and you'd need to talk to @Volker_E and a designer about how that would be incorporated in the UI.

My suggestion would be:

  • 1) current UI searching and displaying labels rather than Q codes
  • 2) Expand API to include new parameters
  • 3) Expand WVUI/Vector to support rendering the new optional API data
  • 4) Add "continue" functionality to API
  • 5) Expand WVUI to support more functionality.

Note new Vector is currently planned for Wikipedias in 2022, but the timeline for Wikidata.org doesn't need to be that. We can make it default when all the above is done.

So it looks like the only thing we would need to implement to enable a different API format to work would be https://github.com/wikimedia/wvui/blob/d77d02ac54ca2ba9a22e93ffe20debf36fc2e37b/src/components/typeahead-search/http/restSearchClient.ts#L11
This seems to be the place where the API's Search Result Object is converted into a RestResult in WVUI.
It looks like this could also be the place to tidy up some terminology if this is meant to be a generic search, such as fetchByTitle etc.
As suggested by the directory structure, this should probably be a generic typeahead-search not tied to MediaWiki or MediaWiki concepts.

I would hope then that swapping out which client should be used based on if Wikibase is loaded or not would be fairly trivial? And that should either live in WVUI or in Wikibase as some kind of override?

If that is the case I see no reason that when we have some free resources we couldn't quickly implement this TS client for the slightly different format.
And then also implement the REST api.
I'd advocate for an API in core that uses a less mediawikiy, more generic type ahead focused format too.
That would also open up this #WVUI component for use in other contexts!

I'm happy to help with code review etc.

I would hope then that swapping out which client should be used based on if Wikibase is loaded or not would be fairly trivial? And that should either live in WVUI or in Wikibase as some kind of override?

My expectation is that ideally this would be a config-only change.
Ideally Wikidata would set wgVectorSearchApi = '/w/rest.php/v1/wikidata-search/title?q=$1&limit=$2' and it would just work.
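
To illustrate the idea, here is a small sketch of how a client might expand such a template. Both the wgVectorSearchApi setting and the $1/$2 placeholder convention are taken from the hypothetical above; the function name buildSearchUrl is made up for this example and is not how Vector is known to handle it:

```javascript
// Fill a search URL template in which $1 is the query and $2 the limit.
// Function replacers are used so that "$" sequences in the inserted values
// are never interpreted as special replacement patterns.
function buildSearchUrl( template, query, limit ) {
	return template
		.replace( '$1', function () {
			return encodeURIComponent( query );
		} )
		.replace( '$2', function () {
			return String( limit );
		} );
}
```

With the template above, buildSearchUrl( template, 'Douglas Adams', 10 ) yields '/w/rest.php/v1/wikidata-search/title?q=Douglas%20Adams&limit=10'.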

However, here's an approach that can be used right now to demonstrate a potential short-term client that could live inside Wikibase using the old API. Paste this code into your JS console on https://en.wikipedia.org/?useskinversion=2&useskin=vector to see it in action.

mw.config.set( 'wgVectorSearchClient', {
	// Short-term client: answers typeahead queries from the old wbsearchentities API.
	fetchByTitle: function ( query, domain, limit = 10 ) {
		var abort = function () {
			// not implemented
		};
		// Encode the query so characters such as & or # cannot break the URL,
		// and pass the requested limit on to the API.
		var fetched = fetch( 'https://www.wikidata.org/w/api.php?origin=*&action=wbsearchentities&format=json&errorformat=plaintext&language=en&uselang=en&type=item&limit=' + limit + '&search=' + encodeURIComponent( query ) )
			.then( function ( res ) {
				return res.json();
			} ).then( function ( data ) {
				// Map each entity to the result shape the search widget expects.
				return {
					query: query,
					results: data.search.map( function ( search ) {
						return {
							id: search.pageid,
							key: search.id,
							title: search.label + ' (' + search.title + ')',
							description: search.description
							// thumbnail: TODO
						};
					} )
				};
			} );

		return {
			abort: abort,
			fetch: fetched
		};
	}
} );

We put this through our tech track prioritization session today and then realized that in order for us to be able to tackle it via this track (which means we have to keep product happy / not negatively change user experience) we would need to fix the WVUI part mentioned in T275251#7359339

  1. Add "continue" functionality to API
  2. Expand WVUI to support more functionality.

The feature is feature flagged, so I'm assuming you could do the work that isn't blocked by product so we can at least make a little bit of progress here?

As noted in T275251#7325272 my team cannot guarantee support for the existing code in the new opt-in Vector skin beyond November.

@Jdlrobson, apologies for the delay in response! We now have capacity in the team and this task is top of our list. After reviewing, our proposal would be this:

Update the search box, adding in

  • language support
  • a match alias as a new element

To this end there are three implementation options

  1. Build on top of or modify the existing WVUI typeahead search component
  2. Create a WB variant of the existing WVUI typeahead component
  3. Create the Codex (Vue 3) typeahead component

As our proposal involves working on topics normally outside of WMDE's scope, please let me know if the proposal is fine and which of the implementation options would work best for y'all. We can also arrange a call if you'd like to discuss it all in more detail.

Awesome!

Regarding the UI, 3 sounds like the best approach if you have capacity. We eventually need to port this to Codex anyway, so any work you do towards this would be super helpful. Rethinking this component from the Wikimedia DE perspective would also be an invaluable exercise!

In terms of integrating it into MediaWiki, inside Vector, the configuration $wgVectorWvuiSearchOption can be used to turn on any Wikidata-specific behaviours, e.g. match alias/language support.

If we end up with a lot of code that's Wikidata-specific, you might want to consider allowing Vector to disable the search widget altogether so that Wikidata can provide its own variant/setup code. Note that in future we'll be using this same component on the mobile site, so that's worth considering when thinking about how best to architect this right now.

Great! Added this to our task board. We'll review the level of effort for the codex implementation tomorrow and, assuming the codex element doesn't bloat the scope to an absurd degree, let the DS team know this is happening

@Jdlrobson, there's movement for option 3, but will the timeline for it match the requirement? Above you mentioned that you may be unable to support the current version past November. Do we need to decide on an in-between step or are we confident that we'll be able to get this into codex before our current version can no longer be supported?

... As noted in T275251#7325272 my team cannot guarantee support for the existing code in the new opt-in Vector skin beyond November.

@karapayneWMDE jfyi @Jdlrobson is on vacation this and next week. Defer to @SCherukuwada and @nray for possible feedback here in the meantime.

Michael renamed this task from "Rest Search API is not wikidata aware (only accepts queries beginning with Q)" to "New Vector Search is not Wikidata aware". Nov 18 2021, 4:27 PM
Michael updated the task description.

Option 3 sounds great @karapayneWMDE. I think we can continue supporting this a little longer. How important is the modern Vector skin on wikidata.org? How many users are using it? (Note the autocomplete code will still be used on the normal Vector skin for now.)

Not very many users are using it, as far as I can tell. I won’t share the exact result numbers, but sharing the queries in case anyone wants to check I didn’t do something stupid:

MariaDB [wikidatawiki]> SELECT up_value, COUNT(*) FROM user_properties WHERE up_property = 'VectorSkinVersion' GROUP BY up_value;

Between 500 and 600 users, total, have VectorSkinVersion set to 2, compared to over 900k having it set to 1. (A handful have it set to 0, which confuses me, and if I understand correctly, for users who never touched the preference it wouldn’t be set at all? But it seems unlikely that almost a million users would have manually set it to 1…)

MariaDB [wikidatawiki]> SELECT up_value, COUNT(*) FROM user_properties WHERE up_property = 'VectorSkinVersion' AND EXISTS (SELECT * FROM recentchanges WHERE rc_actor = (SELECT actor_id FROM actor WHERE actor_user = up_user)) GROUP BY up_value;

Just under 300 users who made at least one edit in the past 30 days have the preference set to 2, compared to just under 12k having it set to 1.

MariaDB [wikidatawiki]> SELECT up_value, COUNT(*) FROM user_properties WHERE up_property = 'VectorSkinVersion' AND (SELECT COUNT(*) FROM recentchanges WHERE rc_actor = (SELECT actor_id FROM actor WHERE actor_user = up_user)) >= 100 GROUP BY up_value;

A bit over 100 very active editors (≥100 edits in the past 30 days) have it set to 2, compared to over 1300 very active editors having it set to 1.

This now mainly waits on T297025 and more specifically T303558: Typeahead search deployment: Use CdxTypeaheadSearch in Vector

After that is done, it is hopefully enough to override "wgVectorSearchClient", and/or maybe some of the config options added in 758961: Search: Use Codex and Vue 3 instead of WVUI and Vue 2 (not yet merged).

Maybe we can even get rid of all the hacks that make it currently work.

I believe these tasks are taken care of (as of next train Codex will be used in Vector). Is this unblocked now?

Michael changed the task status from Stalled to Open. Jul 25 2022, 8:39 PM

At the very least we should have another closer look now. Thank you for keeping an eye on this.

Change 817312 had a related patch set uploaded (by Michael Große; author: Michael Große):

[mediawiki/extensions/Wikibase@master] [Do not merge!][PoC] Make new Vector search Wikidata aware

https://gerrit.wikimedia.org/r/817312

After the merge of 758961: Search: Use Codex and Vue 3 instead of WVUI and Vue 2 and the train running, we were expecting the search on Wikidata.org with the new skin to be broken. However, this is not the case; we are still seeing the Wikibase workaround/replacement. Can you confirm that this was done intentionally in order to prevent Wikidata.org from breaking with Vector-2022?

If that was done intentionally, should we inform you of anything after we have made the fix on our side?

Change 825258 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[mediawiki/skins/Vector@master] Search: Drop Wikidata specific behaviour

https://gerrit.wikimedia.org/r/825258

@Michael the above patch is what you need (feel free to merge it when you are ready).

Change 825258 abandoned by Jdlrobson:

[mediawiki/skins/Vector@master] Search: Drop Wikidata specific behaviour

Reason:

(Feel free to restore when you are actively working on this)

https://gerrit.wikimedia.org/r/825258

Replacing with Codex now that TypeaheadSearch builds on top of it and with T310243: Deprecate WVUI in favor of Codex on the horizon.

Change 817312 abandoned by Michael Große:

[mediawiki/extensions/Wikibase@master] [Do not merge!][PoC] Make new Vector search Wikidata aware

Reason:

This is done, see changes around T275251

https://gerrit.wikimedia.org/r/817312

Jdlrobson claimed this task.