Add stemming to simple search endpoints
Closed, Resolved | Public | 8 Estimated Story Points

Description

The current REST API simple search endpoints do not always return the best results (for example, when searching for "housecats").

We think this can be solved by implementing stemming.

Acceptance criteria:

  • Simple search for items and properties in the REST API supports stemming

Event Timeline

Change #1164448 had a related patch set uploaded (by Jakob; author: Jakob):

[mediawiki/extensions/WikibaseCirrusSearch@master] Enable stemming for simple item/property search

https://gerrit.wikimedia.org/r/1164448

Change #1165032 had a related patch set uploaded (by Jakob; author: Jakob):

[mediawiki/extensions/Wikibase@master] Search: Pass stemming settings to InLabelSearch

https://gerrit.wikimedia.org/r/1165032

Change #1165036 had a related patch set uploaded (by Jakob; author: Jakob):

[mediawiki/extensions/WikibaseCirrusSearch@master] Add stemming settings param for fwd compatibility

https://gerrit.wikimedia.org/r/1165036

Change #1165036 merged by jenkins-bot:

[mediawiki/extensions/WikibaseCirrusSearch@master] Add stemming settings param for fwd compatibility

https://gerrit.wikimedia.org/r/1165036

Change #1165032 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Search: Pass stemming settings to InLabelSearch

https://gerrit.wikimedia.org/r/1165032

Change #1164448 merged by jenkins-bot:

[mediawiki/extensions/WikibaseCirrusSearch@master] Enable stemming for simple item/property search

https://gerrit.wikimedia.org/r/1164448

I tried testing this (on wikidata.org) buuuut I'm not sure it's working there yet?

Hmm, what search terms did you try? It is working on wikidata.org now, but we also found that it's not always as good as we expected it to be. Here are some examples:

Also note that exact matches are preferred over matches on stemmed values, so what you're looking for might just not be at the top of the results list.
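To illustrate the ordering described above, here is a toy sketch in Python. It is not the analyzer or ranking that WikibaseCirrusSearch actually uses: `toy_stem`, `rank` and the sample labels are hypothetical, and the "stemmer" only strips a trailing plural "s". It shows why a stemmed match can land below an exact one.

```python
def toy_stem(word: str) -> str:
    """Hypothetical, very naive stemmer: lowercase and strip a plural 's'."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def rank(query: str, labels: list[str]) -> list[str]:
    """Return matching labels, exact matches first, then stem-only matches."""
    exact = [label for label in labels if label.lower() == query.lower()]
    stemmed = [
        label
        for label in labels
        if label not in exact and toy_stem(label) == toy_stem(query)
    ]
    return exact + stemmed


# Hypothetical labels, purely for illustration.
labels = ["housecat", "housecats", "dog"]
print(rank("housecats", labels))
```

Without the stemmed pass, a query for "housecats" would only ever match a label spelled exactly that way.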

I tried the housecats example and didn't immediately see much of a difference, but for skydiving, kitesurfing and windsurfing it was better than before.

Really? https://www.wikidata.org/w/rest.php/wikibase/v0/search/items?language=en&q=housecats shows Q146 "house cat" at the very top for me. I'm pretty sure it previously didn't find it at all.

(It also finds a DJ called "Felix da Housecat" which I'm now very intrigued about.)
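For anyone who wants to re-run this check, here is a minimal Python sketch that builds the same request URL and fetches the results. The helper names are made up for this example, the "properties" path is an assumption based on the acceptance criteria, and the sketch makes no assumptions about the shape of the JSON response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base path taken from the link above; "v0" may change in later API versions.
BASE = "https://www.wikidata.org/w/rest.php/wikibase/v0/search"


def build_search_url(query: str, language: str = "en", kind: str = "items") -> str:
    """Build a simple search URL; kind="properties" is an assumed sibling path."""
    return f"{BASE}/{kind}?{urlencode({'language': language, 'q': query})}"


def simple_search(query: str, language: str = "en", kind: str = "items"):
    """Fetch the search results and return the parsed JSON as-is."""
    with urlopen(build_search_url(query, language, kind)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Performs a live request against wikidata.org.
    print(json.dumps(simple_search("housecats"), indent=2))
```

Some servers reject requests without a descriptive User-Agent header, so a real script may need to set one.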

Hahaha it did show me the cool DJ as well, BUT I didn't remember whether it showed house cat last time (I realised I clicked on the link, which must now be working WITH stemming)

so we're good to go!

Ahh, yes! The link there now obviously gets the improved results too. Including a snapshot of the results in the description would've made that clearer, sorry. But yes, I'm pretty sure I added the link because it previously did NOT find the "house cat" item.