
Test ElasticSearch suggester to see if it meets user needs better than PrefixSearch
Closed, ResolvedPublic

Description

David wrote a cool document about using the ElasticSearch suggester: https://docs.google.com/document/d/1pn64e9Tb_ZBbR470K9dofr79yINz9Xzs8t6bVUb1yUo/edit#

There's now a demo here: https://suggesty.wmflabs.org/suggest.html

We should run a test to see if this suggester meets user needs better than the prefixsearch in the search box at the top right of pages on wikis.

Event Timeline

Deskana created this task.Jul 13 2015, 9:14 PM
Deskana raised the priority of this task from to Normal.
Deskana updated the task description. (Show Details)
Deskana added a project: Discovery.
Deskana added a subscriber: Deskana.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 13 2015, 9:14 PM

> meets user needs

What's the status of this metric?

> What's the status of this metric?

Our goal in Q1 2015-16 is to reduce the zero results rate, so that is the objective here.

Open questions:

  1. What wikis do we test this on?
  2. Do we offer users an opt out?
  3. Do we test it with some small percentage of users and compare them to the baseline?
  4. Do we test it 50/50 with existing users (i.e. an A/B test) and use some data collection to figure out which is best?
  5. How quickly can we turn it off if it's not working?

More far-reaching questions:

  1. Why don't we just try this in the Android app? We could create this as an API and try it there.

Here's an example of A/B testing in the Android app: https://gerrit.wikimedia.org/r/#/c/218983/

dcausse added a subscriber: dcausse.EditedJul 14 2015, 9:26 AM

Just to add a bit more context to this task.

The completion suggester described in the document won't work well out of the box, and what I did is a rough prototype.
I think there are several tasks to do before running a live test.

1. Scoring
Suggestions are sorted by a score computed at index time. Today I just computed a very basic composite score from data available in the dump:

  • number of incoming links
  • number of external links
  • page size
  • number of headings
  • number of redirects
  • penalty factor on disambiguation pages

Unfortunately, none of these signals lets us score pages "correctly". Weighting the number of incoming links highest seems to be the best trade-off, but some pages (e.g. dates) have a very high number of incoming links and do not deserve to be ranked so high. I think we should investigate adding a new score component based on pageview statistics (https://dumps.wikimedia.org/other/pagecounts-raw/).
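
The composite score described above could be sketched roughly like this. The weights, the log damping, and the disambiguation penalty value are all illustrative assumptions; the comment only lists the signals, not the exact formula:

```python
import math

# Hypothetical weights -- the task lists the signals but not how they combine.
WEIGHTS = {
    "incoming_links": 10.0,  # best single signal per the comment above
    "external_links": 1.0,
    "page_size": 0.5,
    "headings": 1.0,
    "redirects": 2.0,
}
DISAMBIGUATION_PENALTY = 0.1  # assumed penalty factor for disambiguation pages

def composite_score(doc, pageviews=0):
    """Index-time score: log-damped sum of signals, so pages with huge
    incoming-link counts (e.g. date pages) do not dominate outright."""
    score = sum(w * math.log1p(doc.get(k, 0)) for k, w in WEIGHTS.items())
    # Proposed extra component from the pagecounts-raw dumps.
    score += 5.0 * math.log1p(pageviews)
    if doc.get("is_disambiguation"):
        score *= DISAMBIGUATION_PENALTY
    return score
```

Log damping is one way to keep the link-count signal useful without letting heavily linked pages swamp everything; the actual damping and weighting would need to be tuned empirically.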

2. Analyzers
The prototype was configured with a very basic analysis chain. We should configure it appropriately for each language.
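
For illustration, a "very basic analysis chain" for the completion field might look like the sketch below. The tokenizer and filter names are standard Elasticsearch ones, but the field name, mapping shape, and the idea of passing per-language extra filters are assumptions, not the prototype's actual configuration:

```python
# Sketch of an index body for a completion field with a generic analysis
# chain (standard tokenizer + lowercase + asciifolding). Picking
# language-specific filters is exactly the open work this point describes.
def completion_index_body(extra_filters=()):
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "suggest_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "asciifolding", *extra_filters],
                    }
                }
            }
        },
        "mappings": {
            "page": {
                "properties": {
                    "suggest": {
                        "type": "completion",
                        # Elasticsearch 1.x-era parameter names.
                        "index_analyzer": "suggest_analyzer",
                        "search_analyzer": "suggest_analyzer",
                    }
                }
            }
        },
    }
```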

3. Multiple suggestions
Suggestions are returned according to their computed weight. If you enable fuzzy matching, suggestions that match with a lot of typos won't be scored lower than the ones that match exactly. The solution I tested is to run multiple suggestions at the same time:

  • exact
  • fuzzy with a penalty factor of 0.2
  • exact (stop words filtered) with a penalty factor of 0.3
  • fuzzy (stop words filtered) with a penalty factor of 0.1

The multiple suggestions are aggregated on the backend (the HTML page in the prototype) and the top 10 suggestions are sent back to the client.
There is also some work here: evaluating the best combination of suggestions and penalty factors. Another nice thing would be a kind of "cutoff": when I make a typo, I may want to see only the "best" suggestions and filter out pages with a very low score.

4. Redirects
Some pages have a lot of redirects (https://en.wikipedia.org/wiki/United_States?action=cirrusdump):

  • If I type "Un" I think United States can be in the top 10 suggestions because of its high score.
  • But if I type "Ya" I think it's not fair to suggest "Yankee land" in the top 10 (yankee land redirects to United States).

What I tried is to group the redirects of a single page into "similar" groups and apply a penalty factor if the group is "far" (by Levenshtein distance) from the official page name.
This is very fragile and deserves a lot more work.
Another problem with redirects on existing popular wikis is that they already contain typos:

  • Jurrassic Park redirects to Jurassic Park
  • Airton Senna redirects to Ayrton Senna

Redirects seem to be a tool the community uses today to allow "fuzzy suggestions".
I really don't know how to deal with that; suggesting something with a typo is not ideal... Can we curate this data with Wikidata?
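
The distance-based penalty idea above can be sketched minimally as follows. The distance threshold and penalty value are illustrative assumptions; the actual prototype grouped redirects, which this simplification skips:

```python
def levenshtein(a, b):
    """Plain dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def redirect_weight(redirect, canonical, base_weight, max_close=3, penalty=0.2):
    """Redirects close to the canonical title (e.g. case or spelling
    variants) keep the full weight; far ones (e.g. "Yankee land" ->
    "United States") get a penalty. Threshold and penalty are assumed."""
    dist = levenshtein(redirect.lower(), canonical.lower())
    return base_weight if dist <= max_close else base_weight * penalty
```

Under this scheme "Un" would still surface United States at full weight, while "Ya" would only show "Yankee land" if nothing better scores above the penalized weight.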

5. It's not scrollable
Prefix search results are scrollable (I think), but suggestions are not scrollable by nature. Google shows only the top 4, and I think there are good reasons for that.

You can have a look here: https://github.com/nomoa/suggester-prototype/
(Sorry this is very quick and dirty)

Deskana updated the task description. (Show Details)Sep 1 2015, 6:57 PM
Deskana set Security to None.
EBernhardson closed this task as Resolved.Sep 29 2016, 4:10 PM
EBernhardson claimed this task.
EBernhardson added a subscriber: EBernhardson.

We built, tested, and deployed this feature. Seems resolved.