EPIC: Improve spelling error detection and correction for search to fulfil Q1 2015-16 quarterly goal.
Closed, InvalidPublic

Description

Q1 2015-16 goal: https://www.mediawiki.org/wiki/Wikimedia_Engineering/2015-16_Q1_Goals/SearchGoal

This task needs to be broken down into smaller steps that can be taken.

Event Timeline

Deskana raised the priority of this task to Medium.
Deskana updated the task description.
Deskana subscribed.

At a cursory glance, it appears this is a request to use the ES suggest API, which can look at a query and suggest "nearby" words to replace words in the query. Querying this API on an enwiki_general index for "saerch" with no special options (those to be determined) returns words like search, smerch, saetch, and serch. Each of these reports its frequency within the corpus (e.g. search is seen 179439 times, while saetch is seen 7 times) and a score. This is currently exposed at [1]. Sadly, that faulty search for saerch returned saeqeh.

[1] https://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=default&search=saerch&fulltext=Search

There are plenty of knobs to turn here, from which fields to consider (content, title, etc.) and how to weight those fields, to what kind of results we want (only suggest for missing terms, only suggest things more popular than the provided term, etc.).
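
A minimal sketch of the term suggester query described above, assuming a local Elasticsearch instance and the enwiki_general index mentioned earlier; the field name and options are illustrative guesses, not the production CirrusSearch configuration:

```
import requests

ES = "http://localhost:9200"
INDEX = "enwiki_general"   # index name taken from the example above

body = {
    "try": {
        "text": "saerch",                   # the misspelled query term
        "term": {
            "field": "title",               # which field to draw candidate terms from
            "suggest_mode": "popular",      # only suggest terms more frequent than the input
            "max_edits": 2,                 # allowed edit distance
        },
    },
}

resp = requests.post(f"{ES}/{INDEX}/_suggest", json=body).json()

# Each option carries the suggested term, its document frequency in the
# corpus, and a score, e.g. {"text": "search", "freq": 179439, "score": ...}.
for entry in resp["try"]:
    for option in entry["options"]:
        print(option["text"], option["freq"], option["score"])
```

Most of the knobs mentioned above amount to changing the contents of that "term" block.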

I think the simplest way to get started here would be to:

  • Attach some analytics to the existing "Did you mean" feature to see whether the suggestions are any good.
  • Generate some alternative suggestion configurations and back-test them against existing no-result queries (a rough sketch follows this list).
  • Set up some sort of A/B testing to try out different suggestion configurations in production.
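
A rough back-testing sketch for the second bullet. The configurations and the input file name are hypothetical, and the suggester call just wraps the query shape shown earlier:

```
import requests

ES = "http://localhost:9200"
INDEX = "enwiki_general"

# A few alternative term-suggester configurations to compare. These are
# guesses to illustrate the shape of the experiment, not tuned settings.
CONFIGS = {
    "missing-only": {"suggest_mode": "missing"},
    "popular":      {"suggest_mode": "popular"},
    "always":       {"suggest_mode": "always", "max_edits": 2},
}

def suggestions_for(query, config):
    """Return the suggested terms for a query under one configuration."""
    body = {"try": {"text": query, "term": dict(field="title", **config)}}
    resp = requests.post(f"{ES}/{INDEX}/_suggest", json=body).json()
    return [opt["text"] for entry in resp["try"] for opt in entry["options"]]

# no_result_queries.txt is a hypothetical dump: one logged zero-result query per line.
with open("no_result_queries.txt") as f:
    queries = [line.strip() for line in f if line.strip()]

hits = {name: 0 for name in CONFIGS}
for query in queries:
    for name, config in CONFIGS.items():
        if suggestions_for(query, config):
            hits[name] += 1

for name, count in hits.items():
    print(f"{name}: suggested something for {count}/{len(queries)} zero-result queries")
```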

Moving beyond that:

  • Determine some sort of cutoff where we think the suggestion is much better than the original query, and re-run the query with the suggested term, returning those results instead (see the sketch after this list).
  • Give the user some button to click that says "no, really, search for what I typed".
  • Also needs analytics to see how often users reject our rewritten query.
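
As a sketch of the cutoff decision, where the threshold is a made-up value that would need tuning against back-test data, and best_suggestion is assumed to be the top suggester option with its score:

```
SCORE_CUTOFF = 0.8   # made-up value, to be tuned against back-testing data

def should_rewrite(original_hit_count, best_suggestion):
    """Re-run the query with the suggestion only when the original query
    found nothing and the top suggestion is above the confidence cutoff."""
    return (original_hit_count == 0
            and best_suggestion is not None
            and best_suggestion["score"] >= SCORE_CUTOFF)
```

Whatever the rule ends up being, the rewritten results page would need to carry the original query (e.g. in a URL parameter) so the "no, really" escape hatch can re-run it verbatim.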

We might also want to move beyond Special:Search and offer this type of rewrite for prefix searches (the top-right box), but we should probably start with just Special:Search.

Dug through this a bit more today; I think the tasks could be as follows:

Properly report suggestions to CirrusSearchRequests

  • Grepping through 35k no-result searches from the web, only 7 were reported as having suggestions.
  • Some testing shows that not all queries that produce suggestions get tagged in this log.
  • It should report every time suggestions are requested, even if none are found.

Build out feature to run suggested query when no results found

  • Should be pretty straightforward.
  • A good portion of queries have zero results, so we might want to start with this throttled back to only run on a percentage of zero-result queries until we understand the performance trade-offs (a throttling sketch follows this list).
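
A throttling sketch for the second bullet; the percentage is a hypothetical knob, and hashing the query keeps the decision stable for repeated identical searches:

```
import hashlib

FALLBACK_PERCENT = 10   # hypothetical starting point, to be raised as we learn

def should_run_suggested_query(query: str) -> bool:
    """Only fall back to the suggested query for a fixed percentage of
    zero-result searches, so we can watch the performance impact."""
    bucket = int(hashlib.md5(query.encode("utf-8")).hexdigest(), 16) % 100
    return bucket < FALLBACK_PERCENT
```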

Define and implement analytics for search misspellings

  • Need to be able to measure our progress in this area.
  • The initial goal is to target result sets with 0 matches.
  • Could be part of a larger schema to measure our handling of zero-result queries in general.
  • We need to know whether users who follow a 'did you mean' query, manually or by being internally forwarded, are satisfied with the results.
  • Needs to handle A/B testing different suggestion configurations (a hypothetical event shape is sketched after this list).
  • Might warrant a second ticket for implementation details?
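
Purely as a strawman for discussion, a per-search event might need to carry something like the following; every field name here is a placeholder, and the real schema would be defined as part of this task:

```
# Hypothetical per-search analytics event; all field names are placeholders.
event = {
    "query": "saerch",
    "hit_count": 0,                # results for the query as typed
    "suggestion": "search",        # None when nothing came back
    "suggestion_requested": True,  # logged even when no suggestion was found
    "suggestion_followed": False,  # user clicked the "did you mean" link
    "auto_rewritten": False,       # we transparently re-ran the suggested query
    "ab_bucket": "popular",        # which suggestion configuration served this search
}
```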

Extract some number of no-result queries from production logs and use them to test alternative suggestion configurations to be A/B tested

  • Elasticsearch has many options for suggestions; we should be able to come up with a few alternatives to try.
  • Maybe a context-sensitive config? For example, if a query has no results and no suggestions, could a less performant configuration be run to attempt to generate suggestions? (See the sketch after this list.)
  • ???
  • ???
  • Profit
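
One way to frame the context-sensitive idea above, reusing suggestions_for() from the earlier back-testing sketch; both configurations are illustrative guesses:

```
# Try a cheap suggester configuration first and only fall back to a more
# expensive one when it returns nothing. Both configurations are guesses.
CHEAP = {"suggest_mode": "popular", "max_edits": 1}
EXPENSIVE = {"suggest_mode": "always", "max_edits": 2, "min_word_length": 3}

def best_suggestion(query):
    # suggestions_for() is the helper from the back-testing sketch above.
    options = suggestions_for(query, CHEAP)
    if not options:
        options = suggestions_for(query, EXPENSIVE)
    return options[0] if options else None
```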

Investigate performance concerns around suggestion changes

  • What kind of impact is there from attaching multiple suggest queries per search with different options?
    • Perhaps a suggestion config that does best for misspellings doesn't do best in all cases; could we run multiple?
  • What performance impact will re-running zero-result queries in-process with their suggested replacement have?
  • What performance impact do the A/B-tested suggestion configurations have?
  • It looks like CirrusSearch in production is only suggesting against titles. Would expanding this to include an excerpt of the article improve results? What perf impact would that have?

All of @EBernhardson's notes are perfect.

I think we should experiment with running suggestions from article text as well. What we have is certainly flawed.

I like the idea of automatically running a suggested query based on some threshold; no results found is a good, obvious threshold. I think we should create a URL parameter we can stick on the page to disable that behavior, because some folks don't want it in some contexts. We probably also don't want it if there is special syntax in the query: queries like foo* and insource:cat shouldn't run suggestions, but queries like help:i can haz docs should. Or something like that.
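
A rough sketch of that special-syntax check; the pattern list is illustrative and certainly incomplete, and CirrusSearch's real query parser would be the right place for this decision:

```
import re

# Rough sketch of the special-syntax check described above. The pattern list
# is illustrative and incomplete.
SPECIAL_SYNTAX = re.compile(
    r"""
    \*            |   # wildcards, e.g. foo*
    "             |   # phrase quotes
    \b\w+:\S          # keyword prefixes like insource:cat or intitle:foo
    """,
    re.VERBOSE,
)

def wants_suggestion(query: str) -> bool:
    # Note: this would also skip namespace queries like "help:i can haz docs",
    # which we do want suggestions for, so a real check would whitelist known
    # namespaces instead of treating every colon as special syntax.
    return not SPECIAL_SYNTAX.search(query)
```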

And we'll need some analytics around it for it to make sense.

Yeah, good suggestions.

I think that all of @EBernhardson's ideas are very good.
In the same vein, I'll add that we can investigate better suggestions in the search-as-you-type feature. If we can detect misspellings earlier, it could help achieve the Q1 goals.
I've made a quick experiment with the simple wiki data from beta; if we have time, I can present it to you during the mind meld meeting on Monday.

I dig all of the above; however, I have noticed in roughly every single meeting that we're severely inhibited by our developer flow. I recommend we take some time to streamline our software development process, automate as much as possible, and head ever toward continuous delivery.

I probably swing way way way too far towards the "I'll tough it out and run these 30 commands to deploy things" end of the spectrum. I've seen far worse. Far far worse to me is our stormy relationship with ops and how we feel blocked on operations to deploy stuff. There isn't a clear way to work with ops. There used to be, the ops liaison thing, but it isn't working for us.

I think the browser test bot has been really nice for us, btw.

Far far worse to me is our stormy relationship with ops and how we feel blocked on operations to deploy stuff. There isn't a clear way to work with ops. There used to be, the ops liaison thing, but it isn't working for us.

Actually, I think it is now working well. If anyone is aware of problems with our ops interface, please let me (and/or Giuseppe) know.

blocked on operations to deploy stuff

This is a good way to summarize my point.

Phab probably isn't the ideal medium for this discussion, and especially not this particular ticket. So for now I'll just say "I disagree", and we can pick up the conversation elsewhere.

Very stale task. Those that are interested in what's happening in this area may wish to review T121616.