Q1 2015-16 goal: https://www.mediawiki.org/wiki/Wikimedia_Engineering/2015-16_Q1_Goals/SearchGoal
This task needs to be broken down into smaller steps that can be taken.
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Deskana | T104466 EPIC: Cut zero results rate for search in half.
Invalid | | None | T104468 EPIC: Improve spelling error detection and correction for search to fulfil Q1 2015-16 quarterly goal.
At a cursory glance, it appears this is a request to use the ES suggest API, which can look at a query and suggest "nearby" words to replace words in the query. Querying this API on enwiki_general for "saerch" with no special options (those are still to be determined) returns words like search, smerch, saetch, and serch. Each of these reports its frequency within the corpus (e.g. search is seen 179439 times, while saetch is seen 7 times) and a score. This is currently exposed at [1]. Sadly, that faulty search for saerch returned saeqeh.
There are plenty of knobs to turn here, from decisions like which fields to consider (content, title, etc.) and how to weight those fields, to what kind of results we want (only suggest for missing terms, only suggest things more popular than the provided term, etc.).
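To make the knobs above concrete, here is a minimal sketch of what a term-suggester request and a "pick the most frequent candidate" step could look like. It is not what CirrusSearch does. The field name `text`, the suggestion label `spelling`, and both helper functions are assumptions for illustration, and the exact request shape may differ between Elasticsearch versions. The sample response reuses the two frequencies quoted above; its scores are made-up placeholders.

```python
from typing import Any, Dict, Optional


def build_term_suggest_body(query: str, field: str = "text",
                            suggest_mode: str = "missing",
                            size: int = 5) -> Dict[str, Any]:
    """Build a request body for the Elasticsearch term suggester.

    suggest_mode is one of "missing", "popular" or "always"; "popular"
    only proposes terms that are more frequent than the original term.
    The field name and the "spelling" label are hypothetical.
    """
    return {
        "suggest": {
            "spelling": {
                "text": query,
                "term": {
                    "field": field,
                    "suggest_mode": suggest_mode,
                    "size": size,
                },
            }
        }
    }


def pick_suggestion(response: Dict[str, Any], name: str = "spelling",
                    min_freq: int = 1) -> Optional[str]:
    """Return the candidate with the highest corpus frequency, or None."""
    best = None
    for entry in response.get("suggest", {}).get(name, []):
        for option in entry.get("options", []):
            if option["freq"] < min_freq:
                continue
            if best is None or option["freq"] > best["freq"]:
                best = option
    return best["text"] if best else None


# Hand-written response in the suggester's response shape.
# Frequencies come from the comment above; scores are placeholders.
sample_response = {
    "suggest": {
        "spelling": [
            {
                "text": "saerch",
                "offset": 0,
                "length": 6,
                "options": [
                    {"text": "saetch", "score": 0.8, "freq": 7},
                    {"text": "search", "score": 0.8, "freq": 179439},
                ],
            }
        ]
    }
}

body = build_term_suggest_body("saerch", suggest_mode="popular")
suggestion = pick_suggestion(sample_response)
```

Ranking purely by frequency is only one possible choice; a real implementation would likely combine the score and the frequency, and might query several fields with different weights.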
I think the simplest way to get started here would be to:
Moving beyond that:
We might also want to move beyond Special:Search and offer this type of rewrite for prefix searches (the top right box), but we should probably start with just Special:Search.
I dug through this a bit more today. I think the tasks could be as follows:
All of @EBernhardson's notes are perfect.
I think we should experiment with running suggestions from article text as well. What we have is certainly flawed.
I like the idea of automatically running a suggested query based on some threshold. No results found is a good, obvious threshold. I think we should create a URL parameter we can stick on the page to disable that behavior, because some folks don't want it in some contexts. We probably also don't want it if there is special syntax in the query: queries like foo* and insource:cat shouldn't run suggestions, but queries like help:i can haz docs should. Or something like that.
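A rough sketch of that gating logic, under stated assumptions: the URL parameter name `runsuggestion`, the function names, and the short keyword list are all hypothetical, and the keyword list is illustrative rather than a complete inventory of the special search syntax.

```python
import re
from typing import Mapping

# Illustrative subset of search keywords that should suppress suggestions.
SPECIAL_KEYWORDS = ("insource", "intitle", "incategory")

# Matches a keyword (optionally negated with "-") at the start of the
# query or after whitespace, followed by a colon.
_KEYWORD_RE = re.compile(
    r"(?:^|\s)-?(?:%s):" % "|".join(SPECIAL_KEYWORDS), re.IGNORECASE
)


def has_special_syntax(query: str) -> bool:
    """True for wildcard queries (foo*) or keyword queries (insource:cat)."""
    return "*" in query or _KEYWORD_RE.search(query) is not None


def should_run_suggestion(query: str, total_hits: int,
                          params: Mapping[str, str]) -> bool:
    """Decide whether to automatically re-run the search with a suggestion."""
    # Hypothetical opt-out parameter, e.g. ?runsuggestion=0
    if params.get("runsuggestion", "1") == "0":
        return False
    # Threshold from the comment above: only rewrite when nothing was found.
    if total_hits > 0:
        return False
    return not has_special_syntax(query)
```

With this sketch, a namespace-prefixed query such as help:i can haz docs still qualifies, because help is not in the keyword list, while foo* and insource:cat are skipped.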
And we'll need some analytics around it for it to make sense.
Yeah, good suggestions.
I think that all of @EBernhardson's ideas are very good.
In the same vein, I'll add that we can investigate better suggestions in the search-as-you-type feature. If we can detect misspellings earlier it could help to achieve the Q1 goals.
I've run a quick experiment with the simple wiki data from beta. If we have time, I can present it to you during the mind melt meeting on Monday.
I dig all of the above. However, I have noticed in roughly every single meeting that we're severely inhibited by our developer flow. I recommend we take some time to streamline our software development process, automate as much as possible, and head ever toward continuous delivery.
I probably swing way, way, way too far towards "I'll tough it out and run these 30 commands to deploy things." I've seen far worse. Far far worse to me is our stormy relationship with ops and how we feel blocked on operations to deploy stuff. There isn't a clear way to work with ops. There used to be, the ops liaison thing, but it isn't working for us.
I think the browser test bot has been really nice for us, btw.
Far far worse to me is our stormy relationship with ops and how we feel blocked on operations to deploy stuff. There isn't a clear way to work with ops. There used to be, the ops liaison thing, but it isn't working for us.
Actually, I think it is now working well. If anyone is aware of problems with our ops interface, please let me (and/or Giuseppe) know.
blocked on operations to deploy stuff
This is a good way to summarize my point.
Phab probably isn't the ideal medium for this discussion, and especially not this particular ticket. So for now I'll just say "I disagree", and we can pick up the conversation elsewhere.
Very stale task. Those who are interested in what's happening in this area may wish to review T121616.