CirrusSearch: Stopwords are not optional and are worth as much as exact matches
Open, High, Public

Description

This will be caused by https://gerrit.wikimedia.org/r/#/c/108951/, which is a fix for bug 60302 and bug 54937. This seems less bad than not finding them at all, given that they are stopwords after all, so they should be on most pages anyway.


Version: unspecified
Severity: normal

Details

Reference
bz60362

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 2:57 AM
bzimport added a project: CirrusSearch.
bzimport set Reference to bz60362.
bzimport added a subscriber: Unknown Object (MLST).

The fix for this unfortunately requires me to make some changes to Elasticsearch and get them landed.

Looks like the Elasticsearch folks were already working on it and have got it lined up for 1.1. This one doesn't have an issue for some reason, just a pull request: https://github.com/elasticsearch/elasticsearch/pull/5005

Is this the reason why, if I look for an exact match of a sentence (by quoting it), I'm also provided with many results which don't contain my exact sentence but instead my sentence + stopwords in the middle of it?
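The thread does not confirm the cause of the behaviour asked about here. As a toy illustration only, the sketch below shows one way a quoted phrase can match text that has stopwords in the middle: if stopwords are dropped at analysis time and the remaining tokens are renumbered consecutively, the gap the stopwords left disappears. The names `STOPWORDS`, `analyze`, and `phrase_matches` are hypothetical and are not CirrusSearch or Elasticsearch code; the stopword list is an assumption.

```python
# Toy model of phrase matching over an analyzed token stream.
# Everything here is illustrative; it is not CirrusSearch's analysis chain.

STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is"}  # assumed list


def analyze(text, keep_positions):
    """Return (token, position) pairs with stopwords removed.

    keep_positions=True leaves a gap where each stopword stood.
    keep_positions=False renumbers the surviving tokens consecutively.
    """
    tokens = []
    position = 0
    for word in text.lower().split():
        if word in STOPWORDS:
            if keep_positions:
                position += 1  # remember that a word occupied this slot
            continue
        tokens.append((word, position))
        position += 1
    return tokens


def phrase_matches(query, document, keep_positions):
    """True if the analyzed query tokens occur in the document at the
    same relative positions as in the query."""
    query_tokens = analyze(query, keep_positions)
    if not query_tokens:
        return False

    # Map each document token to the set of positions where it occurs.
    doc_positions = {}
    for token, pos in analyze(document, keep_positions):
        doc_positions.setdefault(token, set()).add(pos)

    first_token, first_pos = query_tokens[0]
    for start in doc_positions.get(first_token, ()):
        if all(
            start + (pos - first_pos) in doc_positions.get(token, ())
            for token, pos in query_tokens
        ):
            return True
    return False


if __name__ == "__main__":
    doc = "quick and the fox"
    print(phrase_matches("quick fox", doc, keep_positions=False))
    print(phrase_matches("quick fox", doc, keep_positions=True))
```

In this model, the quoted query "quick fox" matches "quick and the fox" only when positions are renumbered after stopword removal; when the gaps are kept, the phrase no longer lines up.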

We've had 1.1 for a bit now and we're looking at 1.2. Upstream issue seems closed & merged so can we consider this resolved (or possible to resolve) now?

(In reply to Chad H. from comment #4)

We've had 1.1 for a bit now and we're looking at 1.2. Upstream issue seems
closed & merged so can we consider this resolved (or possible to resolve)
now?

Nik: ?

This one takes longer to solve than just flipping a switch. IIRC we'd have to rewrite the query parser and I'm not sure we have the energy for that right now. I've removed the keywords but I'm sure it's not the most important thing for us to be working on right now.

Restricted Application added a subscriber: Aklapper.