Implement a new fulltext query
Closed, Resolved · Public

Description

While evaluating the allfield we found that the title is frequently underboosted. This is certainly due to the copy_to hack.
The copy_to hack lets us influence the raw tf value, but it makes proper evaluation impractical because we need to rebuild the index whenever we want to change the boost values.
We should experiment with various techniques to regain control over field boosts.
One idea could be to:

  • Keep the allfield as a primary filter for fast retrieval (a single field with stems and asciifolding no_preserve should be sufficient)
  • Remove the copy_to hack to save analysis time
  • If T128071 proves that the allfield is not appropriate for phrase rescore, we should maybe drop positions on this field to save space (open question: what to do with quoted queries?)
  • Add a set of additional clauses to the query to boost some fields
  • Experiment with shingles on the titles thanks to the suggest field
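
As a rough illustration of the first and fourth bullets, here is a minimal sketch of what such a query could look like in the Elasticsearch query DSL, built as a Python dict. The field names (`all`, `title`, `heading`, `text`), the boost values, and the helper `build_fulltext_query` are assumptions made up for this example, not the actual CirrusSearch mapping or the merged implementation. The point is only that the boosts live in the query, so they can be tuned without rebuilding the index.

```python
def build_fulltext_query(user_query, field_boosts):
    """Build a bool query: the all field filters, per-field clauses score.

    `field_boosts` maps a (hypothetical) field name to a query-time boost.
    """
    # One scoring clause per boosted field; boosts are plain query
    # parameters, so changing them does not require a reindex.
    should = [
        {"match": {field: {"query": user_query, "boost": boost}}}
        for field, boost in field_boosts.items()
    ]
    return {
        "query": {
            "bool": {
                # The all field only decides which documents match;
                # clauses in "filter" do not contribute to the score.
                "filter": [
                    {"match": {"all": {"query": user_query, "operator": "and"}}}
                ],
                "should": should,
            }
        }
    }


# Hypothetical boost values, chosen only to show the shape of the input.
HYPOTHETICAL_BOOSTS = {"title": 3.0, "heading": 1.5, "text": 1.0}
query = build_fulltext_query("new fulltext query", HYPOTHETICAL_BOOSTS)
```

With this shape, an evaluation run could pass a different `field_boosts` dict per experiment and compare results against the same index.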

Event Timeline

dcausse created this task. · Feb 25 2016, 1:40 PM
Restricted Application added subscribers: StudiesWorld, Aklapper. · Feb 25 2016, 1:40 PM
Restricted Application added a project: Discovery. · Feb 25 2016, 1:41 PM
Restricted Application added a project: Discovery-Search. · Jul 7 2016, 9:51 AM
dcausse renamed this task from EPIC: experiment with a new fulltext query to Implement a new fulltext query. · Jul 7 2016, 9:51 AM
dcausse removed a project: Epic.
dcausse updated the task description. · Jul 7 2016, 5:49 PM
debt triaged this task as Normal priority. · Jul 20 2016, 4:02 PM
debt moved this task from needs triage to This Quarter on the Discovery-Search board.

Change 302666 had a related patch set uploaded (by DCausse):
Implements a new fulltext query builder

https://gerrit.wikimedia.org/r/302666

Change 302666 merged by EBernhardson:
Implement a new fulltext query builder

https://gerrit.wikimedia.org/r/302666

debt closed this task as Resolved. · Sep 1 2016, 8:56 PM