While evaluating the allfield, we found that the title is frequently underboosted. This is certainly due to the copy_to hack.
The copy_to hack lets us influence the raw tf value. Unfortunately, it makes proper evaluation impractical, because we need to rebuild the index whenever we want to change the boost values.
We should experiment with various techniques to regain control over field boosts.
One idea could be to:
- Keep the allfield as a primary filter for fast retrieval (a single field with stems and asciifolding no_preserve should be sufficient)
- Remove the copy_to hack to save analysis time
- If T128071 proves that the allfield is not appropriate for phrase rescore, we should maybe drop positions on this field to save space (open question: what to do with quoted queries?)
- Add a set of additional clauses to the query to boost some fields
- Experiment with shingles on the titles thanks to the suggest field
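
To make the "additional clauses" idea concrete, here is a rough sketch of what such a query body could look like: the allfield acts as a non-scoring filter, and per-field `match` clauses carry boosts that can be changed at query time without rebuilding the index. The field names (`all`, `title`, `heading`, `opening_text`), the boost values and the helper name `build_boosted_query` are placeholders for illustration only, not the actual mapping or tuned weights.

```python
import json


def build_boosted_query(user_query, field_boosts, all_field="all"):
    """Build an Elasticsearch bool query body (hypothetical helper).

    The all field is used only as a filter (fast retrieval, no scoring);
    the score comes from optional per-field clauses with query-time boosts.
    """
    should = [
        {"match": {field: {"query": user_query, "boost": boost}}}
        for field, boost in field_boosts.items()
    ]
    return {
        "query": {
            "bool": {
                # Every query term must appear in the all field.
                "filter": [
                    {"match": {all_field: {"query": user_query, "operator": "and"}}}
                ],
                # Optional clauses: they only contribute to the score.
                "should": should,
            }
        }
    }


# Placeholder boosts; these are the values we would tune during evaluation.
body = build_boosted_query(
    "albert einstein",
    {"title": 3.0, "heading": 1.5, "opening_text": 1.2},
)
print(json.dumps(body, indent=2))
```

Because the allfield sits in `filter`, a document that matches none of the boosted fields still matches but gets a score of 0; moving that clause to `must` would keep its score as a baseline. Which of the two behaves better is one of the things to evaluate.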