After enabling a relaxed profile for the search retrieval query, we noticed that some results are particularly bad.
We believe one of the reasons is that the MLR model never saw such results during training, so some signals may no longer be as strong as before. For instance:
- a single match on the title used to be a strong signal, but that is no longer the case when important query terms are not matched anywhere else
We could try to force the model to account for those new results by mining negative samples:
- random negatives: likely to be actually bad but possibly too easy for the model to discard
- hard negatives: mine extra results using an IR technique (e.g. a BM25 query on the all field)
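As a rough sketch of the hard-negative idea (the function names and the toy in-memory BM25 scorer below are illustrative, not mjolnir's actual API): run a BM25-style query over the corpus and keep the top-scoring documents that were never clicked for that query.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each doc (a list of tokens) against the query with classic BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # document frequency per term
    df = Counter()
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def mine_hard_negatives(query_terms, corpus, clicked_ids, n_per_click=5):
    """Return ids of the top BM25 hits that were never clicked for this query.

    corpus: dict of doc_id -> list of tokens (stand-in for the real index).
    """
    ids = list(corpus)
    scores = bm25_scores(query_terms, [corpus[i] for i in ids])
    ranked = sorted(zip(ids, scores), key=lambda pair: -pair[1])
    wanted = n_per_click * len(clicked_ids)
    return [i for i, s in ranked if i not in clicked_ids and s > 0][:wanted]
```

In practice the scoring would be a real BM25 query against the index rather than this toy scorer; the point is only the shape of the pipeline: score, drop clicked results, cap at n_per_click per clicked result.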
Open questions:
- how many should we pull? (5 negatives per clicked result?)
- where should they be placed initially? (randomly assigned positions? interleaved so that they get closer to the top?)
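The two placement options above could be compared with something like the sketch below (the function and strategy names are made up for illustration): random placement gives each negative a uniform slot, while interleaving slots them at every other position starting near the top.

```python
import random


def place_negatives(ranked, negatives, strategy="interleave", rng=None):
    """Insert mined negatives into an existing ranked result list.

    strategy="random": each negative gets a uniformly random slot.
    strategy="interleave": negatives go at every other position, starting
    near the top, so the model sees them in competitive spots.
    """
    rng = rng or random.Random(0)
    out = list(ranked)
    if strategy == "random":
        for neg in negatives:
            out.insert(rng.randrange(len(out) + 1), neg)
    else:  # interleave
        pos = 1
        for neg in negatives:
            out.insert(min(pos, len(out)), neg)
            pos += 2
    return out
```

Either way the relative order of the original results is preserved; only the negatives' positions differ between the two strategies.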
AC:
- mjolnir is able to mine negative samples
- a new model is trained using this technique and uploaded to production for testing