There's a discrepancy between the scores of fulltext matches and depicts matches, and that discrepancy grows with the number of search terms.
After gathering data from hundreds of searches, we found that *on average*, scores of fulltext matches grow by a factor of roughly 1.25 once a query contains more than one word.
Statement matches are always single-term, so they're not affected in the same way.
We ended up normalizing those inflated fulltext scores back down by a factor of 1.25 to bring them to a baseline similar to statement matches.
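
As a rough illustration (not the actual MediaSearch implementation), the normalization boils down to something like this:

```python
# Minimal sketch of the normalization described above: fulltext scores for
# multi-word queries are divided by 1.25 to bring them back down to the same
# baseline as (single-term) statement matches.

FULLTEXT_NORMALIZATION_FACTOR = 1.25  # empirical average from our sample


def normalize_fulltext_score(score: float, term_count: int) -> float:
    """Scale down a fulltext match score for multi-word queries.

    Statement (depicts) matches are single-term and keep their raw score.
    """
    if term_count > 1:
        return score / FULLTEXT_NORMALIZATION_FACTOR
    return score


# Example: a two-word query whose fulltext score came out at 25.0 is
# brought back down to 20.0, comparable to the statement-match baseline.
print(normalize_fulltext_score(25.0, term_count=2))  # 20.0
```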
That said, the above logic is very complex (and requires multiple hacks to even pull it off on the Elasticsearch version & config we're running), and we're not sure how valuable it is:
- the 1.25 average probably no longer holds up after we've made a bunch of other changes (e.g. new boost & score calculations based on logistic regressions)
- 1.25 was an average over a massive, widely varying set of differences, to the point where we're not sure it makes any difference (good or bad) for the majority of searches (the sketch after this list shows how we could re-measure this)
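
To check whether 1.25 still holds up, we could re-derive the statistic from a fresh sample of searches. A minimal sketch, with placeholder values standing in for real per-search ratios:

```python
import statistics

# Placeholder sample, for illustration only: for each logged multi-word
# search, the ratio of its average fulltext score to its average statement
# score. Real values would come from a fresh sample of logged searches.
ratios = [1.31, 1.18, 1.02, 1.45, 1.22, 1.09, 1.38]  # ... hundreds more

mean = statistics.mean(ratios)
median = statistics.median(ratios)
stdev = statistics.stdev(ratios)

print(f"mean={mean:.2f} median={median:.2f} stdev={stdev:.2f}")
# A large stdev relative to the mean would confirm the concern above:
# a single 1.25 divisor can't fit the majority of individual searches.
```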
Now that we have some metrics, we can figure out whether the above implementation continues to make a difference.
- If not: we can simply get rid of the normalizeFulltextScores pathway altogether
- If it does (positive or negative): we can gather a larger sample based on current MediaSearch scoring and refine the implementation
Plan:
- Gather baseline metrics (with no changes to how MediaSearch documents are scored) for a couple of weeks
- Disable normalizeFulltextScores (see the sketch after this list)
- Gather metrics for another week or so
- If no noticeable change: remove normalizeFulltextScores code
- If noticeable change: create new ticket to investigate improving normalizeFulltextScores implementation
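
A minimal sketch of what the disable step could look like if the pathway were gated behind a config flag, so the experiment is a config change rather than a code removal; the flag name and config mechanism here are hypothetical:

```python
# Hypothetical feature flag guarding the normalization pathway. Flipping it
# off for the metrics window leaves the rest of the scoring untouched, and
# makes the later "remove" or "keep & refine" decision a one-line change.
CONFIG = {
    "normalizeFulltextScores": False,  # disabled for the experiment
}


def score_fulltext_match(raw_score: float, term_count: int) -> float:
    """Apply the 1.25 normalization only while the flag is enabled."""
    if CONFIG["normalizeFulltextScores"] and term_count > 1:
        return raw_score / 1.25
    return raw_score
```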