Reconsider normalizeFulltextScores implementation
Closed, Resolved (Public)

Description

There's a discrepancy in the scores of fulltext matches vs depicts matches, and that discrepancy grows along with the number of search terms.
After gathering data from hundreds of searches, we found that *on average*, scores of fulltext matches grew by a factor of roughly 1.25 when the query had more than 1 word.
Statements are always just singular so they're not affected in a similar way.
We ended up normalizing such increased fulltext scores back down by a factor of 1.25 to bring them to a similar baseline as statement matches.
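As a rough illustration of the idea only (not the actual WikibaseMediaInfo code, which runs inside the Elasticsearch query), the normalization amounts to dividing multi-word fulltext scores by the observed factor. The function and constant names below are hypothetical:

```python
# Hypothetical sketch of the normalization described above.
# This is NOT the extension's implementation; names are made up for illustration.

MULTI_WORD_FACTOR = 1.25  # average inflation for multi-word queries, per the description


def normalize_fulltext_score(score: float, query: str,
                             factor: float = MULTI_WORD_FACTOR) -> float:
    """Scale a fulltext score back down when the query has more than one word.

    Single-word queries are returned unchanged, since they already share
    a baseline with statement matches.
    """
    if len(query.split()) > 1:
        return score / factor
    return score
```

For example, under these assumptions a fulltext score of 10.0 for the query "black cat" would become 8.0, while the same score for "cat" would stay at 10.0.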

That said, the above logic is very complex (and requires multiple hacks to even pull it off on the elastic version & config we're running) and we're not sure how valuable it is:

  • the 1.25 average probably no longer holds up after we've made a bunch of other changes (e.g. new boost & score calcs based on logistic regressions)
  • 1.25 was an average of a massive set of differences, to the point where we're not sure it makes any difference (good or bad) for the majority of searches

Now that we have some metrics, we can figure out whether the above implementation continues to make a difference.

  • If not: we can simply get rid of the normalizeFulltextScores pathway altogether
  • If it does (positively or negatively): we can gather a larger sample based on current media search scoring, and refine the implementation.

Steps:

  • Gather baseline metrics (with no changes to how mediasearch documents are scored) for a couple of weeks
  • Disable normalizeFulltextScores
  • Gather metrics for another week or so
  • If no noticeable change: remove normalizeFulltextScores code
  • If noticeable change: create new ticket to investigate improving normalizeFulltextScores implementation
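The decision at the end of these steps could be sketched roughly as follows. This is only an illustration: the metric values, the 2% threshold for "noticeable", and the function names are all assumptions, not something defined in this task.

```python
# Hypothetical sketch of the before/after comparison in the steps above.
# The threshold and the shape of the metric data are assumptions.
from statistics import mean


def relative_change(baseline: list[float], experiment: list[float]) -> float:
    """Relative change of the mean metric value after disabling normalization."""
    base = mean(baseline)
    return (mean(experiment) - base) / base


def next_step(baseline: list[float], experiment: list[float],
              threshold: float = 0.02) -> str:
    """Map the observed change to one of the two outcomes listed in the steps."""
    if abs(relative_change(baseline, experiment)) < threshold:
        return "remove normalizeFulltextScores code"
    return "create ticket to investigate normalizeFulltextScores"
```

Here `baseline` would hold daily values of some search metric gathered before the change and `experiment` the values gathered afterwards. A real comparison would probably also want to account for day-to-day noise rather than rely on a fixed threshold.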

Event Timeline

Change 742466 had a related patch set uploaded (by Matthias Mullie; author: Matthias Mullie):

[mediawiki/extensions/WikibaseMediaInfo@master] Disable FT score normalization

https://gerrit.wikimedia.org/r/742466

Change 742466 merged by jenkins-bot:

[mediawiki/extensions/WikibaseMediaInfo@master] Disable FT score normalization

https://gerrit.wikimedia.org/r/742466

Moving to blocked.
This is supposed to hit prod on Dec 15, and we'll need at least a week's worth of data to see whether it affected any metrics at all.

It seems that metrics have gone bananas (probably related to T297400) so this will remain stalled until they're stable again.

Change 759240 had a related patch set uploaded (by Matthias Mullie; author: Matthias Mullie):

[operations/mediawiki-config@master] [WikibaseMediaInfo] Stop normalizing full text scores

https://gerrit.wikimedia.org/r/759240

Change 759451 had a related patch set uploaded (by Matthias Mullie; author: Matthias Mullie):

[mediawiki/extensions/WikibaseMediaInfo@master] Remove fulltext score normalization

https://gerrit.wikimedia.org/r/759451

This was accidentally re-enabled while making another change. I've disabled it again & will monitor metrics over the next couple of days, but a test run against our existing labeled data predicts it will have no noticeable effect. Patch for full removal also in CR.

Change 759240 merged by jenkins-bot:

[operations/mediawiki-config@master] [WikibaseMediaInfo] Stop normalizing full text scores

https://gerrit.wikimedia.org/r/759240

Mentioned in SAL (#wikimedia-operations) [2022-02-03T12:09:53Z] <mlitn@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:759240|[WikibaseMediaInfo] Stop normalizing full text scores (T296631)]] (duration: 00m 52s)

Disabling looks to have had no measurable effect. This code is safe to delete.

Change 759451 merged by jenkins-bot:

[mediawiki/extensions/WikibaseMediaInfo@master] Remove fulltext score normalization

https://gerrit.wikimedia.org/r/759451