
re-run wbsearchentities optimization process
Closed, Resolved (Public)

Description

To support Elasticsearch 7, the scoring equation for wbsearchentities needs some small shape changes. The weights we use in this search came from relforge_wbsearchentities. The process was last used on Elasticsearch 5.5, so some changes will likely be necessary to get it up and running against 6.8. These reports can be run against the current equation rather than the updated one; the goal of having tuning reports is to know that the full process is working and runnable again.

AC: Tuning reports, including weights to deploy to prod, for all languages that have custom weights already deployed

Event Timeline

Few ideas for future exploration:

  • Lots of the weights in the tuning report claim to have minimal influence on the final output, look into why. Do we need to collect more negative samples in the training set? Are the features useless?
  • Could be interesting to generate the sensitivity portion of the report against current production deployed values.
  • The improvement levels are surprisingly similar to before, perhaps suspiciously so. It would also be interesting to re-run the optimization process after deploying the new values. If we train with the optimized values as the baseline for comparison, we should see little if any improvement. If it still shows significant improvements, there could be errors in the reporting.
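A minimal sketch of the sanity check from the last idea, assuming the tuning report yields a single ranking score per configuration (higher is better). The function names and the 1% tolerance are hypothetical and chosen only for illustration; they are not taken from relforge_wbsearchentities.

```python
def relative_improvement(baseline_score, tuned_score):
    """Fractional change of the tuned score relative to the baseline."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return (tuned_score - baseline_score) / baseline_score


def rerun_looks_suspicious(baseline_score, tuned_score, tolerance=0.01):
    """Flag a re-run that still reports a large gain.

    Assumption: the baseline here is the already-optimized weights, so a
    second optimization pass should find little or nothing to improve.
    The tolerance is an arbitrary illustrative threshold.
    """
    return relative_improvement(baseline_score, tuned_score) > tolerance
```

If a re-run against the deployed, already-optimized weights is still flagged, that points at the reporting rather than at the weights.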

Change 786347 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/mediawiki-config@master] Add wbsearchentities profiles for testing

https://gerrit.wikimedia.org/r/786347

Change 786347 merged by jenkins-bot:

[operations/mediawiki-config@master] Add wbsearchentities profiles for testing

https://gerrit.wikimedia.org/r/786347

Mentioned in SAL (#wikimedia-operations) [2022-04-26T20:11:27Z] <urbanecm@deploy1002> Synchronized wmf-config/: 9805e61f7006edf45199a3e22494945bffaaeb4d: Add wbsearchentities profiles for testing (T306644) (duration: 00m 53s)

Profiles are deployed. They can be enabled for testing on a single page with a magic query string like wikidataCompletionSearchClicksBucket=T306644-fr. Next steps would be to turn the test on and set the turn-off date. Previously we ran it for two weeks. I don't remember what went into that decision, but running this one for two weeks seems plausible as well.
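For manual testing, the query string can be appended to a page URL with a small helper. This is only a sketch: the parameter name and the T306644-fr bucket come from the comment above, while the helper name and the example page URL are assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

BUCKET_PARAM = "wikidataCompletionSearchClicksBucket"


def with_test_bucket(page_url, bucket):
    """Return page_url with the test bucket parameter set to `bucket`.

    Any existing value of the bucket parameter is replaced; other query
    parameters are kept in their original order.
    """
    parts = urlsplit(page_url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != BUCKET_PARAM
    ]
    query.append((BUCKET_PARAM, bucket))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical page URL, used only to show the call.
url = with_test_bucket(
    "https://www.wikidata.org/wiki/Special:Search?search=foo", "T306644-fr"
)
```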

Should we inform anyone at wikidata that we will be turning on the test? Who?

Change 787069 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/mediawiki-config@master] cirrus: Turn on AB test of wbsearchentities profiles

https://gerrit.wikimedia.org/r/787069

Change 787069 merged by jenkins-bot:

[operations/mediawiki-config@master] cirrus: Turn on AB test of wbsearchentities profiles

https://gerrit.wikimedia.org/r/787069

Mentioned in SAL (#wikimedia-operations) [2022-04-27T21:01:01Z] <ebernhardson@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:787069|cirrus: Turn on AB test of wbsearchentities profiles (T306644)]] (duration: 00m 53s)

Ran the previous AB testing report to get a preliminary look at the data and ensure it's collecting as expected. Everything seems reasonable: the new tuning isn't clearly better, but not clearly worse either, and we only have a few hundred events for most languages. As stated previously, the intent is to run for two weeks, ending data collection on May 11.
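To illustrate why a few hundred events per language is not enough to call a winner, here is a sketch of a two-proportion z-test. It assumes the compared metric is a simple success rate per bucket (for example, searches that end in a click); the actual AB testing report may use different metrics, and the function name is hypothetical.

```python
import math


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Two-sided z-test for a difference between two success rates.

    Returns (z, p_value). Uses the pooled standard error, which assumes
    both buckets share one rate under the null hypothesis.
    """
    rate_a = success_a / total_a
    rate_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

With made-up counts of 150/300 versus 156/300, the two-point difference in rate is well within noise, while the same rates at 100 times the volume would be clearly distinguishable.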

Change 792141 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/mediawiki-config@master] Revert "cirrus: Turn on AB test of wbsearchentities profiles"

https://gerrit.wikimedia.org/r/792141

Change 792141 merged by jenkins-bot:

[operations/mediawiki-config@master] Revert "cirrus: Turn on AB test of wbsearchentities profiles"

https://gerrit.wikimedia.org/r/792141

Mentioned in SAL (#wikimedia-operations) [2022-05-16T20:41:31Z] <catrope@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:792141|Revert "cirrus: Turn on AB test of wbsearchentities profiles" (T306644)]] (duration: 00m 51s)

Reports found in https://people.wikimedia.org/~ebernhardson/T306644/

Summary is that the new tuning is either the same or slightly worse almost everywhere. It is currently unclear where things went wrong. It's not significantly worse, so the process is still coming up with reasonable values, but those reasonable values aren't resulting in better ranking than the tuning from a few years ago.

Gehel subscribed.

The situation does not degrade significantly, which allows us to upgrade to ES7. Let's move forward.