re-run wbsearchentities optimization process
Closed, ResolvedPublic

Assigned To

Authored By

	EBernhardson
	Apr 21 2022, 4:18 PM

Description

To support Elasticsearch 7, the scoring equation for wbsearchentities needs some small shape changes. The weights we use in this search came from relforge_wbsearchentities. The process was last used on Elasticsearch 5.5, so some changes will likely be necessary to get it up and running against 6.8. These reports can be run against the current equation rather than the updated one; the goal of having tuning reports is to know that the full process is working and runnable again.

AC: Tuning reports, including weights to deploy to prod, for all languages that have custom weights already deployed

Details

Subject	Repo	Branch	Lines +/-
Revert "cirrus: Turn on AB test of wbsearchentities profiles"	operations/mediawiki-config	master	+1 -1
cirrus: Turn on AB test of wbsearchentities profiles	operations/mediawiki-config	master	+1 -1
Add wbsearchentities profiles for testing	operations/mediawiki-config	master	+170 -1

Related Objects

Status	Assigned	Task
Resolved	None	T248925 Make MediaWiki release tarball compatible with PHP 8.0
Resolved	Jdforrester-WMF	T300463 Make PHP 8.0 voting on MW master
Resolved	None	T283275 Make MW master tests pass on PHP 8.0
Resolved	Reedy	T268861 CirrusSearch uses Elastica's Match class
Resolved	Reedy	T268863 Translate uses Elastica's Match class
Resolved	matthiasmullie	T268866 WikibaseMediaInfo uses Elastica's Match class
Invalid	None	T268864 WikibaseCirrusSearch uses Elastica's Match class
Resolved	Reedy	T268865 WikibaseLexemeCirrusSearch uses Elastica's Match class
Resolved	EBernhardson	T271777 Bump ruflin/elastica (and related libraries) to versions that support PHP 8.0
Resolved	Gehel	T263142 [EPIC] Upgrade Elasticsearch to version 7.10
Resolved	• EJoseph	T209859 Wikidata autocomplete (wbsearchentities) results with score <= 0
Resolved	EBernhardson	T306644 re-run wbsearchentities optimization process

Event Timeline

EBernhardson created this task. Apr 21 2022, 4:18 PM

EBernhardson moved this task from Incoming to In Progress on the Discovery-Search (Current work) board. Apr 25 2022, 3:48 PM

Reports generated and published: https://people.wikimedia.org/~ebernhardson/wbsearchentities_202203

A few ideas for future exploration:

Many of the weights in the tuning report claim to have minimal influence on the final output; we should look into why. Do we need to collect more negative samples in the training set? Are the features useless?

It could be interesting to generate the sensitivity portion of the report against the values currently deployed in production.

The improvement levels are surprisingly similar to before, perhaps suspiciously so. It would also be interesting to re-run the optimization process after deploying the new values. If we train with the optimized values as the comparison, we should see little if any improvement. If it still shows significant improvements, there could be errors in the reporting.

Change 786347 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/mediawiki-config@master] Add wbsearchentities profiles for testing

https://gerrit.wikimedia.org/r/786347

gerritbot added a project: Patch-For-Review. Apr 26 2022, 4:36 PM

Change 786347 merged by jenkins-bot:

[operations/mediawiki-config@master] Add wbsearchentities profiles for testing

https://gerrit.wikimedia.org/r/786347

Mentioned in SAL (#wikimedia-operations) [2022-04-26T20:11:27Z] <urbanecm@deploy1002> Synchronized wmf-config/: 9805e61f7006edf45199a3e22494945bffaaeb4d: Add wbsearchentities profiles for testing (T306644) (duration: 00m 53s)

Maintenance_bot removed a project: Patch-For-Review. Apr 26 2022, 8:30 PM

Profiles are deployed; they can be enabled for testing on a single page with a magic query string like wikidataCompletionSearchClicksBucket=T306644-fr. Next steps would be to turn the test on and set the turn-off date. Previously we ran the test for two weeks. I don't remember what went into that decision, but running this for two weeks seems plausible as well.

Should we inform anyone at Wikidata that we will be turning on the test? Who?
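For anyone wanting to try a profile by hand, here is a minimal sketch of appending the bucket override to a page URL. The parameter name and the T306644-fr value come from the comment above; the helper name with_test_bucket and the example page URL are hypothetical, picked only for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query-string parameter named in the comment above.
PARAM = "wikidataCompletionSearchClicksBucket"


def with_test_bucket(url: str, bucket: str) -> str:
    """Return `url` with the bucket override set in its query string.

    Hypothetical helper: any existing override is replaced, not duplicated.
    """
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != PARAM]
    query.append((PARAM, bucket))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # The page URL is an assumption; any page with the entity search box
    # would presumably do.
    print(with_test_bucket("https://www.wikidata.org/wiki/Special:Search",
                           "T306644-fr"))
```

Other buckets would presumably follow the same T306644-<language> pattern, but only the fr value is confirmed here.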

• MPhamWMF added a parent task: T263142: [EPIC] Upgrade Elasticsearch to version 7.10. Apr 26 2022, 9:19 PM

Change 787069 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/mediawiki-config@master] cirrus: Turn on AB test of wbsearchentities profiles

https://gerrit.wikimedia.org/r/787069

gerritbot added a project: Patch-For-Review. Apr 27 2022, 8:54 PM

Change 787069 merged by jenkins-bot:

[operations/mediawiki-config@master] cirrus: Turn on AB test of wbsearchentities profiles

https://gerrit.wikimedia.org/r/787069

Mentioned in SAL (#wikimedia-operations) [2022-04-27T21:01:01Z] <ebernhardson@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:787069|cirrus: Turn on AB test of wbsearchentities profiles (T306644)]] (duration: 00m 53s)

Maintenance_bot removed a project: Patch-For-Review. Apr 27 2022, 9:30 PM

Ran the previous AB testing report to get a preliminary look at the data and ensure it's being collected as expected. Everything seems reasonable: the new tuning isn't clearly better, but it isn't clearly worse either, and we only have a few hundred events for most languages. As stated previously, the intent is to run for two weeks, ending data collection on May 11.

EBernhardson moved this task from In Progress to Waiting on the Discovery-Search (Current work) board. Apr 29 2022, 8:14 PM

Change 792141 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/mediawiki-config@master] Revert "cirrus: Turn on AB test of wbsearchentities profiles"

https://gerrit.wikimedia.org/r/792141

gerritbot added a project: Patch-For-Review. May 16 2022, 6:23 PM

Change 792141 merged by jenkins-bot:

[operations/mediawiki-config@master] Revert "cirrus: Turn on AB test of wbsearchentities profiles"

https://gerrit.wikimedia.org/r/792141

Mentioned in SAL (#wikimedia-operations) [2022-05-16T20:41:31Z] <catrope@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:792141|Revert "cirrus: Turn on AB test of wbsearchentities profiles" (T306644)]] (duration: 00m 51s)

Maintenance_bot removed a project: Patch-For-Review. May 16 2022, 9:30 PM

Reports can be found at https://people.wikimedia.org/~ebernhardson/T306644/

The summary is that the new tuning is either the same or slightly worse almost everywhere. It is currently unclear where things went wrong. It's not significantly worse, so the process is still coming up with reasonable values, but those values aren't producing better ranking than the tuning from a few years ago.

The situation does not degrade significantly, which allows us to upgrade to ES7. Let's move forward.

EBernhardson moved this task from To Be Deployed to Needs Reporting on the Discovery-Search (Current work) board. Jun 9 2022, 2:44 PM

Gehel closed this task as Resolved. Jul 25 2022, 2:18 PM
