AB Test doubling near match field weights on commonswiki
Open, Needs Triage | Public | 3 Estimated Story Points

Description

via T406020#11299905:

Based on this test, it seems we should set up a dedicated search profile for commonswiki with the doubled near match weight and run an AB test. There is already an AB test that is about to start (related to completion suggester) and our infra can only run one at a time, so this might be delayed a week or two.

Event Timeline

pfischer set the point value for this task to 3. (Oct 27 2025, 4:35 PM)

Change #1199054 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/mediawiki-config@master] cirrus: Start near match A/B test

https://gerrit.wikimedia.org/r/1199054

Change #1199054 merged by jenkins-bot:

[operations/mediawiki-config@master] cirrus: Start near match A/B test

https://gerrit.wikimedia.org/r/1199054

Mentioned in SAL (#wikimedia-operations) [2025-11-04T21:20:51Z] <bvibber@deploy2002> Started scap sync-world: Backport for [[gerrit:1199054|cirrus: Start near match A/B test (T408154)]]

Mentioned in SAL (#wikimedia-operations) [2025-11-04T21:24:02Z] <bvibber@deploy2002> bvibber, ebernhardson: Backport for [[gerrit:1199054|cirrus: Start near match A/B test (T408154)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-11-04T21:28:45Z] <bvibber@deploy2002> Finished scap sync-world: Backport for [[gerrit:1199054|cirrus: Start near match A/B test (T408154)]] (duration: 07m 53s)

The test has been out for a week. I ran the notebook, but the results are curious. In particular, we are seeing a significant change in the zero-results rate (ZRR), even though the test treatment does not change the retrieval function. This suggests we could have some unbalanced effects in the bucketing. It seems worthwhile to let the test continue running for a second week and run the notebook against the second week to verify the results.

The overall metrics suggest an improvement in satisfaction and clickthrough rates with the test treatment, but we will want to see if the second week holds the same results.
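One quick way to look for unbalanced bucketing is a sample-ratio check: compare the number of sessions that landed in each bucket against the intended split. The sketch below is only an illustration, not the check the notebook performs. The function name `sample_ratio_mismatch`, the 50/50 expected split, and the session counts in the usage lines are all assumptions, since the task does not report bucket sizes.

```python
import math


def sample_ratio_mismatch(control_n: int, test_n: int, expected_share: float = 0.5):
    """Chi-square goodness-of-fit test (1 degree of freedom) for bucket sizes.

    expected_share is the assumed fraction of sessions intended for control.
    Returns (chi2, p_value); a very small p_value hints at unbalanced bucketing.
    """
    total = control_n + test_n
    expected_control = total * expected_share
    expected_test = total * (1 - expected_share)
    chi2 = (
        (control_n - expected_control) ** 2 / expected_control
        + (test_n - expected_test) ** 2 / expected_test
    )
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical session counts, for illustration only.
chi2, p_value = sample_ratio_mismatch(control_n=5000, test_n=5400)
print(f"chi2={chi2:.2f}, p={p_value:.5f}")
```

A balanced split does not rule out other bucketing problems (for example, buckets that differ in their mix of users), so a check like this would only be a first step before comparing per-bucket ZRR across weeks.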

It seems worthwhile to let the test continue running for a second week

@EBernhardson it's now two weeks later. Just in case this dropped off the radar by accident.

My apologies, this did indeed fall off the priority list for a bit. I've come back around to this today and had time to run the report and write up a conclusion. Things are looking great.

report: https://people.wikimedia.org/~ebernhardson/T408154-AB-Test-Metrics-Commonswiki-Near-Match-2x.html


Summary and Conclusions

This AB Test evaluated the impact of doubling the ranking weight of nearly-exact matches in full-text search on Wikimedia Commons. The primary aim was to enhance the user experience by ensuring that users see nearly exact matches from multiple namespaces even if the namespace or article is typically de-ranked. All statistical significance determinations use 95% confidence intervals.

Key Findings:

Engagement Rate: The new configuration demonstrated a statistically significant increase in the engagement rate from 28.4% to 29.7% on a per-query basis, and from 35.0% to 36.6% on a per-session basis. This suggests the test configuration provides results that are more attractive to click on.

Satisfied Rate: The new configuration similarly demonstrated a statistically significant increase in user satisfaction. The rate increased from 15.6% to 16.5% on a per-search basis, and from 20.5% to 21.5% on a per-session basis. This suggests the articles clicked on in the test configuration are doing a better job of satisfying user intent.

Result Ranking (Click Position): We observed a shift in clicks from lower positions to higher positions with the test configuration. Clicks@1 increased from 44.5% of first clicks to 47.6% of first clicks. Similarly, clicks@4+ decreased from 33.3% of first clicks to 30.0% of first clicks. This shows the test configuration is doing a better job of placing the attractive results at the top of the result list.

Article Interaction: Article dwell time had a statistically significant increase from 25.3 seconds to 27.6 seconds. Proportion of clickthroughs that scroll the article page similarly showed a statistically significant increase from 40.3% to 42.7%. Both of these demonstrate increased user interaction with the pages they click through to, suggesting the test treatment is an improvement over control.

Session Length: Mean session length showed a statistically significant decrease from 47.2 seconds to 42.7 seconds. Note that session length does not include dwell; it is the time from the first to the last search interaction in a session. We understand the decrease in session length, when combined with improvements in engagement and satisfaction, as evidence that users are finding their desired results more efficiently.

Conclusion: The test treatment showed either improvement or no change across all metrics. This change should be rolled out to all users of commonswiki.
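For readers who want to see what a significance call at 95% confidence can look like for a rate metric, here is a minimal sketch using a normal-approximation (Wald) interval for the difference of two proportions. This is not necessarily the method the report uses. The function name `diff_of_proportions_ci` is illustrative, and the bucket sizes (50,000 queries each) are hypothetical because the summary only gives rates. The counts are chosen to reproduce the reported per-query engagement rates of 28.4% and 29.7%.

```python
import math


def diff_of_proportions_ci(successes_a: int, n_a: int,
                           successes_b: int, n_b: int,
                           z: float = 1.96):
    """Wald confidence interval for p_b - p_a (z=1.96 gives roughly 95%).

    Returns (diff, lower, upper). If the interval excludes 0, the difference
    is treated as statistically significant at that confidence level.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    diff = p_b - p_a
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se


# Hypothetical bucket sizes; rates match the reported 28.4% vs 29.7%.
diff, lower, upper = diff_of_proportions_ci(14200, 50000, 14850, 50000)
print(f"diff={diff:.4f}, 95% CI=({lower:.4f}, {upper:.4f})")
```

With these assumed sample sizes the interval stays above zero, which is the pattern the summary describes as a statistically significant increase. With much smaller buckets, the same 1.3 point difference could produce an interval that includes zero.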

Change #1213559 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/mediawiki-config@master] cirrus: Apply increased near match weight on commonswiki

https://gerrit.wikimedia.org/r/1213559

Change #1213559 merged by jenkins-bot:

[operations/mediawiki-config@master] cirrus: Apply increased near match weight on commonswiki

https://gerrit.wikimedia.org/r/1213559

Mentioned in SAL (#wikimedia-operations) [2025-12-01T21:13:00Z] <cscott@deploy2002> Started scap sync-world: Backport for [[gerrit:1212670|Deploy Parsoid Read Views to 19 wikis (T411283)]], [[gerrit:1213497|Change the README to Markdown]], [[gerrit:1213498|noc: Point links in /conf to Gitiles rather than Differential]], [[gerrit:1213515|REST: enable the site.v1 module (T409516)]], [[gerrit:1213559|cirrus: Apply increased near match weight on commonswiki (T408154)]]

Mentioned in SAL (#wikimedia-operations) [2025-12-01T21:16:57Z] <cscott@deploy2002> cscott, ebernhardson, tgr, arlolra, bpirkle: Backport for [[gerrit:1212670|Deploy Parsoid Read Views to 19 wikis (T411283)]], [[gerrit:1213497|Change the README to Markdown]], [[gerrit:1213498|noc: Point links in /conf to Gitiles rather than Differential]], [[gerrit:1213515|REST: enable the site.v1 module (T409516)]], [[gerrit:1213559|cirrus: Apply increased near match weight on commonswiki (T408154

Mentioned in SAL (#wikimedia-operations) [2025-12-01T21:25:09Z] <cscott@deploy2002> Finished scap sync-world: Backport for [[gerrit:1212670|Deploy Parsoid Read Views to 19 wikis (T411283)]], [[gerrit:1213497|Change the README to Markdown]], [[gerrit:1213498|noc: Point links in /conf to Gitiles rather than Differential]], [[gerrit:1213515|REST: enable the site.v1 module (T409516)]], [[gerrit:1213559|cirrus: Apply increased near match weight on commonswiki (T408154)]] (duration: 12m