
Tune the perfield_builder_relaxed query builder profile
Closed, Resolved · Public · 5 Estimated Story Points

Description

After enabling a relaxed profile for the search retrieval query, we noticed that some results are particularly bad.

We believe the cause might be T405867, but when checking some queries we see that, with MLR explicitly disabled, this retrieval query can perform much worse than a simple match on the all field.

It is likely that when the weights of this builder were tuned as part of T139575, the strict AND helped a lot to discard bad results.
We should revisit these weights and improve the overall quality of this query.

AC:

  • tune the query components & weights of the perfield_builder_relaxed profile

Event Timeline

Restricted Application added a subscriber: Aklapper.
Gehel set the point value for this task to 5. · Oct 6 2025, 3:51 PM

I used https://gitlab.wikimedia.org/dcausse/retrieval-tune to test several query shapes to see if recall could be improved.
The idea is to use query clicks against a high number of hard negative samples and tune the various weights to increase recall.
I tested 4 variations of the query against the current baseline:

  • baseline_tuned: same shape as the baseline (use of several dismax blocks)
  • all_field_only: use only the all field
  • flat: take all the fields in a flat boolean SHOULD clause
  • flat_simple: use a limited number of fields (don't use all.*, suggest and all_near_match.*)
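
To make the difference between these shapes concrete, here is a minimal sketch using plain Elasticsearch query DSL dictionaries. Only the `all` field name comes from this task; the helper names, the other field names and the weights used in the usage note are hypothetical, and the real profiles in CirrusSearch are more involved (the exact grouping of the dismax blocks is not described here).

```python
from typing import Dict, List

# Hypothetical type: field name -> boost (weight) to tune.
Weights = Dict[str, float]


def match_clause(field: str, query: str, boost: float) -> dict:
    """A single weighted match clause on one field."""
    return {"match": {field: {"query": query, "boost": boost}}}


def all_field_only_query(query: str) -> dict:
    """all_field_only: a simple match on the all field."""
    return match_clause("all", query, 1.0)


def flat_query(query: str, weights: Weights) -> dict:
    """flat / flat_simple: every field is an independent SHOULD clause,
    so the per-field scores are summed."""
    return {
        "bool": {
            "should": [match_clause(f, query, w) for f, w in weights.items()]
        }
    }


def dismax_query(query: str, groups: List[Weights], tie_breaker: float = 0.0) -> dict:
    """baseline-like shape: fields are grouped into dis_max blocks, and each
    block contributes mostly the score of its best matching field."""
    return {
        "bool": {
            "should": [
                {
                    "dis_max": {
                        "tie_breaker": tie_breaker,
                        "queries": [
                            match_clause(f, query, w) for f, w in group.items()
                        ],
                    }
                }
                for group in groups
            ]
        }
    }
```

For example, `flat_query("some query", {"title": 2.0, "text": 1.0})` yields two SHOULD clauses, while `dismax_query("some query", [{"title": 2.0, "redirect.title": 1.0}, {"text": 1.0}])` yields two dis_max blocks. The main practical difference is that the flat shape rewards matching in many fields at once, whereas dismax blocks limit that effect within each group.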

I trained these models against 10k random queries and 500 negative samples per query. With a 0.2 train/test split, this represented at least 4,000,000 result pairs to train on (assuming the majority of queries have one click).
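
As a quick sanity check of that pair count, here is the arithmetic spelled out. It assumes that "0.2 train/test split" means 20% of the queries are held out for testing, and that each clicked result is paired with every negative sample of its query; the function name is only illustrative.

```python
def training_pairs(n_queries: int, negatives_per_query: int,
                   test_fraction: float, clicks_per_query: int = 1) -> int:
    """Number of (clicked result, negative sample) pairs in the training set."""
    train_queries = round(n_queries * (1 - test_fraction))
    return train_queries * clicks_per_query * negatives_per_query


# 8,000 training queries x 1 click x 500 negatives
print(training_pairs(10_000, 500, 0.2))
```

Queries with more than one click add more pairs, which is why the figure is a lower bound.
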
The metrics used on the test dataset are:

  • mean average precision
  • recall at X (X in 10, 50, 100)
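
| shape          | MAP   | r@10  | r@50  | r@100 |
| -------------- | ----- | ----- | ----- | ----- |
| baseline       | 0.223 | 0.899 | 0.968 | 0.982 |
| baseline tuned | 0.241 | 0.927 | 0.976 | 0.987 |
| flat           | 0.237 | 0.930 | 0.977 | 0.989 |
| flat simple    | 0.247 | 0.921 | 0.974 | 0.987 |
| all field only | 0.062 | 0.835 | 0.954 | 0.974 |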
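
For reference, the two metrics reported above can be computed roughly as follows. This is a generic sketch of mean average precision and recall at k, not the code used in retrieval-tune; the function names and the input layout (a ranked list of document ids per query, plus the set of clicked ids) are assumptions.

```python
from typing import Dict, List, Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@rank taken at each rank where a relevant doc appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant docs that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def evaluate(rankings: Dict[str, List[str]], clicks: Dict[str, Set[str]],
             ks: Sequence[int] = (10, 50, 100)) -> Dict[str, float]:
    """Average both metrics over all queries of the test set."""
    n = len(rankings)
    report = {
        "MAP": sum(average_precision(rankings[q], clicks[q]) for q in rankings) / n
    }
    for k in ks:
        report[f"r@{k}"] = sum(
            recall_at_k(rankings[q], clicks[q], k) for q in rankings
        ) / n
    return report
```

With a single click per query, average precision reduces to the reciprocal rank of the clicked result, and recall at k is simply whether that result appears in the top k.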

Trained weights have been uploaded to a new relforge profile in CirrusSearch. They look generally reasonable, but surprisingly incoming_links is heavily ignored in favor of popularity_score. I am also slightly surprised by the poor performance of the all field alone...

Overall, it seems that recall could be improved a bit over the non-tuned baseline. I may try to upload updated weights while keeping the baseline query shape; it should not hurt much, I guess.

Change #1196715 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Add retrieval profiles to relforge

https://gerrit.wikimedia.org/r/1196715

Change #1196715 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Add retrieval profiles to relforge

https://gerrit.wikimedia.org/r/1196715