
[SPIKE] Can the results of abandoned query analysis be used to improve MLR performance?
Closed, Resolved · Public · 8 Estimated Story Points

Description

This is a follow up task to T383048: Investigate current MLR models for Search and identify improvements.

Once we obtain a better understanding of abandoned query logs, we should investigate whether they can be incorporated into MLR models to improve their performance.

Event Timeline

gmodena renamed this task from "Investigate abandoned queries and identify eventual model improvments" to "[SPIKE][NEEDS GROOMING] Investigate abandoned queries and identify eventual model improvments". Mar 3 2025, 4:34 PM
Gehel set the point value for this task to 8. Mar 3 2025, 4:34 PM
gmodena renamed this task from "[SPIKE][NEEDS GROOMING] Investigate abandoned queries and identify eventual model improvments" to "[SPIKE] Can the results of abandoned query analysis be used to improve MLR performance?". Mar 4 2025, 8:57 AM
gmodena updated the task description.

Our assumption is that we can improve Learning to Rank (LTR) model training by using abandoned-query data.
I've started to collect some references and ideas at https://wikitech.wikimedia.org/wiki/User:GModena_(WMF)/Notes/MLR

Some approaches we could investigate include:

  • Abandoned queries as negative examples: the idea is to incorporate abandoned queries as negative training signals in mjolnir, indicating rankings that failed to satisfy user intent.
  • Dwell time and abandonment features: extract features from abandonment patterns (quick abandonment vs. abandonment after viewing results) to distinguish between types of negative feedback.
  • Query reformulation analysis: we could study how users modify abandoned queries to better understand query intent and potential ranking failures.
  • Contextual abandonment modeling: can we use session context alongside abandonment signals to estimate why a query was abandoned?
  • Multi-objective optimization: the idea is to balance traditional relevance metrics (nDCG) against abandonment-reduction goals in the mjolnir training objective function.
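
To make the first, second, and last ideas above more concrete, here is a small standalone sketch. It is not mjolnir code and uses none of its APIs: SearchEvent, abandonment_type, training_rows, ndcg_at_k, blended_objective, the 5-second QUICK_ABANDON_SECONDS threshold, the ABANDON_WEIGHTS values, and the lam trade-off are all hypothetical choices for illustration. Every result shown for an abandoned query becomes a negative row weighted by abandonment type, and a ranker is scored by mean nDCG minus a penalty on the abandonment rate.

```python
from dataclasses import dataclass
from math import log2
from typing import Iterator, Optional

# Hypothetical cut-off (seconds on the results page) separating "quick"
# abandonment from abandonment after the user has looked at the results.
QUICK_ABANDON_SECONDS = 5.0

# Hypothetical sample weights: a quick abandonment may be "good abandonment"
# (the snippet already answered the query), so it is a weaker negative here.
ABANDON_WEIGHTS = {"quick": 0.5, "after_viewing": 1.0}


@dataclass
class SearchEvent:
    """One logged search (hypothetical schema, not mjolnir's)."""
    query: str
    shown_page_ids: list[int]
    clicked_page_id: Optional[int]
    seconds_on_serp: float


def abandonment_type(event: SearchEvent) -> Optional[str]:
    """None if a result was clicked, else "quick" or "after_viewing"."""
    if event.clicked_page_id is not None:
        return None
    if event.seconds_on_serp < QUICK_ABANDON_SECONDS:
        return "quick"
    return "after_viewing"


def training_rows(event: SearchEvent) -> Iterator[tuple[str, int, int, float]]:
    """Yield (query, page_id, label, weight) rows for an LTR training set."""
    kind = abandonment_type(event)
    for page_id in event.shown_page_ids:
        if kind is None:
            # Clicked query: the clicked page is positive and the other shown
            # pages are negative (a simplification that ignores position bias).
            yield event.query, page_id, int(page_id == event.clicked_page_id), 1.0
        else:
            # Abandoned query: every shown page is a negative, weighted by type.
            yield event.query, page_id, 0, ABANDON_WEIGHTS[kind]


def ndcg_at_k(labels: list[int], k: int) -> float:
    """nDCG@k with exponential gain; labels are given in ranked order."""
    def dcg(rels: list[int]) -> float:
        return sum((2**rel - 1) / log2(i + 2) for i, rel in enumerate(rels[:k]))

    ideal = dcg(sorted(labels, reverse=True))
    return dcg(labels) / ideal if ideal > 0 else 0.0


def blended_objective(mean_ndcg: float, abandonment_rate: float, lam: float = 0.1) -> float:
    """Multi-objective score: reward nDCG, penalise abandonment (lam is hypothetical)."""
    return mean_ndcg - lam * abandonment_rate
```

Whether quick abandonment should count as a weaker or a stronger negative is itself an open question; the weights above only encode one possible assumption.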

Depending on how far we want to go, we could also consider counterfactual learning in conjunction with abandonment data. The idea here is that abandoned queries provide implicit negative feedback that standard supervised learning approaches might not properly exploit. Counterfactual learning could help address this by allowing us to estimate "what would have happened if a different ranking had been shown".
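
One common counterfactual technique is inverse propensity scoring (IPS) under a position-based click model: each logged click is re-weighted by the inverse of the probability that its position was examined, which yields an estimate of how a different ranking would have performed on the same traffic. The sketch below is illustrative only: LoggedQuery, examination_propensity, ips_dcg, the (1/k)^eta propensity form, and the restriction to re-ranking the logged candidates are assumptions, not part of any existing pipeline.

```python
from dataclasses import dataclass
from math import log2
from typing import Callable


@dataclass
class LoggedQuery:
    """One logged search: the ranking that was shown and the clicked page ids."""
    shown_ranking: list[int]
    clicked: set[int]


def examination_propensity(position: int, eta: float = 1.0) -> float:
    """Position-based model: P(examine 1-based position k) = (1/k) ** eta.

    eta is an assumed parameter; in practice it would have to be estimated,
    for example from experiments that randomise result order.
    """
    return (1.0 / position) ** eta


def ips_dcg(
    logs: list[LoggedQuery],
    new_ranker: Callable[[list[int]], list[int]],
    eta: float = 1.0,
) -> float:
    """IPS estimate of the mean DCG (binary relevance) new_ranker would achieve.

    Assumes a click happens exactly when a page is examined and relevant, and
    that new_ranker only reorders the pages that were shown in the logs.
    """
    if not logs:
        return 0.0
    total = 0.0
    for q in logs:
        new_positions = {pid: i + 1 for i, pid in enumerate(new_ranker(q.shown_ranking))}
        for logged_pos, pid in enumerate(q.shown_ranking, start=1):
            if pid in q.clicked and pid in new_positions:
                # DCG gain at the new position, divided by the examination
                # probability of the position where the click was logged.
                gain = 1.0 / log2(1 + new_positions[pid])
                total += gain / examination_propensity(logged_pos, eta)
    return total / len(logs)
```

Note that in this estimator an abandoned query (no clicks) adds nothing to the numerator and only counts in the denominator, so a purely click-based IPS estimator does not by itself treat abandonment as an explicit negative signal.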

Gehel triaged this task as Medium priority. Mar 17 2025, 3:30 PM

After the classification work in T375554, it seems there are no ideas left for how to use abandoned queries to improve MLR training.

Gehel changed the task status from Declined to Resolved. May 23 2025, 12:18 PM