
Improve provenance tracking of CirrusSearch requests
Open, Medium, Public

Description

CirrusSearch can be used in various ways to provide a ranked list of pages.
To identify some of these use-cases we rely on crude heuristics that are often error-prone and hard to replicate.

It would be worth exploring a more explicit way to track the various use-cases where search is involved.
The benefits could be:

  • have a better understanding of how the various features are using the search APIs
  • possibly let the engine tune the profiles based on provenance: today some users have to select rescore profiles explicitly, whereas if the provenance were known the engine could select the best profiles automatically from a centralized config
  • possibly be more "generous" in some ways for specific use-cases by relaxing some of the limits imposed by the search APIs
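
To make the second and third benefits concrete, here is a minimal sketch of what a centralized provenance config could look like. It is written in Python for brevity and is not CirrusSearch code: the tag names, profile names, limit values and the `resolve_settings` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SearchSettings:
    rescore_profile: str
    max_limit: int


# Hypothetical defaults and per-provenance overrides. None of these
# names or values are taken from CirrusSearch.
DEFAULT_SETTINGS = SearchSettings(rescore_profile="default", max_limit=50)

PROVENANCE_SETTINGS = {
    "example-feature-a": SearchSettings(rescore_profile="profile-a", max_limit=50),
    "example-feature-b": SearchSettings(rescore_profile="default", max_limit=500),
}


def resolve_settings(provenance: Optional[str],
                     explicit_profile: Optional[str] = None) -> SearchSettings:
    """Pick settings for a request from its provenance tag.

    Unknown or missing tags fall back to the defaults. A rescore profile
    requested explicitly by the caller still wins over the configured one.
    """
    base = PROVENANCE_SETTINGS.get(provenance, DEFAULT_SETTINGS)
    if explicit_profile is not None:
        return SearchSettings(explicit_profile, base.max_limit)
    return base
```

In this sketch a caller that only sends its tag gets the profile and limit configured for it, while a caller that still passes a profile explicitly keeps today's behaviour.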

How to achieve this is not yet clear, but the main idea would be to propagate a tag identifying the feature all the way down to CirrusSearch and into all the search log events. It has some similarities with wprov.
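
As a rough illustration of the propagation idea, the sketch below carries a tag on the request object and copies it into every log event. It is Python rather than MediaWiki PHP, and `SearchRequest`, `SearchLog`, `run_search` and the `"unknown"` fallback are hypothetical names, not existing APIs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SearchRequest:
    query: str
    # Hypothetical tag identifying the calling feature.
    provenance: Optional[str] = None


@dataclass
class SearchLog:
    events: List[dict] = field(default_factory=list)

    def record(self, request: SearchRequest, hit_count: int) -> None:
        # Every event carries the tag, so use-cases can be identified
        # from the logs without heuristics.
        self.events.append({
            "query": request.query,
            "provenance": request.provenance or "unknown",
            "hits": hit_count,
        })


def run_search(request: SearchRequest, index: List[str], log: SearchLog) -> List[str]:
    """Toy stand-in for the search backend: substring match on titles."""
    needle = request.query.lower()
    hits = [title for title in index if needle in title.lower()]
    log.record(request, len(hits))
    return hits
```

Requests that do not set a tag are logged as `"unknown"`, which would also make it easy to measure how much traffic is still untagged.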

Knowing the features that are using search might take time but here is a quick list of possible candidates to get an idea:

AC:

  • design how this can be done, from the backend SearchEngine up to the UI
  • implement the logic
  • target a couple of use-cases and implement them
  • seek other use-cases and promote the approach

Event Timeline

Gehel triaged this task as Medium priority. Apr 14 2025, 3:27 PM
Gehel moved this task from needs triage to Next Projects on the Discovery-Search board.

As a first step: Let's hear API and MW Platform/Interface ideas on this (@Tgr ?)

I think this is more relevant to MW Interfaces than MW Platform (so maybe @daniel?). Although I'd probably avoid using the same parameter for provenance and tuning.

Other than wprov, the other provenance features I'm aware of are campaigns for login and for file uploads, via MediaWiki-extensions-Campaigns and UploadWizard, but I don't think they have much in common with this task.