[FY25/26-WE3.1.7] Literature Review on Search on-wiki and off-wiki
Closed, Resolved (Public)

Description

Hypothesis informing the work on semantic search (WE3.1.6, WE3.1.8, etc.): If we review existing research on how readers interact with search and navigation tools on Wikipedia, and how they use external search to find knowledge on Wikipedia, we will be able to provide the Reader teams with ≥3 actionable recommendations and findings that help them scope a search and discovery MVP to address gaps in reader expectations and needs.

  • Summarize findings about how readers use external search to reach Wikipedia
  • Summarize findings about how readers use search on Wikipedia
    • Estimate number/fraction of queries using natural language vs keywords (T404822)
    • Include known barriers/limitations users face with current search
  • Provide 3 or more actionable recommendations

Details

Due Date
Oct 16 2025, 11:00 PM

Event Timeline

Miriam set Due Date to Oct 16 2025, 11:00 PM. (Sep 17 2025, 1:13 PM)
Miriam added subscribers: JTannerWMF, SuchetaG.

weekly update:

  • started scoping the work for this hypothesis
  • Collected relevant resources/literature for the review of on- and off-wiki search
  • Started analysis of search queries to estimate fraction of natural language queries T404822
    • Defining a simple-to-implement heuristic for what a natural language query is. One crucial criterion is whether the query contains any question words, checked via the following regex: \b(who|what|where|when|why|how)\b
    • Identifying the best data source to get all fulltext queries (e.g. using the webrequest table instead of discovery.query_clicks_hourly to also get queries from mobile web)
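
The question-word criterion from the update above could be sketched roughly as follows. This is only an illustration of that single criterion, not the full heuristic used in T404822: the function names, the case-insensitive matching, and the sample queries are all assumptions made for this example.

```python
import re

# Regex quoted in the update; case-insensitive matching is an assumption.
QUESTION_WORD_RE = re.compile(r"\b(who|what|where|when|why|how)\b", re.IGNORECASE)


def has_question_word(query: str) -> bool:
    """Return True if the query contains one of the listed question words."""
    return QUESTION_WORD_RE.search(query) is not None


def question_word_fraction(queries: list[str]) -> float:
    """Fraction of queries that contain at least one question word."""
    if not queries:
        return 0.0
    return sum(has_question_word(q) for q in queries) / len(queries)


# Made-up sample queries for illustration only, not real search logs.
sample_queries = [
    "who invented the telephone",
    "telephone history",
    "How do vaccines work",
    "showtimes berlin",  # contains "how" only inside another word
]

if __name__ == "__main__":
    for q in sample_queries:
        print(f"{q!r}: {has_question_word(q)}")
    print(f"fraction: {question_word_fraction(sample_queries):.2f}")
```

The word boundaries (`\b`) keep substrings such as the "how" in "showtimes" from matching, which matters for a heuristic applied to short keyword queries.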
MGerlach renamed this task from [FY25/26-WE3.1.7] Literature Review on Search on-wiki and off-wiki to [FY25/26-WE3.1.7] Literature Review on Search on-wiki and off-wiki. (Sep 26 2025, 8:53 AM)
MGerlach updated the task description.

weekly update:

  • Put together high-level statistics of use of search on Wikipedia
  • Summarized known pain points of Wikipedia's search and identified themes:
    • preference for external search out of habit (e.g. for navigating between articles)
    • lack of understanding of how search works (e.g. a missing match in autocomplete is interpreted as absence of coverage)
    • UI limitations in arriving at/using fulltext search
    • community wishlists (template discovery, common queries by newcomers, discussion thread)
    • low recall for long queries (not necessarily natural language queries)
    • difficulties with media search on Commons
    • unmet expectations of readers to find information using natural language queries or within sections
  • First estimate for fraction of natural language queries in fulltext search on Wikipedia (4-7%) T404822

Next steps:

  • Summarizing previous analysis of abandoned queries in Wikipedia
  • Summarizing insights about use of external search to arrive at Wikipedia

weekly update:

  • closed subtask on estimating the fraction of natural language queries on WP search
  • summarized insights about use of external search to reach/navigate Wikipedia
  • with this, I have compiled a rough full first draft of the review
  • currently asking for feedback and incorporating changes from Design Research and Search Team as well as polishing the text
  • Next: writing high-level summary with specific recommendations

weekly update:

  • Incorporated feedback from Search Team and Design Research
  • Summarized main findings and formulated a set of recommendations
  • Finalized full first draft (internal doc)
  • Next step: share more widely

weekly update:

  • shared draft more widely and incorporated feedback
  • closing task as work is completed