[EPIC][Search][Dashboard] Add "well-behaved searchers" filter
Open, Normal, Public


While talking about and looking at the dashboards today, particularly the ZRR (zero results rate) on the Azerbaijani Wikipedia,[1] an idea that came up was that it would be nice to see key metrics for "well-behaved searchers".

This is a heuristic we use when gathering data for analysis and manual tagging. The idea is to exclude not only bots, but also weirdos like the Discovery Search team—who are known to issue hundreds of queries in a day without clicking on any results—and other outliers. The four elements we've used, not all of which may be relevant to dashboards, are:

  1. Query came from the search box on <wiki>.wikipedia.org
  2. Exclude any IP that made more than 30 queries per day
  3. Include not more than one query from any given IP for any given day
  4. Only the <wiki>_content index was searched (except for wikis that search multiple indexes by default)

(1) and (2) seem reasonable for dashboards. (3) limits the input of every individual searcher to one query. For dashboards, that could be relaxed to one session, or it might not be appropriate at all.
(4) might be hard to implement generically because some wikis search multiple indices by default, and maintaining info on this across all projects sounds like a pain.
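
To make the four criteria concrete, here is a rough Python sketch of the filter. It is not @dcausse's HQL and not how the dashboards work today. The record fields (`ip`, `day`, `wiki`, `source`, `indices`), the `"web"` source value, and the `default_indices` lookup are all hypothetical names chosen for illustration. It also assumes the 30-per-day limit counts every query from an IP, not just search-box ones, and that (3) keeps the first query seen for each IP and day.

```python
from collections import Counter


def well_behaved(queries, max_per_day=30, one_per_ip_day=True, default_indices=None):
    """Filter query records down to "well-behaved searchers".

    Each record is a dict with hypothetical keys:
    ip, day, wiki, source, indices (list of index names searched).
    """
    default_indices = default_indices or {}

    # (2) Count queries per (IP, day) across ALL records (an assumption).
    per_ip_day = Counter((q["ip"], q["day"]) for q in queries)

    seen = set()
    kept = []
    for q in queries:
        key = (q["ip"], q["day"])

        # (1) Only queries from the on-wiki search box ("web" is a made-up label).
        if q["source"] != "web":
            continue

        # (2) Drop every query from an IP with more than max_per_day queries that day.
        if per_ip_day[key] > max_per_day:
            continue

        # (4) Only the default index set: <wiki>_content unless overridden per wiki.
        expected = set(default_indices.get(q["wiki"], {q["wiki"] + "_content"}))
        if set(q["indices"]) != expected:
            continue

        # (3) Optionally keep at most one query per IP per day (the first one seen).
        if one_per_ip_day:
            if key in seen:
                continue
            seen.add(key)

        kept.append(q)
    return kept
```

Passing `one_per_ip_day=False` turns off (3), which may suit dashboards better. The `default_indices` mapping is where the per-wiki maintenance burden mentioned for (4) would end up.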

See more on the reasoning behind the constraints and @dcausse's sample HQL in the RelForge misc directory.

[1] From Oct 23-25, 2016, the ZRR on azwiki jumped from 30-35% to ~97% and has been holding steady since. @EBernhardson checked the database and it looks like there's probably a bot or other API user that's driving up the ZRR.

TJones created this task. · Nov 9 2016, 8:53 PM
Restricted Application added a subscriber: Aklapper. · Nov 9 2016, 8:53 PM
debt triaged this task as "Normal" priority. · Nov 10 2016, 9:27 PM
debt moved this task from Needs triage to Up Next on the Discovery-Analysis board.
mpopov claimed this task. · Nov 16 2016, 10:15 PM
mpopov set the point value for this task to 6.
mpopov moved this task from Backlog to In progress on the Discovery-Analysis (Current work) board.
mpopov removed the point value for this task. · Nov 16 2016, 11:20 PM
mpopov changed the title from "Add "well-behaved searchers" filter to dashboards" to "[EPIC][Search][Dashboard] Add "well-behaved searchers" filter".
mpopov added a project: Epic.