
[EPIC][Search][Dashboard] Add "well-behaved searchers" filter
Closed, Declined (Public)

Description

While talking about and looking at the dashboards today, particularly the zero results rate (ZRR) on the Azerbaijani Wikipedia,[1] an idea came up: it would be nice to see key metrics for "well-behaved searchers".

This is a heuristic we use when gathering data for analysis and manual tagging. The idea is to exclude not only bots, but also weirdos like the Discovery Search team—who are known to issue hundreds of queries in a day without clicking on any results—and other outliers. The four elements we've used, not all of which may be relevant to dashboards, are:

  1. Query came from the search box on <wiki>.wikipedia.org
  2. Exclude any IP that made more than 30 queries per day
  3. Include not more than one query from any given IP for any given day
  4. Only the <wiki>_content index was searched (except for wikis that search multiple indexes by default)

(1) and (2) seem reasonable for dashboards. (3) limits each individual searcher's contribution to one query per day; for dashboards it could perhaps be relaxed to one session, or it might not be appropriate at all. (4) might be hard to implement generically because some wikis search multiple indices by default, and maintaining that information across all projects sounds like a pain.
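The four criteria can be sketched as a filter over per-query records, followed by a ZRR calculation on whatever survives. This is a minimal Python sketch under stated assumptions, not the HQL used in RelForge. The `SearchEvent` record and its fields, the `"search_box"` source label, and the exact-match check against `<wiki>_content` are all hypothetical. Criterion (2) is assumed to drop an IP's queries only on the day it exceeds 30 queries, counting all of its queries. Criteria (3) and (4) are opt-in flags, since (3) may not suit dashboards and (4) would need per-wiki exceptions.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Tuple

# Criterion (2): hypothetical constant mirroring the "more than 30 queries per day" cutoff.
MAX_QUERIES_PER_DAY = 30


@dataclass(frozen=True)
class SearchEvent:
    """One search query. Hypothetical record shape, not a real log schema."""
    ip: str
    day: date
    wiki: str                 # database name, e.g. "azwiki"
    source: str               # assumed label for where the query came from
    indices: Tuple[str, ...]  # indices that were searched
    hits: int                 # number of results returned


def well_behaved(events: Iterable[SearchEvent],
                 one_per_ip_day: bool = False,
                 content_index_only: bool = False) -> List[SearchEvent]:
    """Keep only queries that pass the "well-behaved searcher" criteria."""
    events = list(events)
    # Count every query per (IP, day) before filtering, so heavy users are
    # identified by their total volume, not just their search-box volume.
    volume = Counter((e.ip, e.day) for e in events)
    seen = set()
    kept = []
    for e in events:
        # (1) Query came from the search box (assumed source label).
        if e.source != "search_box":
            continue
        # (2) Drop IPs with more than 30 queries on that day.
        if volume[(e.ip, e.day)] > MAX_QUERIES_PER_DAY:
            continue
        # (4) Optionally require that only <wiki>_content was searched.
        #     Wikis that search multiple indices by default would need exceptions.
        if content_index_only and e.indices != (f"{e.wiki}_content",):
            continue
        # (3) Optionally keep at most one query per IP per day (the first seen).
        if one_per_ip_day:
            if (e.ip, e.day) in seen:
                continue
            seen.add((e.ip, e.day))
        kept.append(e)
    return kept


def zero_results_rate(events: Iterable[SearchEvent]) -> Optional[float]:
    """Fraction of queries that returned no results; None if there are none."""
    events = list(events)
    if not events:
        return None
    return sum(1 for e in events if e.hits == 0) / len(events)
```

With the defaults, only criteria (1) and (2) apply, matching the two that seem reasonable for dashboards. Comparing `zero_results_rate(events)` against `zero_results_rate(well_behaved(events))` would show whether outliers, such as a single heavy API user, are skewing the overall metric.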

See more on the reasoning behind the constraints and @dcausse's sample HQL in the RelForge misc directory.


[1] From Oct 23 to 25, 2016, the ZRR on azwiki jumped from 30-35% to ~97%, and it has held steady since. @EBernhardson checked the database, and it looks like a bot or other API user is probably driving up the ZRR.

Event Timeline

debt triaged this task as Medium priority. Nov 10 2016, 9:27 PM
debt moved this task from Needs triage to Up Next on the Discovery-Analysis board.
mpopov set the point value for this task to 6.
mpopov moved this task from Backlog to In progress on the Discovery-Analysis (Current work) board.
mpopov renamed this task from Add "well-behaved searchers" filter to dashboards to [EPIC][Search][Dashboard] Add "well-behaved searchers" filter. Nov 16 2016, 11:20 PM
mpopov removed the point value for this task.
mpopov added a project: Epic.
debt removed mpopov as the assignee of this task. Aug 30 2017, 11:17 PM

I'm not sure that we'll be able to get back to this type of work, because we now have a cool dashboard where we can look at things like the Azerbaijani Wikipedia's ZRR for the last 30-ish days:

[Screenshot: azerbaijani-wiki-ZRR.png, Azerbaijani Wikipedia ZRR dashboard]

However, I'll keep this in the backlog for now—maybe we'll be able to pick it up again in the future to figure out if some of our own team queries (and other outliers) are skewing the overall metrics on searches.