
Ensure that zero-results rate progresses according to plan
Closed, Resolved (Public)

Description

  • Depends-on: T105180
  • Summary: Write a test that takes a sample set of searches from the logs, throws them at CirrusSearch, and checks that the zero-results rate satisfies the target schedule.
  • Benefit: Keeps engineering efforts on track.
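A minimal sketch of the check described in the summary, assuming the target schedule is a list of dated milestones, each giving the maximum acceptable zero-results rate from that date onward. The milestone dates and rates, and the names `SCHEDULE`, `zero_results_rate`, `target_on`, and `on_schedule`, are hypothetical placeholders, not values or code from this task.

```python
from datetime import date

# Hypothetical milestones: (effective date, maximum acceptable zero-results rate).
# These numbers are placeholders, not real targets.
SCHEDULE = [
    (date(2015, 7, 1), 0.30),
    (date(2015, 10, 1), 0.25),
    (date(2016, 1, 1), 0.20),
]


def zero_results_rate(hit_counts):
    """Return the fraction of sampled queries that returned no hits."""
    if not hit_counts:
        raise ValueError("empty sample")
    return sum(1 for n in hit_counts if n == 0) / len(hit_counts)


def target_on(day, schedule=SCHEDULE):
    """Return the target in force on `day`: the latest milestone on or before it."""
    current = None
    for start, rate in sorted(schedule):
        if start <= day:
            current = rate
    if current is None:
        raise ValueError("day precedes the first milestone")
    return current


def on_schedule(hit_counts, day, schedule=SCHEDULE):
    """True if the sample's zero-results rate meets the target for `day`."""
    return zero_results_rate(hit_counts) <= target_on(day, schedule)
```

Here `hit_counts` would be the per-query result counts obtained by replaying the sampled searches against CirrusSearch; how those counts are collected is left open.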

Event Timeline

Jdouglas raised the priority of this task to High.
Jdouglas updated the task description.
Jdouglas subscribed.
Restricted Application added a subscriber: Aklapper.

Been thinking about this, and looking through how our dashboards collect and report on data.

It seems like the end goal of this task should be to have a graph on searchdata.wmflabs.org that shows our goal line and our progress toward it? The most natural thing to show on the dashboard, though, would be the rate of zero-results queries actually delivered to users.

Due to the variety of wikis/languages and the size of the corpus, I think the only way we can get an accurate representation of the number of zero-results queries to expect is to run the queries on the production cluster. To guarantee we are getting the same results as the web interface, we should probably just call the actual search endpoint (HTML, not API) and let it do its thing. We will need to ensure that these queries don't throw off our other reports.

So, to summarize, it sounds like the steps to finish this task are:

A) A list of no-result search terms needs to be made available to the server running searchdata.wmflabs.org.
B) A very simple script needs to iterate over that list, issue the queries to the production wikis, and record yes/no for whether each returned results (it can probably just sum up in-process). It then needs to write this number down somewhere. Likely this script can just be added to cron.daily.
C) The dashboard application needs to be adjusted to report the information stored by the script in step B.
D) (optional) The queries need an extra query parameter that is included in the CirrusSearchRequests log so they can be filtered out of the regular search stats.
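Steps B and D could be sketched roughly as follows. This is only an illustration under several assumptions: that the HTML search page is reachable at `/w/index.php` with `search` and `fulltext` parameters, that a no-results page contains the `mw-search-nonefound` marker, and that the tagging parameter is called `zrr_probe` (a made-up name; the real parameter would have to be one that is actually logged in CirrusSearchRequests). The `fetch` argument is injectable so the counting logic can be exercised without touching production.

```python
import urllib.parse
import urllib.request

# Assumption: a no-results search page contains this marker in its HTML.
NO_RESULTS_MARKER = "mw-search-nonefound"
# Hypothetical tagging parameter for step D; the name is invented here.
PROBE_PARAM = ("zrr_probe", "1")


def build_url(wiki_host, term):
    """Build the HTML (not API) search URL for one term on one wiki."""
    params = [("search", term), ("fulltext", "1"), PROBE_PARAM]
    return "https://%s/w/index.php?%s" % (wiki_host, urllib.parse.urlencode(params))


def fetch_html(url):
    """Fetch a page and return its body as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def count_zero_results(queries, fetch=fetch_html):
    """Run each (wiki_host, term) pair and return (zero_result_count, total)."""
    zero = total = 0
    for host, term in queries:
        total += 1
        if NO_RESULTS_MARKER in fetch(build_url(host, term)):
            zero += 1
    return zero, total
```

A daily cron job could call `count_zero_results` on the list from step A and append the two numbers, with the date, to whatever file the dashboard in step C reads.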

part there is that the ticket is to run a 'sample set' of searches, rather than whatever

This script will be very useful for other tasks as well. I think we could use it to do the evaluation explained in T104505.
I think it would be nice if this script could run each query exactly the same way as the original query:

  • prefix search
  • regular search
  • others?
Deskana subscribed.

I believe that this is satisfied by the "Zero result rate change, by day" graph on the dashboard: http://searchdata.wmflabs.org/metrics/#failure_rate