- Depends-on: T105180
- Summary: Write a test that takes a sample set of searches from the logs, throws them at CirrusSearch, and checks that the zero-results rate satisfies the target schedule.
- Benefit: Keeps engineering efforts on track.
|T105182 Ensure that zero-results rate progresses according to plan
|T106691 Write a script that send queries to the production cluster and collect the results
Been thinking about this, and looking through how our dashboards collect and report on data.
It seems like the end goal of this task should be a graph on searchdata.wmflabs.org that shows our goal line and our progress against it? The most natural thing to show on the dashboard, though, would be the rate of zero-results queries actually delivered to users.
Due to the variety of wikis/languages and the size of the corpus, I think the only way we can get an accurate picture of the number of zero-results queries to expect is to run the queries on the production cluster. To guarantee we are getting the same results as the web interface, we should probably just call the actual search endpoint (HTML, not API) and let it do its thing. We will need to ensure that these queries don't throw off our other reports.
So, to summarize, it sounds like the steps to finish this task are:
A) A list of no-result search terms needs to be made available to the server running searchdata.wmflabs.org
B) A very simple script needs to iterate over that list, issue the queries to the production wikis, and record yes/no for whether each returned results (can probably just sum up in-process). It then needs to write this number down somewhere. Likely this script can just be added to cron.daily
C) The dashboard application needs to be adjusted to report the information stored by the script in step B
D) (optional) The queries need an extra query-param that is included in the CirrusSearchRequests log so they can be filtered out of the regular search stats.
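For what it's worth, a rough sketch of what the script in step B might look like, including the optional marker from step D. Everything named here is an assumption for illustration: the `zeroResultsCheck` parameter is hypothetical (nothing in CirrusSearch logs it today), the User-Agent string is made up, and detecting an empty result page by looking for the `mw-search-nonefound` class in the HTML is a guess at how the search page renders "no results" that would need checking against the real output.

```python
import urllib.parse
import urllib.request
from typing import Callable, Iterable

# Assumption: the search results page contains this CSS class when a
# query matches nothing. Verify against real output before relying on it.
NO_RESULTS_MARKER = "mw-search-nonefound"

# Hypothetical marker for step D so these requests can be filtered out
# of the regular search stats. Not an existing CirrusSearch parameter.
TRACKING_PARAM = {"zeroResultsCheck": "1"}


def build_search_url(base_url: str, query: str) -> str:
    """Build a full-text search URL for the HTML endpoint (not the API)."""
    params = {"search": query, "fulltext": "1", **TRACKING_PARAM}
    return base_url + "?" + urllib.parse.urlencode(params)


def fetch_html(url: str) -> str:
    """Fetch a page over HTTP and return its body as text."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "zero-results-check/0.1 (hypothetical)"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def has_results(html: str) -> bool:
    """Return True unless the page carries the no-results marker."""
    return NO_RESULTS_MARKER not in html


def zero_results_rate(
    queries: Iterable[str],
    base_url: str,
    fetch: Callable[[str], str] = fetch_html,
) -> float:
    """Run every query and return the fraction that came back empty."""
    total = 0
    zero = 0
    for query in queries:
        total += 1
        if not has_results(fetch(build_search_url(base_url, query))):
            zero += 1
    # An empty query list is reported as a rate of 0.0 rather than an error.
    return zero / total if total else 0.0
```

The `fetch` argument is only there so the counting logic can be exercised without touching production. A cron.daily wrapper would read the list from step A, call `zero_results_rate(queries, "https://en.wikipedia.org/w/index.php")`, and append the result somewhere the dashboard in step C can read it.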
part there is that the ticket is to run a 'sample set' of searches, rather than whatever