Ensure that zero-results rate progresses according to plan
Closed, Resolved · Public

Assigned To

Authored By

	• Jdouglas
	Jul 8 2015, 4:17 PM

Description

Depends-on: T105180
Summary: Write a test that takes a sample set of searches from the logs, throws them at CirrusSearch, and checks that the zero-results rate satisfies the target schedule.
Benefit: Keeps engineering efforts on track.
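
One way to read "satisfies the target schedule" is as a comparison of a measured rate against a list of dated milestones. The sketch below is only an illustration in Python: the `TARGET_SCHEDULE` values are made-up placeholders (the task does not state the real targets), and `target_for` and `meets_schedule` are hypothetical helper names.

```python
from datetime import date
from typing import List, Optional, Tuple

# Placeholder schedule, NOT the team's real targets: each entry is
# (milestone date, maximum acceptable zero-results rate from that date on).
TARGET_SCHEDULE: List[Tuple[date, float]] = [
    (date(2015, 8, 1), 0.30),
    (date(2015, 9, 1), 0.25),
    (date(2015, 10, 1), 0.20),
]


def target_for(day: date, schedule: List[Tuple[date, float]]) -> Optional[float]:
    """Return the target of the latest milestone on or before `day`, or None."""
    current = None
    for milestone, max_rate in sorted(schedule):
        if milestone <= day:
            current = max_rate
    return current


def meets_schedule(rate: float, day: date,
                   schedule: List[Tuple[date, float]]) -> bool:
    """True if `rate` is at or below the target in force on `day`.

    Before the first milestone there is no target, so any rate passes.
    """
    target = target_for(day, schedule)
    return target is None or rate <= target
```

A test built on this would compute the rate from the sampled queries and fail whenever `meets_schedule` returns `False` for the current date.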

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		EBernhardson	T105182 Ensure that zero-results rate progresses according to plan
		Resolved		EBernhardson	T106691 Write a script that send queries to the production cluster and collect the results

Event Timeline

• Jdouglas created this task. Jul 8 2015, 4:17 PM

• Jdouglas raised the priority of this task from to High.

• Jdouglas updated the task description. (Show Details)

• Jdouglas added projects: CirrusSearch, Discovery-Search (Current work).

• Jdouglas subscribed.

Restricted Application added a project: Discovery-ARCHIVED. Jul 8 2015, 4:17 PM

Restricted Application added a subscriber: Aklapper.

EBernhardson claimed this task. Jul 10 2015, 6:08 PM

EBernhardson moved this task from Incoming to not in use - please delete on the Discovery-Search (Current work) board.

EBernhardson set Security to None.

Been thinking about this, and looking through how our dashboards collect and report on data.

It seems like the end goal of this task should be to have a graph on searchdata.wmflabs.org that shows our goal line, and our progress in this task? The most natural thing to show on the dashboard though would be the rate of zero-results queries actually delivered to users.

Due to the variety of wikis and languages and the size of the corpus, I think the only way we can get an accurate representation of the number of zero-result queries to expect is to run the queries on the production cluster. To guarantee we are getting the same results as the web interface, we should probably just call the actual search endpoint (HTML, not API) and let it do its thing. We will need to ensure that these queries don't throw off our other reports.

So, to summarize, it sounds like the steps to finish this task are:

A) A list of no-result search terms needs to be made available for the server running searchdata.wmflabs.org
B) A very simple script needs to iterate over that list, issue the queries to the production wikis, and record yes/no for whether each returned results (it can probably just sum up in-process). It then needs to write this number down somewhere. Likely this script can just be added to cron.daily.
C) The dashboard application needs to be adjusted to report the information stored by the script in step B
D) (optional) The queries need an extra query-param that is included in the CirrusSearchRequests log so they can be filtered out of the regular search stats.
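
A rough sketch of what steps B and D could look like in Python, not a finished script. Several things here are assumptions: the marker parameter name `zeroresulttest` is invented for illustration, the URL shape (`/w/index.php` with `title=Special:Search`) and the `mw-search-nonefound` CSS class are my understanding of how the HTML search page behaves and should be checked against a live wiki, and the helper names are hypothetical. The fetch function is injectable so the counting logic can be exercised without touching production.

```python
from typing import Callable, Iterable, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical extra query parameter (step D) so these requests can be
# filtered out of the regular search stats. The name is an assumption.
MARKER_PARAM = "zeroresulttest"


def build_search_url(wiki_host: str, query: str) -> str:
    """Build the HTML (not API) search URL for one query on one wiki."""
    params = {
        "title": "Special:Search",
        "search": query,
        "fulltext": "1",
        MARKER_PARAM: "1",
    }
    return f"https://{wiki_host}/w/index.php?{urlencode(params)}"


def fetch_html(url: str) -> str:
    """Fetch a page over HTTP and return its body as text."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def has_no_results(html: str) -> bool:
    # Assumption: the results page carries this class when nothing matched.
    return "mw-search-nonefound" in html


def zero_results_rate(
    queries: Iterable[Tuple[str, str]],
    fetch: Callable[[str], str] = fetch_html,
) -> float:
    """Run (wiki_host, query) pairs and return the fraction with no results."""
    total = 0
    zero = 0
    for wiki_host, query in queries:
        html = fetch(build_search_url(wiki_host, query))
        total += 1
        if has_no_results(html):
            zero += 1
    return zero / total if total else 0.0
```

Run daily from cron, the returned rate could be appended to whatever file the dashboard reads in step C. A real version would also want rate limiting and error handling for failed requests.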

part there is that the ticket is to run a 'sample set' of searches, rather than whatever

This script will also be very useful for other tasks. I think we could use it to do the evaluation explained in T104505.
I think it would be nice if this script were able to run each query in exactly the same way as the original query: