
Update ZRR data collection to exclude irrelevant/invalid Cirrus requests
Closed, Resolved | Public | 2 Estimated Story Points

Description

As noted in T131196#2200560, our data fetching scripts currently treat a lot of requests inappropriately. We need to filter out irrelevant query_types (e.g. send_data_write). Specifically, we need to make the following changes:

query_type                | How it appears on dashboard | Change or keep
comp_suggest              | Prefix Search               | keep as is
count_links               | Prefix Search               | ?
degraded_full_text        | Full-Text Search            | ?
full_text                 | Full-Text Search            | keep as is
GeoData_spatial_search    | Prefix Search               | exclude?
get                       | Prefix Search               | exclude?
more_like                 | Full-Text Search            | keep as is? maybe ignore those where payload['cached'] == true?
namespace                 | Prefix Search               | keep as is
near_match                | Prefix Search               | keep as is?
other_idx_lookup          | Prefix Search               | ?
prefix                    | Prefix Search               | keep as is
regex                     | Full-Text Search            | keep as is
send_data_other_idx_write | Prefix Search               | exclude
send_data_write           | Prefix Search               | exclude
send_deletes              | Prefix Search               | exclude
version                   | Prefix Search               | exclude

Additionally:

ebernhardson: bearloga: yea on looking, the most resilient way would probably be to treat anything with
  hitstotal: -1 (in hive you can use array_contains(requests.hitstotal, -1)) as "unknown", could have hits or
  not. This looks like it would filter out index writes (send_data_write) and cached more like
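
A minimal sketch of the "treat hitstotal -1 as unknown" rule from the message above, written in Python rather than Hive. The function name, the three-way return value, and the choice to count a request set as zero-result only when every sub-request returned 0 hits are assumptions made for illustration, not the logic of the actual data fetching scripts.

```python
from typing import Iterable, Optional

# Sentinel from the quoted message: hitstotal of -1 means "unknown".
UNKNOWN_HITS = -1


def zero_result_status(hits_totals: Iterable[int]) -> Optional[bool]:
    """Classify one request set by its per-request hit totals.

    Returns None when the outcome is unknown (any -1, or no requests),
    True when every request returned 0 hits (assumed definition),
    and False otherwise.
    """
    totals = list(hits_totals)
    if not totals or UNKNOWN_HITS in totals:
        return None
    return all(total == 0 for total in totals)
```

Request sets that come back as None would simply be left out of both the numerator and the denominator of the zero results rate.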

Event Timeline

mpopov set the point value for this task to 2.

@EBernhardson: Can you advise on what to do with the query types I marked with ? / "keep as is?" Can you also please double check the table and let me know if there's anything I've missed/messed up/you disagree with? Thanks!

count_links: This is used as part of the index update process, safe to ignore
degraded_full_text: This is a regular full text search that used some sort of invalid syntax, so it ran in a degraded mode. Should be included as a full text search
other_idx_lookup: This is part of the index update process, safe to ignore

GeoData_spatial_search: Runs from ApiQueryGeoSearchElastic. Could go either way ... I suppose it's worth including, but we should probably expect it to have a higher ZRR since it's a geo-limited search. Probably a full text search
get: Probably ignore. These would occur during the weekly dumps, and can also be triggered by the more like query, but the more like is the query that matters; these are just extra things that happen. Can also be triggered with action=cirrusdump
namespace: These happen as part of another search, they shouldn't be considered independently. Any request that contains a namespace query should contain other queries as well, and those other queries are the ones we care about.

But beyond all the above, I would probably limit it to just the following:

full_text -> full text
degraded_full_text -> full text
regex -> full text
more_like -> full text
prefix -> prefix
comp_suggest -> prefix
GeoData_spatial_search -> full text
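
The whitelist above could be applied roughly as follows. This is only a sketch: the record shape (pairs of query type and hit total), the function name, and skipping records with a negative hit total are assumptions for illustration and are not taken from the merged patches.

```python
from collections import defaultdict

# Whitelist from the list above; any query type not listed is ignored.
QUERY_TYPE_TO_CATEGORY = {
    "full_text": "full text",
    "degraded_full_text": "full text",
    "regex": "full text",
    "more_like": "full text",
    "prefix": "prefix",
    "comp_suggest": "prefix",
    "GeoData_spatial_search": "full text",
}


def zero_results_rate(records):
    """Compute the zero results rate per category.

    `records` is an iterable of (query_type, hits_total) pairs
    (a hypothetical shape chosen for this example).
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [zero_result, total]
    for query_type, hits_total in records:
        category = QUERY_TYPE_TO_CATEGORY.get(query_type)
        if category is None or hits_total < 0:
            continue  # not whitelisted, or hit count unknown (-1)
        counts[category][1] += 1
        if hits_total == 0:
            counts[category][0] += 1
    return {cat: zero / total for cat, (zero, total) in counts.items()}
```

With this shape, index writes such as send_data_write drop out because they are not in the whitelist, and cached more_like requests drop out through the negative hit total.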

Change 283222 had a related patch set uploaded (by Bearloga):
Update ZRR computation

https://gerrit.wikimedia.org/r/283222

Change 283225 had a related patch set uploaded (by Bearloga):
Update ZRR presentation

https://gerrit.wikimedia.org/r/283225

Change 283222 merged by Bearloga:
Update ZRR computation

https://gerrit.wikimedia.org/r/283222

Change 283225 merged by Bearloga:
Update ZRR presentation

https://gerrit.wikimedia.org/r/283225

Change 283346 had a related patch set uploaded (by Bearloga):
Deploy updated dashboards

https://gerrit.wikimedia.org/r/283346

Change 283346 merged by Bearloga:
Deploy updated dashboards

https://gerrit.wikimedia.org/r/283346

Search metrics dashboard has been updated to be compatible with new data format. The zero results rate datasets are currently being backfilled (so we have per-query-type ZRR as of 2016-02-01) at a rate of about 30 minutes per day of data. As of right now it's backfilling 2016-02-10. It'll take about 30 hours until we have ZRR for the rest of February, all of March, and the first week and a half of April.

But once we have ZRR backfilled, http://discovery.wmflabs.org/metrics/#failure_breakdown will look AWESOME
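
The 30-hour estimate above can be checked with a few lines of arithmetic. The end date of 2016-04-10 is an assumption standing in for "the first week and a half of April"; the rate and the current backfill date come from the comment.

```python
from datetime import date

MINUTES_PER_DAY_OF_DATA = 30       # backfill rate quoted in the comment
current_day = date(2016, 2, 10)    # day being backfilled right now
last_day = date(2016, 4, 10)       # assumed end of the backfill window

# Count both endpoints, since the current day is still in progress.
days_remaining = (last_day - current_day).days + 1
hours_remaining = days_remaining * MINUTES_PER_DAY_OF_DATA / 60

print(days_remaining, hours_remaining)
```

Under that assumption there are 61 days left to process, which works out to 30.5 hours.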

I'm assuming that this fix has also reduced the zero results rate on the KPI tab? It was higher a few days ago, before the quarterly review. I'm testing my understanding here; I want to make sure I've got the facts straight before I email people about it. :-)