The zero results rate for full text search has increased recently, after a decrease that was sustained for a while (see screenshot below). We should investigate why this has happened.
|Resolved||mpopov||T132503 Update ZRR data collection to exclude irrelevant/invalid Cirrus requests|
|Resolved||mpopov||T131196 Research why the zero results rate for full text search is increasing|
Okay, it appears that (for whatever reason) more_like requests are skewing the full-text ZRR pretty hard. See the difference between (b) and (d):
Note: "fixed" in (b) means I excluded irrelevant query_types (e.g. "send_data_write", "send_deletes"). Those are currently being included in the calculation of the Prefix zero results rate, due to the lack of documentation about the dataset and the lack of communication between engineers and analysts. We will need to meet at some point to discuss this and make sure the dashboards are collecting the appropriate data.
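To make the exclusion concrete, here is a minimal Python sketch of a zero results rate that skips irrelevant query types. The record layout (dicts with `query_type` and `hits_total` keys) and the function name are assumptions for illustration, not the actual Cirrus request log schema; only the two excluded query types come from the note above.

```python
# Hypothetical record layout: one dict per request, with "query_type" and "hits_total".
# Only the two query types below are named in the note; the rest is illustrative.
EXCLUDED_QUERY_TYPES = {"send_data_write", "send_deletes"}


def zero_results_rate(requests, excluded=EXCLUDED_QUERY_TYPES):
    """Return the share of non-excluded requests with zero hits, or None if none remain."""
    relevant = [r for r in requests if r["query_type"] not in excluded]
    if not relevant:
        return None
    zero = sum(1 for r in relevant if r["hits_total"] == 0)
    return zero / len(relevant)


sample = [
    {"query_type": "prefix", "hits_total": 0},
    {"query_type": "prefix", "hits_total": 12},
    {"query_type": "send_data_write", "hits_total": 0},
    {"query_type": "send_deletes", "hits_total": 0},
]

print(zero_results_rate(sample))                  # with the exclusion
print(zero_results_rate(sample, excluded=set()))  # without it
```

In this made-up sample the write and delete requests never return hits, so leaving them in inflates the rate; whether the real data behaves the same way would need to be checked.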
Now, I don't know all the ins and outs of the behind-the-scenes stuff but this is really concerning to me:
dcausse: bearloga: trying to digest your data: one of your conclusions, modelike query generate a lot of zero result?
dcausse: s/modelike/morelike/
dcausse: ebernhardson: I wonder if the morelike discrepencies on the bearloga data is not caused by the data we store when the morelike query is cached?
ebernhardson: dcausse: looking
ebernhardson: bearloga: does that say more like ZRR is 75%?
(bearloga) ebernhardson: yup
ebernhardson: bearloga: well, at least i know that's almost certainly wrong :) looking closer...
ebernhardson: bearloga: so, for now i think our best bet would just be to ignore all more_like queries that have payload['cached'] == true
ebernhardson: bearloga: or perhaps would be better (would have to look closer), anything with hitstotal: -1? We use -1 in a variety of places to indicate something that can't be found
ebernhardson: we should be able to fix this actual logging too though
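The two workarounds suggested in the chat could look something like this in Python. The `cached` and `hitstotal` names are taken from the conversation, but the record shape (a `query_type` string plus a `payload` dict holding both fields, with `cached` as a boolean) and the helper names are assumptions. Which of the two rules is the right one still has to be checked against the real logs.

```python
# Assumed record shape: {"query_type": str, "payload": {"cached": bool, "hitstotal": int}}.
# Helper names are hypothetical; they are not part of any existing codebase.


def is_cached_more_like(request):
    """Rule 1 from the chat: a more_like request that was served from cache."""
    return (
        request["query_type"] == "more_like"
        and request["payload"].get("cached") is True
    )


def has_unknown_hits(request):
    """Rule 2 from the chat: hitstotal of -1, used to mean 'could not be found'."""
    return request["payload"].get("hitstotal") == -1


def keep_for_zrr(request, drop_rule=is_cached_more_like):
    """Keep a request for the ZRR calculation unless the chosen rule says to drop it."""
    return not drop_rule(request)


sample = [
    {"query_type": "more_like", "payload": {"cached": True, "hitstotal": -1}},
    {"query_type": "more_like", "payload": {"cached": False, "hitstotal": 20}},
    {"query_type": "full_text", "payload": {"hitstotal": 0}},
]

kept_rule1 = [r for r in sample if keep_for_zrr(r)]
kept_rule2 = [r for r in sample if keep_for_zrr(r, drop_rule=has_unknown_hits)]
print(len(kept_rule1), len(kept_rule2))
```

Keeping the rule as a swappable predicate makes it easy to compare both options on the same data before settling on one.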