
Research why the zero results rate for full text search is increasing
Closed, Resolved · Public · 4 Story Points


The zero results rate for full text search has increased recently, after a decrease that was sustained for a while (see screenshot below). We should investigate why this has happened.

Event Timeline

Deskana created this task. · Mar 29 2016, 9:10 PM
Restricted Application added a subscriber: Aklapper. · Mar 29 2016, 9:10 PM
Deskana triaged this task as Normal priority. · Mar 29 2016, 9:10 PM

Normal priority, but it's unlikely we'll get to this fast.

Deskana moved this task from Needs triage to Maps on the Discovery board. · Mar 29 2016, 9:11 PM
Deskana moved this task from Maps to Analysis on the Discovery board.
Deskana raised the priority of this task from Normal to High. · Apr 5 2016, 7:42 PM
Deskana added a subscriber: mpopov.

@mpopov I've raised priority on this and thrown it into the sprint.

Deskana moved this task from Analysis to On Sprint Board on the Discovery board. · Apr 7 2016, 8:10 PM
mpopov claimed this task. · Apr 11 2016, 4:33 PM
mpopov set the point value for this task to 4.
mpopov moved this task from Backlog to In progress on the Discovery-Analysis (Current work) board.

Okay, it appears that (for whatever reason), more_like requests are messing with ZRR for full-text pretty hard. See difference between (b) and (d):

Note: "fixed" in (b) refers to me excluding irrelevant query_types (e.g. "send_data_write", "send_deletes") because those are currently being included in the calculation of Prefix zero results rate due to the lack of documentation about the dataset & lack of communication between engineers & analysts. We will need to meet at some point to discuss this and make sure the dashboards are collecting the appropriate data.
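As a rough illustration of the "fixed" calculation described in the note, here is a minimal sketch that drops the irrelevant query types before computing a zero results rate. The record layout (a dict with `query_type` and `hits_total` keys) and the function name are assumptions made for this example, not the actual dataset schema or dashboard code.

```python
# Query types named in the note above as irrelevant to the zero results rate.
EXCLUDED_QUERY_TYPES = {"send_data_write", "send_deletes"}


def zero_results_rate(records, excluded=EXCLUDED_QUERY_TYPES):
    """Share of kept requests that returned zero hits.

    `records` is an iterable of dicts with hypothetical keys
    "query_type" and "hits_total". Returns None if nothing is left
    after filtering, so callers do not divide by zero.
    """
    kept = [r for r in records if r["query_type"] not in excluded]
    if not kept:
        return None
    zero = sum(1 for r in kept if r["hits_total"] == 0)
    return zero / len(kept)
```

Without the exclusion, write and delete operations would be counted in the denominator (and possibly the numerator), which skews the rate.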

Now, I don't know all the ins and outs of the behind-the-scenes stuff but this is really concerning to me:

@dcausse & @EBernhardson: Do you have any insights/comments on what we're seeing here?

dcausse: bearloga: trying to digest your data: one of your conclusions, modelike query generate a lot of zero result?
dcausse: s/modelike/morelike/
dcausse: ebernhardson: I wonder if the morelike discrepancies in bearloga's data are caused by the data we store when the morelike query is cached?
ebernhardson: dcausse: looking
ebernhardson: bearloga: does that say more like ZRR is 75%?
bearloga: ebernhardson: yup
ebernhardson: bearloga: well, at least i know that's almost certainly wrong :) looking closer...
ebernhardson: bearloga: so, for now i think our best bet would just be to ignore all more_like queries that have payload['cached'] == true
ebernhardson: bearloga: or perhaps would be better (would have to look closer), anything with hitstotal: -1?  We use -1 in a variety of places to indicate something that can't be found
ebernhardson: we should be able to fix this actual logging too though
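
The two workarounds suggested in the chat above could be sketched roughly as follows. This is only an illustration under assumptions: the record layout (`query_type`, `hits_total`, and a nested `payload` dict with a `cached` flag) and all function names are hypothetical, loosely modeled on the `payload['cached']` and `hitstotal` fields mentioned in the conversation.

```python
def is_cached_more_like(record):
    """Option 1: a more_like request that was served from cache."""
    payload = record.get("payload") or {}
    return record.get("query_type") == "more_like" and payload.get("cached") is True


def has_unknown_hits(record):
    """Option 2: any request whose hit count is the -1 'not found' sentinel."""
    return record.get("hits_total") == -1


def zrr_after_dropping(records, drop):
    """Zero results rate over the records for which drop(record) is False.

    Returns None when every record is dropped.
    """
    kept = [r for r in records if not drop(r)]
    if not kept:
        return None
    zero = sum(1 for r in kept if r["hits_total"] == 0)
    return zero / len(kept)
```

Passing `is_cached_more_like` or `has_unknown_hits` as `drop` allows the two options to be compared on the same data before the logging itself is fixed.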
Deskana closed this task as Resolved. · Apr 12 2016, 8:51 PM
Deskana moved this task from In progress to Done on the Discovery-Analysis (Current work) board.

Investigation complete. Several possible issues and resolutions were identified. I'm resolving this task, and will create separate subtasks for each issue.