Summarize what we know about the "zero results" queries
Closed, Duplicate · Public

Description

We'll be brainstorming how to reduce the "zero results" rate. We're probably going to think of lots of really good, interesting ideas, but we're shooting in the dark unless we have some insight into the queries that are currently yielding no results.

For this task, let's analyze the current data (i.e. what we used to come up with the 25% figure), and see what we can learn about queries that are failing to come up with results.

Stakeholder: The users who are currently failing to get search results
Benefit: Frame the team's planning for the next quarter
Estimate: Needs to be done before the brainstorming meeting

Event Timeline

Jdouglas raised the priority of this task from to Needs Triage.
Jdouglas updated the task description.
Jdouglas added a project: CirrusSearch.
Jdouglas subscribed.
Restricted Application added a subscriber: Aklapper.

The zero-results logs have lots of prefix searches -- are these from typeahead searches run automatically from the search input box, or did folks search for these literal phrase fragments?

Prefix search is almost entirely typeahead. These requests come from Firefox plugins and from the site directly.

Did @Ironholds say something about analyzing these separately from the rest, and that both still came up with a 25% no-results rate?

Yeah, but it was a very ad-hoc job; we need to set up more structured and robust reporting before I'd rely on that number.
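As a rough illustration of what reporting the two query types separately could look like, here is a minimal sketch. It assumes each log entry has already been reduced to a `(query_type, hit_count)` pair; that record shape, the type labels, and the function name are hypothetical and do not reflect the actual CirrusSearch log format.

```python
from collections import defaultdict


def zero_result_rates(records):
    """Return {query_type: share of queries with zero hits}.

    `records` is an iterable of (query_type, hit_count) pairs.
    This record shape is an assumption made for illustration only.
    """
    totals = defaultdict(int)
    zeros = defaultdict(int)
    for query_type, hits in records:
        totals[query_type] += 1
        if hits == 0:
            zeros[query_type] += 1
    return {t: zeros[t] / totals[t] for t in totals}


# Made-up sample data, not real log entries.
sample = [
    ("prefix", 0), ("prefix", 3), ("prefix", 0),
    ("full_text", 0), ("full_text", 12), ("full_text", 7),
]
rates = zero_result_rates(sample)
```

Keeping the rates per type makes it visible whether typeahead traffic is driving the overall figure or whether both types fail at a similar rate.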

Deskana subscribed.

This would ordinarily be an Analysis task, but since Oliver's out, an engineer can do this one because we need it really soon!

I don't think anyone except Oliver currently has access to this data in a reasonable format.

I'd like to have access to this data so I can start evaluating the performance of the Cybozu language detector on it (T104505).

Has the data been collected? (asked the newbie who didn't know.)

It would be really wonderful to have (appropriately anonymized, reasonably formatted) data sitting in a pile somewhere that everyone has access to. David and I were talking today, and it'd be great to do a quick-n-dirty name-detection pass over the query strings to see how many are likely names, for example, to gauge what kind of impact improved name indexing and searching might have on zero results.
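A quick-n-dirty pass of that kind might look like the sketch below. The heuristic (two to four alphabetic tokens, the first of which appears in a given-name list) and the tiny `GIVEN_NAMES` set are assumptions for illustration; a real pass would need a proper name list and would still miss many names.

```python
import re

# Hypothetical stand-in for a real given-name list.
GIVEN_NAMES = {"john", "maria", "david", "oliver", "anna"}


def looks_like_name(query, given_names=GIVEN_NAMES):
    """Crude guess: 2-4 alphabetic tokens, the first a known given name."""
    # [^\W\d_]+ matches runs of letters, including non-ASCII letters.
    tokens = re.findall(r"[^\W\d_]+", query.lower())
    if not 2 <= len(tokens) <= 4:
        return False
    return tokens[0] in given_names


def likely_name_share(queries):
    """Fraction of queries that the heuristic flags as likely names."""
    queries = list(queries)
    if not queries:
        return 0.0
    return sum(looks_like_name(q) for q in queries) / len(queries)


# Made-up queries, not taken from the logs.
sample_queries = [
    "john smith",
    "how to bake bread",
    "maria garcia lopez",
    "python 3 tutorial",
]
share = likely_name_share(sample_queries)
```

The resulting share is only a rough upper-level signal for how much improved name indexing could matter, since the heuristic produces both false positives and false negatives.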

It hasn't, no - or rather, we have a way of collecting it but we're not consistently using it. If that's a desired thing, poke Dan; people have been asking for this a lot but it's not highly prioritised.

I think this is a duplicate of T107035, which is where this analysis is actually taking place.

@Deskana, I created T107035 as a subtask of this one, since my analysis isn't necessarily the only analysis. I'm focusing mostly on the Wikipedia queries as a place to get started, and I wasn't sure what additional analysis this task would cover.

Not a problem. You're doing exactly the right thing. I think we can just merge this task into your task.