Extract a set of full_text "ambiguous" queries from hive
Closed, Resolved (Public)

Description

We need to extract a set of ambiguous queries (queries that return more than 1000 results on enwiki).
Ideally we need:

  • a set with basic queries (no special syntax, no phrase search)
  • a set with single word queries
  • a set with multi word queries

We should carefully exclude queries from the WikipediaApp, since they include partial words (search as you type) that would pollute the set.
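
The selection criteria above could be sketched roughly as follows. This is only an illustration in Python, not the Hive query attached to this task. The record fields (`query`, `hits`, `user_agent`) are hypothetical names, the "special syntax" regex is a rough heuristic rather than the real search grammar, and it assumes the single word and multi word sets are drawn from the basic queries.

```python
import re

# From the task description: "ambiguous" means more than 1000 results.
MIN_HITS = 1000

# Assumption: a rough heuristic for "special syntax" (quotes, wildcards,
# fuzzy/keyword markers, negated terms, boolean operators). It is not the
# actual search engine grammar.
SPECIAL_SYNTAX = re.compile(r'["*~\\:]|(^|\s)[-!]\S|\b(AND|OR|NOT)\b')


def is_basic(query):
    """True if the query has no special syntax and no phrase search."""
    return SPECIAL_SYNTAX.search(query) is None


def classify(record):
    """Return the names of the sets a log record belongs to.

    `record` is a dict with hypothetical keys `query`, `hits` and
    `user_agent`. An empty list means the record is excluded.
    """
    query = record["query"].strip()
    if not query:
        return []
    # Keep only ambiguous queries (strictly more than MIN_HITS results).
    if record["hits"] <= MIN_HITS:
        return []
    # Exclude the app: search-as-you-type sends partial words.
    if "WikipediaApp" in record.get("user_agent", ""):
        return []
    if not is_basic(query):
        return []
    word_count = len(query.split())
    return ["basic", "single_word" if word_count == 1 else "multi_word"]
```

For example, `classify({"query": "john smith", "hits": 5000, "user_agent": "Mozilla/5.0"})` places the query in the basic and multi word sets, while the same query sent with a user agent containing `WikipediaApp` is dropped.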

Event Timeline

dcausse raised the priority of this task to Medium.
dcausse updated the task description.
dcausse added subscribers: Aklapper, StudiesWorld, dcausse.

Change 268704 had a related patch set uploaded (by DCausse):
hive query to extract sample query set

https://gerrit.wikimedia.org/r/268704

Queries are available on stat1002.eqiad.wmnet:~dcausse/query_sets/

Change 268704 merged by jenkins-bot:
hive query to extract sample query set

https://gerrit.wikimedia.org/r/268704