
Add a way to classify queries
Closed, ResolvedPublic

Description

To dispatch queries to a particular search setup (Cirrus defaults vs. the Wikibase custom query builder) we need a flexible way to classify queries.
Not every search setup may support the full variety available in the search syntax.
Introduce a classification mechanism with which extensions can register their own classifiers.
The query classes will be lazily loaded while trying to dispatch the query to a particular query builder.
By default the repository of classifiers will include:

  • simple_bag_of_words: e.g. foo bar
  • simple_phrase: e.g. "foo bar"
  • bag of words with phrase: e.g. foo "bar baz"
  • expert query: e.g. foo OR bar, or anything else that we consider expert usage (i.e. search keywords)
  • bogus query: e.g. foo AND bar NOT

Note that a query may belong to multiple classes.
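
For illustration only, the mechanism described above might be sketched in Python as follows. Everything here is an assumption made for the example: the names `register`, `CLASSIFIERS` and `ClassifiedQuery` are hypothetical and are not the CirrusSearch API (the actual patches are PHP), and the classification rules are simplified guesses derived from the examples in the list, with "expert query" using the proposed `complex_query` name.

```python
import re
from typing import Callable, Dict, List, Set

OPERATORS = {"AND", "OR", "NOT"}  # assumed boolean operators


def tokenize(query: str) -> List[str]:
    """Split a query into quoted phrases and bare whitespace-delimited tokens."""
    return re.findall(r'"[^"]*"|\S+', query)


def is_phrase(tok: str) -> bool:
    return len(tok) >= 2 and tok.startswith('"') and tok.endswith('"')


def is_complex(tokens: List[str]) -> bool:
    # Boolean operators, or keyword-style tokens such as intitle:foo (assumed syntax).
    return any(
        t in OPERATORS or (not is_phrase(t) and re.match(r"\w+:", t) is not None)
        for t in tokens
    )


# Hypothetical registry: class name -> predicate over the token list.
CLASSIFIERS: Dict[str, Callable[[List[str]], bool]] = {}


def register(name: str, classifier: Callable[[List[str]], bool]) -> None:
    """Entry point an extension could use to add its own classifier."""
    CLASSIFIERS[name] = classifier


register("simple_bag_of_words",
         lambda t: bool(t) and not is_complex(t) and not any(is_phrase(x) for x in t))
register("simple_phrase", lambda t: len(t) == 1 and is_phrase(t[0]))
register("bag_of_words_with_phrase",
         lambda t: not is_complex(t)
         and any(is_phrase(x) for x in t)
         and any(not is_phrase(x) for x in t))
register("complex_query", is_complex)
# Simplification: only a dangling trailing operator is treated as bogus.
register("bogus_query", lambda t: bool(t) and t[-1] in OPERATORS)


class ClassifiedQuery:
    """Evaluates a query class only when it is first asked for, then caches it."""

    def __init__(self, query: str) -> None:
        self.query = query
        self._tokens = tokenize(query)
        self._cache: Dict[str, bool] = {}

    def is_in(self, name: str) -> bool:
        if name not in self._cache:
            self._cache[name] = bool(CLASSIFIERS[name](self._tokens))
        return self._cache[name]

    def classes(self) -> Set[str]:
        return {name for name in CLASSIFIERS if self.is_in(name)}
```

Under these assumptions a query builder would call `ClassifiedQuery(q).is_in("simple_bag_of_words")` to decide whether it can handle `q`, and only that one classifier is evaluated. The bogus example from the list, `foo AND bar NOT`, falls into both `complex_query` and `bogus_query`, which shows a query belonging to multiple classes.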

Details

Related Gerrit Patches:
mediawiki/extensions/CirrusSearch : master : Add ParsedQuery::getFeaturesUsed()
mediawiki/extensions/CirrusSearch : master : Add a fulltext query classifier

Event Timeline

dcausse triaged this task as Medium priority. Jun 20 2018, 10:06 AM
dcausse created this task.
Restricted Application edited projects, added Discovery-Search; removed Discovery-Search (Current work). Jun 20 2018, 10:06 AM
dcausse updated the task description.
dcausse updated the task description.

Is it only for fulltext queries, or for any queries? How does this classification relate to the syntax classification we already have in the code?

Also, with the advanced search UI, I am not sure using keywords is really for "experts" anymore; we might see quite a lot more of it, especially with MCR/SDC.

This classification is meant to replace the 'getSyntaxUsed' approach we have in SearchContext. It'll be for fulltext queries, but there is no reason it can't be used for other types of search, even though we do not have any particular syntax in prefix/completion searches yet.
As for the 'expert' class: indeed, as seen by Cirrus, we may receive a complex query that was not written by an expert user. But in the end the purpose is not to classify users but to classify queries, so that a particular query builder can decide whether or not it is able to generate an elastic query for it.
I may rename 'expert' to 'complex query' to avoid confusion.

Change 441406 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/CirrusSearch@master] Add a fulltext query classifier

https://gerrit.wikimedia.org/r/441406

Change 441416 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/CirrusSearch@master] Add ParsedQuery::getFeaturesUsed()

https://gerrit.wikimedia.org/r/441416

Change 441406 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Add a fulltext query classifier

https://gerrit.wikimedia.org/r/441406

Change 441416 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Add ParsedQuery::getFeaturesUsed()

https://gerrit.wikimedia.org/r/441416

Vvjjkkii renamed this task from Add a way to classify queries to llaaaaaaaa. Jul 1 2018, 1:03 AM
Vvjjkkii removed dcausse as the assignee of this task.
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed subscribers: gerritbot, Aklapper.
CommunityTechBot renamed this task from llaaaaaaaa to Add a way to classify queries. Jul 2 2018, 11:38 AM
CommunityTechBot assigned this task to dcausse.
CommunityTechBot lowered the priority of this task from High to Medium.
CommunityTechBot updated the task description.
CommunityTechBot added subscribers: gerritbot, Aklapper.
debt closed this task as Resolved. Jul 9 2018, 11:49 PM
debt added a subscriber: debt.

Closing this, as it'll go out with this week's train.