
Store SearchContext::$syntaxUsed in the CirrusSearchRequestSet payload
Closed, Resolved · Public

Description

It could be interesting to learn more about the usage of the various search keywords.
I think this can easily be done by storing the state of SearchContext::$syntaxUsed in the CSRS payload.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Maybe you could also use the webrequest table in Hadoop, via either HiveQL or Spark, and parse the search requests' URLs for modifiers?
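
A minimal sketch of what that URL-parsing approach might look like, in Python rather than HiveQL for brevity. The keyword list, the `search` parameter name, and the function name `extract_modifiers` are assumptions made for illustration; this is not an existing UDF, and the list is deliberately incomplete.

```python
import re
from urllib.parse import urlparse, parse_qs

# Hypothetical, non-exhaustive sample of keyword prefixes (an assumption,
# not the full set of keywords the search backend understands).
KEYWORDS = ("intitle", "incategory", "insource", "hastemplate")

# Match a keyword followed by ":" when it is not preceded by a word character.
_KEYWORD_RE = re.compile(r"(?<!\w)(" + "|".join(KEYWORDS) + r"):", re.IGNORECASE)


def extract_modifiers(url):
    """Return the sorted, de-duplicated keywords found in the URL's `search` parameter."""
    params = parse_qs(urlparse(url).query)
    terms = " ".join(params.get("search", []))
    return sorted({m.group(1).lower() for m in _KEYWORD_RE.finditer(terms)})
```

Any keyword missing from the list is silently ignored, so an approach like this has to be kept in sync with the parser by hand.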

debt triaged this task as Medium priority. Sep 30 2016, 7:38 PM
debt moved this task from needs triage to This Quarter on the Discovery-Search board.
debt added a subscriber: EBernhardson.

@mforns yes, I think we could. The problem is that we'd have to re-parse the search query to extract the special keywords. I think we have a UDF that extracts some of the features, but it's not exhaustive. I think it would be easier to use the state we already have in MediaWiki and pass it to CirrusSearchRequestSet. That table has a generic payload attribute (map<string,string>) that we sometimes use; this way we won't have to re-implement part of the parsing logic in Hive.
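
To illustrate the idea, here is a rough Python sketch of flattening the recorded syntax into a string-to-string payload. The `syntax` key, the comma-separated encoding, and the function name are all assumptions for illustration; the actual patch may encode it differently.

```python
def add_syntax_to_payload(payload, syntax_used):
    """Return a copy of `payload` with the used syntax stored under a 'syntax' key.

    `payload` stands in for the map<string,string> attribute, so the collection
    of syntax names is joined into a single string. The key name and the
    comma-separated format are hypothetical.
    """
    result = dict(payload)  # copy so the caller's payload is left untouched
    if syntax_used:
        result["syntax"] = ",".join(sorted(syntax_used))
    return result
```

On the analysis side, splitting that one string on commas recovers the list of keywords without re-parsing the original query.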

Change 325821 had a related patch set uploaded (by Smalyshev):
Report all syntax in stats, also add syntax to the log

https://gerrit.wikimedia.org/r/325821

Change 325821 merged by jenkins-bot:
Report all syntax in stats, also add syntax to the log

https://gerrit.wikimedia.org/r/325821