It could be interesting to learn more about the usage of the various search keywords.
I think this can easily be done by storing the state of SearchContext::$syntaxUsed in the CirrusSearchRequestSet (CSRS) payload.
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
Report all syntax in stats, also add syntax to the log | mediawiki/extensions/CirrusSearch | master | +1 K -748
Event Timeline
Maybe you could also use the webrequest table in Hadoop, via either HiveQL or Spark, and parse the search requests' URLs for modifiers?
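For illustration only, here is a rough sketch of what parsing a search URL for modifiers could look like. It is not an existing UDF: the function name `keywords_in_search_url`, the keyword sample, and the assumption that the query lives in a `search` URL parameter are all hypothetical choices made for this example.

```python
import re
from urllib.parse import urlparse, parse_qs

# Small, deliberately incomplete sample of CirrusSearch keywords (assumption:
# a real job would need the full list and would have to keep it in sync).
SAMPLE_KEYWORDS = (
    "intitle", "incategory", "insource", "hastemplate",
    "prefix", "morelike", "linksto",
)

# Match "keyword:" at a word boundary, optionally negated with a leading "-".
KEYWORD_RE = re.compile(
    r"(?<![\w-])-?(" + "|".join(SAMPLE_KEYWORDS) + r"):",
    re.IGNORECASE,
)


def keywords_in_search_url(url):
    """Return the sorted sample keywords found in the URL's `search` parameter."""
    params = parse_qs(urlparse(url).query)
    query = " ".join(params.get("search", []))
    return sorted({m.group(1).lower() for m in KEYWORD_RE.finditer(query)})


if __name__ == "__main__":
    example = (
        "https://en.wikipedia.org/w/index.php"
        "?search=intitle%3Afoo+-incategory%3A%22Living+people%22"
    )
    print(keywords_in_search_url(example))
```

The regex knows nothing about quoting or escaping, so it can disagree with what the real query parser does.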
@mforns yes, I think we could. The problem is that we would have to "re-parse" the search query to extract the special keywords. I think we have a UDF that extracts some of the features, but it's not exhaustive. I think it'd be easier to use the state we already have in MediaWiki and pass it to CirrusSearchRequestSet. That table has a generic payload attribute (map<string,string>) that we sometimes use, so we won't have to re-implement part of the parsing logic in Hive.
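As a rough illustration of the payload idea, the sketch below flattens a set of syntax names into a single string value so it fits a map<string,string> field, and reads it back on the analysis side. It is written in Python rather than PHP purely for illustration. The payload key `syntax`, the comma separator, and both function names are assumptions made for this example, not what the patch does, and it assumes the state of `$syntaxUsed` boils down to a collection of syntax names.

```python
def syntax_to_payload(syntax_used, payload=None):
    """Return a copy of `payload` with the syntax names added as one string.

    `syntax_used` is any iterable of syntax names. The "syntax" key and the
    comma separator are hypothetical choices for this sketch.
    """
    result = dict(payload or {})
    # Sort so that the same set of names always yields the same string.
    result["syntax"] = ",".join(sorted(syntax_used))
    return result


def payload_to_syntax(payload):
    """Recover the set of syntax names from a map<string,string>-style dict."""
    raw = payload.get("syntax", "")
    return set(raw.split(",")) if raw else set()


if __name__ == "__main__":
    payload = syntax_to_payload({"regex", "intitle"}, {"other": "value"})
    print(payload)
    print(payload_to_syntax(payload))
```

Since every value in the map must be a string, a delimiter that cannot appear in a syntax name is needed; the comma is assumed to be safe here.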
Change 325821 had a related patch set uploaded (by Smalyshev):
Report all syntax in stats, also add syntax to the log
Change 325821 merged by jenkins-bot:
Report all syntax in stats, also add syntax to the log