|operations/mediawiki-config : master||Remove Monolog Kafka handler and configuration|
|operations/mediawiki-config : master||Use empty array for default when removing all wmgMonologAvroSchemas|
|operations/mediawiki-config : master||Disable CirrusSearchRequestSet avro monolog channel|
|wikimedia/discovery/analytics : master||Remove cirrus request log dependency from query_clicks_daily|
|mediawiki/extensions/CirrusSearch : master||Validate request logging with json schema|
|analytics/refinery/source : master||Update GetMainSearchRequest udf for schema change|
|Open||Ottomata||T185233 Modern Event Platform (TEC2)|
|Open||Ottomata||T201068 Modern Event Platform: Stream Intake Service|
|Resolved||Ottomata||T206785 Modern Event Platform: Stream Intake Service (EventGate): Implementation|
|Resolved||Ottomata||T214080 Rewrite Avro schemas (ApiAction, CirrusSearchRequestSet) as JSONSchema and produce to EventGate|
|Resolved||EBernhardson||T222268 Port usage of mediawiki_CirrusSearchRequestSet to mediawiki_cirrussearch_request|
The primary user of this data is the oozie job in the wikimedia/discovery/analytics repository, which processes these logs into the discovery.query_clicks_hourly table. This is a single hql script, so hopefully it should be easy to port. There are also a variety of notebooks used for ad-hoc analysis that might have to change as well, but I don't think any of those are under source control.
There are two scripts, one hourly and one daily, but they are mostly the same.
The query is a bit complicated, so it's probably safest if someone from search/discovery picks up this task. BTW, if all goes well we should have all cirrussearch-request events in Hive starting today.