Details
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Ottomata | T185233 Modern Event Platform
Resolved | | Ottomata | T201068 Modern Event Platform: Stream Intake Service
Resolved | | Ottomata | T206785 Modern Event Platform: Stream Intake Service (EventGate): Implementation
Resolved | | Ottomata | T214080 Rewrite Avro schemas (ApiAction, CirrusSearchRequestSet) as JSONSchema and produce to EventGate
Resolved | | EBernhardson | T222268 Port usage of mediawiki_CirrusSearchRequestSet to mediawiki_cirrussearch_request
Event Timeline
The primary user of this data is the Oozie job in the wikimedia/discovery/analytics repository, which processes these logs into the discovery.query_clicks_hourly table. This is a single HQL script, so it should hopefully be easy to port. There are also a variety of notebooks used for ad-hoc analysis that might also have to change, but I don't think any of those are under source control.
There are two scripts, one hourly and one daily:
https://github.com/wikimedia/wikimedia-discovery-analytics/tree/master/oozie/query_clicks
but they are mostly the same.
The query is a bit complicated, so it's probably safest if someone from search/discovery picks up this task. BTW, if all goes well, we should have all cirrussearch-request events in Hive starting today.
Heya Erik! I know you said you would try to get to this after the Hackathon this coming weekend. I'm assigning to you so we don't forget.
Thank youUuUUUu! :)
@EBernhardson o/ I'm giving the API folks until June 10th before I turn off the ApiAction avro stream, mainly since there is no movement from them. We can wait a little longer if you need more time here, but we really want to decommission the old Kafka cluster this quarter. How goes?
Change 513313 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[analytics/refinery/source@master] Update GetMainSearchRequest udf for schema change
Change 513313 merged by jenkins-bot:
[analytics/refinery/source@master] Update GetMainSearchRequest udf for schema change
Change 513371 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/CirrusSearch@master] Validate request logging with json schema
Change 513371 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Validate request logging with json schema
Change 514173 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[wikimedia/discovery/analytics@master] Port query_clicks_hourly to eventgate
Change 514173 merged by jenkins-bot:
[wikimedia/discovery/analytics@master] Remove cirrus request log dependency from query_clicks_daily
Hio! Today I disabled the other Avro ApiAction stream. I'd love to disable CirrusSearchRequestSet too so we can begin decommissioning the old analytics Kafka cluster. Status? :D
Change 517871 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/mediawiki-config@master] Disable CirrusSearchRequestSet avro monolog channel
Change 517874 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/mediawiki-config@master] Remove Monolog Kafka handler and configuration
The jobs are now deployed against the eventgate schema, so we should be good to turn off the avro imports.
Change 518784 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/mediawiki-config@master] Disable CirrusSearchRequestSet avro monolog channel
Change 518784 merged by Ottomata:
[operations/mediawiki-config@master] Disable CirrusSearchRequestSet avro monolog channel
Change 518786 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/mediawiki-config@master] Use empty array for default when removing all wmgMonologAvroSchemas
Change 518786 merged by Ottomata:
[operations/mediawiki-config@master] Use empty array for default when removing all wmgMonologAvroSchemas
Mentioned in SAL (#wikimedia-operations) [2019-06-24T18:35:34Z] <otto@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Disable CirrusSearchRequestSet avro monolog channel - T222268 (duration: 00m 55s)
Change 517874 abandoned by Ottomata:
Remove Monolog Kafka handler and configuration
Reason:
Easier to do in smaller patches...