Port usage of mediawiki_CirrusSearchRequestSet to mediawiki_cirrussearch_request
Closed, Resolved (Public)

Event Timeline

The primary user of this data is the oozie job in the wikimedia/discovery/analytics repository, which processes these logs into the discovery.query_clicks_hourly table. This is a single HQL script, so it should hopefully be easy to port. There are also a variety of notebooks used for ad-hoc analysis that might have to change as well, but I don't think any of those are under source control.

There are two scripts, one hourly and one daily:
https://github.com/wikimedia/wikimedia-discovery-analytics/tree/master/oozie/query_clicks

but they are mostly the same.

The query is a bit complicated, so it's probably safest if someone from search/discovery picks up this task. BTW, if all goes well, we should have all cirrussearch-request events in Hive starting today.

Ottomata triaged this task as High priority. May 2 2019, 4:51 PM
Ottomata raised the priority of this task from High to Needs Triage.
Ottomata triaged this task as High priority.
Ottomata moved this task from Incoming to Event Platform on the Analytics board.

Heya Erik! I know you said you would try to get to this after the Hackathon this coming weekend. I'm assigning this to you so we don't forget.

Thank youUuUUUu! :)

@EBernhardson o/ I'm giving the API folks until June 10th before I turn off the ApiAction avro stream, mainly since there is no movement from them. We can wait a little longer if you need more time here, but we really want to decommission the old Kafka cluster this quarter. How goes?

Change 513313 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[analytics/refinery/source@master] Update GetMainSearchRequest udf for schema change

https://gerrit.wikimedia.org/r/513313

Change 513313 merged by jenkins-bot:
[analytics/refinery/source@master] Update GetMainSearchRequest udf for schema change

https://gerrit.wikimedia.org/r/513313

Change 513371 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/CirrusSearch@master] Validate request logging with json schema

https://gerrit.wikimedia.org/r/513371

Change 513371 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Validate request logging with json schema

https://gerrit.wikimedia.org/r/513371

Change 514173 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[wikimedia/discovery/analytics@master] Port query_clicks_hourly to eventgate

https://gerrit.wikimedia.org/r/514173

Change 514173 merged by jenkins-bot:
[wikimedia/discovery/analytics@master] Remove cirrus request log dependency from query_clicks_daily

https://gerrit.wikimedia.org/r/514173

Hio! Today I disabled the other Avro ApiAction stream. I'd love to disable CirrusSearchRequestSet too so we can begin decommissioning the old analytics Kafka cluster. Status? :D

Change 517871 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/mediawiki-config@master] Disable CirrusSearchRequestSet avro monolog channel

https://gerrit.wikimedia.org/r/517871

Change 517874 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/mediawiki-config@master] Remove Monolog Kafka handler and configuration

https://gerrit.wikimedia.org/r/517874

The jobs are now deployed against the eventgate schema, so we should be good to turn off the avro imports.

Change 518784 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/mediawiki-config@master] Disable CirrusSearchRequestSet avro monolog channel

https://gerrit.wikimedia.org/r/518784

Change 518784 merged by Ottomata:
[operations/mediawiki-config@master] Disable CirrusSearchRequestSet avro monolog channel

https://gerrit.wikimedia.org/r/518784

Change 518786 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/mediawiki-config@master] Use empty array for default when removing all wmgMonologAvroSchemas

https://gerrit.wikimedia.org/r/518786

Change 518786 merged by Ottomata:
[operations/mediawiki-config@master] Use empty array for default when removing all wmgMonologAvroSchemas

https://gerrit.wikimedia.org/r/518786

Mentioned in SAL (#wikimedia-operations) [2019-06-24T18:35:34Z] <otto@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Disable CirrusSearchRequestSet avro monolog channel - T222268 (duration: 00m 55s)

Change 517874 abandoned by Ottomata:
Remove Monolog Kafka handler and configuration

Reason:
Easier to do in smaller patches...

https://gerrit.wikimedia.org/r/517874