Port usage of mediawiki_CirrusSearchRequestSet to mediawiki_cirrussearch_request
Closed, Resolved (Public)

Details

Subject	Repo	Branch	Lines +/-
Remove Monolog Kafka handler and configuration	operations/mediawiki-config	master	+5 -69
Use empty array for default when removing all wmgMonologAvroSchemas	operations/mediawiki-config	master	+3 -1
Disable CirrusSearchRequestSet avro monolog channel	operations/mediawiki-config	master	+2 -14
Remove cirrus request log dependency from query_clicks_daily	wikimedia/discovery/analytics	master	+14 -52
Validate request logging with json schema	mediawiki/extensions/CirrusSearch	master	+33 -8
Update GetMainSearchRequest udf for schema change	analytics/refinery/source	master	+10 -7

Related Objects
Status	Assigned	Task
Resolved	Ottomata	T185233 Modern Event Platform
Resolved	Ottomata	T201068 Modern Event Platform: Stream Intake Service
Resolved	Ottomata	T206785 Modern Event Platform: Stream Intake Service (EventGate): Implementation
Resolved	Ottomata	T214080 Rewrite Avro schemas (ApiAction, CirrusSearchRequestSet) as JSONSchema and produce to EventGate
Resolved	EBernhardson	T222268 Port usage of mediawiki_CirrusSearchRequestSet to mediawiki_cirrussearch_request

Event Timeline

Ottomata created this task. May 1 2019, 2:52 PM

Ottomata added projects: Discovery-ARCHIVED, Discovery-Analysis, Discovery-Search.

Restricted Application added a project: Product-Analytics. May 1 2019, 2:52 PM

Ottomata edited parent tasks, added: T214080: Rewrite Avro schemas (ApiAction, CirrusSearchRequestSet) as JSONSchema and produce to EventGate; removed: T222267: Port usage of mediawiki_ApiAction to mediawiki_api_request. May 1 2019, 2:53 PM

The primary user of this data is the oozie job in the wikimedia/discovery/analytics repository, which processes these logs into the discovery.query_clicks_hourly table. This is a single hql script, so it should hopefully be easy to port. There are also a variety of notebooks used for ad-hoc analysis that might have to change as well, but I don't think any of those are under source control.

There are two scripts, one hourly and one daily:
https://github.com/wikimedia/wikimedia-discovery-analytics/tree/master/oozie/query_clicks

but they are mostly the same.

The query is a bit complicated, so it's probably safest if someone from search/discovery picks up this task. BTW, if all goes well, we should have all cirrussearch-request events in Hive starting today.

Ottomata removed Ottomata as the assignee of this task. May 2 2019, 3:34 PM

Ottomata triaged this task as High priority. May 2 2019, 4:51 PM

Ottomata raised the priority of this task from High to Needs Triage.

Ottomata triaged this task as High priority.

Ottomata moved this task from Incoming to Event Platform on the Analytics board.

debt moved this task from needs triage to ML & Data Pipeline on the Discovery-Search board. May 2 2019, 5:13 PM

Heya Erik! I know you said you would try to get to this after the Hackathon this coming weekend. I'm assigning to you so we don't forget.

Thank youUuUUUu! :)

Ottomata moved this task from Backlog to In Progress Before Value Streams Kickoff (August 15th) on the Event-Platform board. May 14 2019, 1:33 PM

@EBernhardson o/ I'm giving the API folks until June 10th before I turn off the ApiAction avro stream, mainly since there is no movement from them. We can wait a little longer if you need more time here, but we really want to decommission the old Kafka cluster this quarter. How goes?

Change 513313 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[analytics/refinery/source@master] Update GetMainSearchRequest udf for schema change

https://gerrit.wikimedia.org/r/513313

gerritbot added a project: Patch-For-Review. May 30 2019, 4:33 PM

Change 513313 merged by jenkins-bot:
[analytics/refinery/source@master] Update GetMainSearchRequest udf for schema change

https://gerrit.wikimedia.org/r/513313

Change 513371 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/CirrusSearch@master] Validate request logging with json schema

https://gerrit.wikimedia.org/r/513371

Change 513371 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Validate request logging with json schema

https://gerrit.wikimedia.org/r/513371

ReleaseTaggerBot added a project: MW-1.34-notes (1.34.0-wmf.8; 2019-06-04). May 31 2019, 7:00 PM

EBernhardson moved this task from ML & Data Pipeline to Current work on the Discovery-Search board. Jun 3 2019, 5:02 PM

EBernhardson edited projects, added Discovery-Search (Current work); removed Discovery-Search.

Change 514173 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[wikimedia/discovery/analytics@master] Port query_clicks_hourly to eventgate

https://gerrit.wikimedia.org/r/514173

Ottomata moved this task from Next Up to In Progress on the Analytics-Kanban board. Jun 6 2019, 5:04 PM

Change 514173 merged by jenkins-bot:
[wikimedia/discovery/analytics@master] Remove cirrus request log dependency from query_clicks_daily

https://gerrit.wikimedia.org/r/514173

@EBernhardson how goes?

Hio! Today I disabled the other Avro ApiAction stream. I'd love to disable CirrusSearchRequestSet too so we can begin decommissioning the old analytics Kafka cluster. Status? :D

Change 517871 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/mediawiki-config@master] Disable CirrusSearchRequestSet avro monolog channel

https://gerrit.wikimedia.org/r/517871

Change 517874 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/mediawiki-config@master] Remove Monolog Kafka handler and configuration

https://gerrit.wikimedia.org/r/517874

In T222268#5268607, @Ottomata wrote:

Hio! Today I disabled the other Avro ApiAction stream. I'd love to disable CirrusSearchRequestSet too so we can begin decommissioning the old analytics Kafka cluster. Status? :D

The jobs are now deployed against the eventgate schema, so we should be good to turn off the avro imports.

EBernhardson moved this task from Incoming to Needs Reporting on the Discovery-Search (Current work) board. Jun 20 2019, 4:59 PM

debt closed this task as Resolved. Jun 21 2019, 2:16 PM

Change 518784 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/mediawiki-config@master] Disable CirrusSearchRequestSet avro monolog channel

https://gerrit.wikimedia.org/r/518784

Change 518784 merged by Ottomata:
[operations/mediawiki-config@master] Disable CirrusSearchRequestSet avro monolog channel

https://gerrit.wikimedia.org/r/518784

Change 518786 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/mediawiki-config@master] Use empty array for default when removing all wmgMonologAvroSchemas

https://gerrit.wikimedia.org/r/518786

Change 518786 merged by Ottomata:
[operations/mediawiki-config@master] Use empty array for default when removing all wmgMonologAvroSchemas

https://gerrit.wikimedia.org/r/518786

Mentioned in SAL (#wikimedia-operations) [2019-06-24T18:35:34Z] <otto@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Disable CirrusSearchRequestSet avro monolog channel - T222268 (duration: 00m 55s)