Kafka consumer to take learn to rank queries from a queue and run them against elasticsearch to generate relevance labels.
Closed, Resolved (Public)

Authored By: EBernhardson, Apr 3 2017, 3:55 PM

Description

Open questions:

- Do we really want to use the relforge servers for this? We could instead point the script at the hot-spare Elasticsearch cluster. Initially we should probably use relforge, but in the long term we may want to use the hot-spare cluster so that we have the most up-to-date information.
- How does the data get back into Kafka? Through a log4j handler, or should the consumer parse the Elasticsearch response and produce to Kafka directly?
  - Using log4j makes running against a production server more difficult, because changes to the plugin or to the log4j settings require a full cluster restart.
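If the consumer parses the response itself (the second option above), the reduction to query + result page + feature values could look like the minimal sketch below. It assumes the response layout of the Elasticsearch LTR plugin's feature-logging extension (per-hit `fields._ltrlog`, holding one list of `{name, value}` entries per log spec). That layout, the log name `log_entry1` and the output field names are assumptions for illustration, not what the MjoLniR patch necessarily parses.

```python
import json
from typing import Any, Dict, List


def extract_feature_records(query_id: str, response: Dict[str, Any],
                            log_name: str = "log_entry1") -> List[Dict[str, Any]]:
    """Reduce a search response to one record per hit: query + page + features.

    Assumes (hypothetically) that feature values are logged per hit under
    fields._ltrlog, keyed by the log spec name `log_name`.
    """
    records = []
    for hit in response.get("hits", {}).get("hits", []):
        features = {}
        # _ltrlog is a list of dicts, one per log spec; use the one we requested.
        for entry in hit.get("fields", {}).get("_ltrlog", []):
            for feature in entry.get(log_name, []):
                # A feature that did not match may omit "value"; record it as 0.0.
                features[feature["name"]] = feature.get("value", 0.0)
        records.append({"query_id": query_id,
                        "page_id": hit["_id"],
                        "features": features})
    return records


def to_kafka_payloads(records: List[Dict[str, Any]]) -> List[bytes]:
    """Serialize each record as a compact UTF-8 JSON message body."""
    return [json.dumps(r, separators=(",", ":")).encode("utf-8") for r in records]
```

Because the parsing happens in the consumer, changing what is extracted only means redeploying the consumer; the Elasticsearch plugin and log4j configuration stay untouched, so no full cluster restart is needed.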

Deliverable:

- The consumer reads Elasticsearch queries from Kafka (analytics cluster) and sends them to Elasticsearch.
- Query results are produced back into a different Kafka topic. Ideally they should be reduced to a minimal representation: query + result page + detected feature values.
- This should be generic enough that, when we test changes to the LTR pipeline, the changes are applied in the analytics cluster and the code running in production stays the same.
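A minimal sketch of the consumer loop described in the deliverable, assuming the kafka-python and requests libraries and JSON messages of the hypothetical form `{"query_id": ..., "index": ..., "body": {...}}`. Topic names, the consumer group name and the Elasticsearch URL are placeholders. The query body is forwarded verbatim and the response reduction is passed in as a function, which is one way to keep the production code unchanged while the LTR pipeline evolves in the analytics cluster.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical callables: search(index, body) -> response, reducer(query_id, response) -> records.
SearchFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]
ReduceFn = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]


def process_message(raw: bytes, search: SearchFn, reducer: ReduceFn) -> List[bytes]:
    """Run one queued query and return the payloads to produce to Kafka."""
    request = json.loads(raw)
    # The query body comes straight from the message: the consumer never builds
    # queries itself, so pipeline changes only affect what gets enqueued.
    response = search(request["index"], request["body"])
    return [json.dumps(r).encode("utf-8")
            for r in reducer(request["query_id"], response)]


def run(bootstrap_servers: List[str], in_topic: str, out_topic: str,
        es_url: str, reducer: ReduceFn) -> None:
    """Consume queries, send them to Elasticsearch, produce reduced results."""
    # Imported here so process_message can be used without these packages.
    import requests
    from kafka import KafkaConsumer, KafkaProducer

    def search(index: str, body: Dict[str, Any]) -> Dict[str, Any]:
        resp = requests.post(f"{es_url}/{index}/_search", json=body, timeout=60)
        resp.raise_for_status()
        return resp.json()

    consumer = KafkaConsumer(in_topic, bootstrap_servers=bootstrap_servers,
                             group_id="ltr-feature-collection",  # hypothetical name
                             enable_auto_commit=False)
    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)
    for message in consumer:
        for payload in process_message(message.value, search, reducer):
            producer.send(out_topic, payload)
        # Flush results before committing the input offset, so a crash re-runs
        # the query instead of silently dropping its results.
        producer.flush()
        consumer.commit()
```

Committing only after the flush gives at-least-once delivery: a restart can produce duplicate results for a query, so downstream label generation would need to tolerate or deduplicate them.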

	Subject	Repo	Branch	Lines +/-
	Add feature collection over kafka	search/MjoLniR	master	+550 -25

Status	Assigned	Task
Invalid	None	T174064 [FY 2017-18 Objective] Implement advanced search methodologies
Resolved	EBernhardson	T161632 [Epic] Improve search by researching and deploying machine learning to re-rank search results
Resolved	EBernhardson	T162053 backend data engineering and plumbing for LTRank
Resolved	EBernhardson	T162059 Kafka consumer to take learn to rank queries from a queue and run them against elasticsearch to generate relevance labels.

Change 361010 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[search/MjoLniR@master] Add feature collection over kafka

Change 361010 merged by DCausse:
[search/MjoLniR@master] Add feature collection over kafka

debt closed this task as Resolved. Jul 7 2017, 9:05 PM