
Kafka consumer that takes learning-to-rank queries from a queue and runs them against Elasticsearch to generate relevance labels.
Closed, Resolved (Public)

Description

Open questions:

  • Do we really want to use the relforge servers for this? We could instead point the script at the hot spare Elasticsearch cluster. Initially we should probably use relforge, but long term we may want to consider the hot spare cluster so that queries run against the most up-to-date information.
  • How does the data get back into Kafka? Through a log4j handler, or should the consumer parse the Elasticsearch response and produce to Kafka directly?
    • Using log4j makes running against a prod server a little more difficult, as changes to the plugin or to the log4j settings require a full cluster restart.

Deliverable:

  • The consumer reads Elasticsearch queries from Kafka (analytics cluster) and sends them to Elasticsearch.
  • Results of the queries are produced back into a different Kafka log. Ideally these should be parsed down to a minimal representation of query + result page + detected feature values.
  • This should be generic enough that, when we test changes to the LTR pipeline, the changes are applied in the analytics cluster and the code running in production stays the same.
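
A minimal sketch of the consume / query / produce loop described in the deliverable, assuming the second option from the open questions (the consumer parses the response and produces to Kafka itself). Everything named here is hypothetical: the message fields `query_id`, `index` and `body`, the function names, and the assumption that logged feature values arrive per hit under `fields._ltrlog` in a named log entry (`log_entry1`). The Kafka and Elasticsearch clients are passed in as plain callables, so no particular client library is implied.

```python
import json
from typing import Any, Callable, Dict, Iterable


def minimize_response(query_id: str, response: Dict[str, Any],
                      log_name: str = "log_entry1") -> Dict[str, Any]:
    """Reduce a full search response to result pages plus feature values.

    Assumption: each hit carries its logged features under
    fields._ltrlog as a list of {log_name: [{"name": ..., "value": ...}]}.
    """
    hits = []
    for hit in response.get("hits", {}).get("hits", []):
        features = {}
        for entry in hit.get("fields", {}).get("_ltrlog", []):
            for feat in entry.get(log_name, []):
                # Skip features that were logged without a value.
                if "value" in feat:
                    features[feat["name"]] = feat["value"]
        hits.append({"page_id": hit.get("_id"), "features": features})
    return {"query_id": query_id, "hits": hits}


def run_consumer(messages: Iterable[bytes],
                 search: Callable[[str, Dict[str, Any]], Dict[str, Any]],
                 produce: Callable[[bytes], None]) -> int:
    """Read query messages, run each one, and emit the minimized result.

    `messages` stands in for a Kafka consumer iterator, `search` for an
    Elasticsearch client call, and `produce` for a Kafka producer send.
    The query body is forwarded untouched, so pipeline changes made on
    the producing side need no change here.
    """
    processed = 0
    for raw in messages:
        request = json.loads(raw)
        response = search(request["index"], request["body"])
        result = minimize_response(request["query_id"], response, request.get("log_name", "log_entry1"))
        produce(json.dumps(result).encode("utf-8"))
        processed += 1
    return processed
```

Treating the query body as opaque is what keeps the production side generic: the analytics cluster decides what to ask for, and the consumer only forwards the request and trims the response.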