Build oozie job to collect click data for learn to rank
Closed, Resolved (Public)

Description

Build out an Oozie job to join the searches in the CirrusSearchRequestSet table with the click-through data in webrequest.

Table should have one row per search which includes:

  • Original search term
  • Wiki where search occurred
  • List of all pages seen by user with original position order
  • List of all pages clicked by user, preferably with timestamps or ordering of clicks
  • Number of searches performed by the IP per day, for later filtering
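
As a rough illustration of the row shape listed above, here is a minimal Python sketch of the join. It is not the actual Oozie/Hive job. The input field names (`search_id`, `ip`, `day`, `wiki`, `query`, `hits`, `ts`, `page_id`) are hypothetical, and it assumes each click can already be attributed to a search through a shared `search_id`, which the real data may not provide directly.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SearchRow:
    query: str                   # original search term
    wiki: str                    # wiki where the search occurred
    hits: List[int]              # pages shown, in original position order
    clicks: List[Tuple[int, int]] = field(default_factory=list)  # (timestamp, page_id)
    searches_by_ip_per_day: int = 0  # kept for later filtering


def build_rows(searches: List[Dict], clicks: List[Dict]) -> List[SearchRow]:
    """Produce one row per search.

    All dictionary keys used here are hypothetical, chosen for illustration.
    """
    # Count searches per (ip, day) so heavy users can be filtered out later.
    per_ip_day = Counter((s["ip"], s["day"]) for s in searches)

    # Group clicks by the search they are assumed to belong to.
    clicks_by_search = defaultdict(list)
    for c in clicks:
        clicks_by_search[c["search_id"]].append((c["ts"], c["page_id"]))

    rows = []
    for s in searches:
        rows.append(SearchRow(
            query=s["query"],
            wiki=s["wiki"],
            hits=list(s["hits"]),
            # Sorting the (timestamp, page_id) pairs gives the click order.
            clicks=sorted(clicks_by_search.get(s["search_id"], [])),
            searches_by_ip_per_day=per_ip_day[(s["ip"], s["day"])],
        ))
    return rows
```

Keeping the timestamp next to each clicked page preserves both the ordering and the time between clicks, so either can be used downstream.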

Data must be purged after 90 days to comply with privacy policy requirements.
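
One way to think about the retention rule is as a selection over daily partitions. The sketch below assumes the output is partitioned by day, which this task does not state; the function name and constant are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, List

RETENTION_DAYS = 90  # from the privacy policy requirement above


def partitions_to_purge(partition_days: Iterable[date], today: date) -> List[date]:
    """Return the daily partitions that are more than 90 days old."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    # A partition exactly 90 days old is kept; anything older is purged.
    return sorted(d for d in partition_days if d < cutoff)
```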

Event Timeline

Change 327855 had a related patch set uploaded (by EBernhardson):
[analytics/refinery/source@master] UDF for extracting primary full text search request

https://gerrit.wikimedia.org/r/327855

Change 317019 had a related patch set uploaded (by EBernhardson):
[wikimedia/discovery/analytics@master] Calculate click data for top queries

https://gerrit.wikimedia.org/r/317019

This has already been mostly worked out as part of my initial evaluation; it still needs some review before it can be merged and deployed.

debt subscribed.

This has been running in production for a while and looks good. More work is to be completed in T162053.

Change 327855 merged by Nuria:
[analytics/refinery/source@master] UDF for extracting primary full text search request

https://gerrit.wikimedia.org/r/327855

Change 317019 merged by EBernhardson:
[wikimedia/discovery/analytics@master] Calculate click data for top queries

https://gerrit.wikimedia.org/r/317019