
Validate click events in TestSearchSatisfaction2
Closed, Resolved · Public · 2 Estimated Story Points

Authored By: mpopov, Apr 14 2016, 5:30 PM
Referenced Files
  • F3943115: daily_ctr_2_mini.png (Apr 28 2016, 6:14 PM)
  • F3942989: daily_ctr_mini.png (Apr 28 2016, 5:42 PM)
  • Unknown Object (File) (Apr 28 2016, 12:02 AM)

Description

Erik managed to make JS send click events that fire between the moment the user clicks a search result and the moment the browser takes them to the page they clicked on. We need to check whether these click events yield the same clickthrough rate as visitPage events do, and how much browser bias (if any) there is.
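One way to frame the check is to compute a session-level clickthrough rate twice, once from click events and once from visitPage events, and to break both down by browser; a gap that shows up for only some browsers would hint at browser bias. This is a minimal sketch, assuming each session has already been reduced to a hypothetical `(browser, has_click, has_visit)` tuple; the tuple shape, the function name `ctr_by_browser`, and the browser labels are illustrative, not taken from the actual analysis.

```python
from collections import defaultdict
from collections.abc import Iterable


def ctr_by_browser(
    sessions: Iterable[tuple[str, bool, bool]],
) -> dict[str, tuple[float, float, float]]:
    """Return {browser: (click-based CTR, visit-based CTR, click minus visit)}.

    Hypothetical input: one (browser, has_click, has_visit) tuple per session.
    """
    # Per browser: [number of sessions, sessions with a click, sessions with a visit]
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0, 0])
    for browser, has_click, has_visit in sessions:
        c = counts[browser]
        c[0] += 1
        c[1] += has_click  # bools add as 0/1
        c[2] += has_visit
    return {
        b: (n_c / n, n_v / n, n_c / n - n_v / n)
        for b, (n, n_c, n_v) in counts.items()
    }


# Made-up example sessions, for illustration only
result = ctr_by_browser([
    ("Firefox", True, True),
    ("Firefox", True, False),
    ("Safari", False, False),
    ("Safari", True, True),
])
print(result)
```

In this toy input, the "Firefox" sessions show a click without a matching visit, so the click-based rate runs ahead of the visit-based one for that browser only.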

This validation is required before we try hooking up the TextCat A/B/C test (T121542) to the front-end.

Event Timeline

mpopov set the point value for this task to 2. (Apr 14 2016, 5:31 PM)
mpopov triaged this task as High priority. (Apr 14 2016, 5:45 PM)

Started putting together a detailed report at https://github.com/wikimedia-research/Discovery-Search-Adhoc-ClickEventValidation and going to include some of the results here as a comment.

Terminology:

  • "session-wise" looks at the whole session and checks whether it contains ANY valid clicks / valid page visits
  • "SERP-wise" (SERP = search engine results page) uses the page view ID shared by click events and searchResultPage events to link them, then calculates how many of the SERPs had a click associated with them
  • "valid click" refers to a click event that has a recorded result position, which lets us focus on clicks on SERP results
  • "valid visit" refers to a visitPage event that has a recorded result position, which lets us filter out erroneous page visit events
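The two ways of counting can be sketched as follows. This is a minimal illustration, assuming events have been flattened into hypothetical `Event` records with `session_id`, `page_id` (the page view ID shared by a SERP and its click events), `action`, and an optional result `position`; these field names are assumptions, not the actual TestSearchSatisfaction2 schema. Here the session-wise denominator is taken to be sessions with at least one searchResultPage event, which is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    # Hypothetical flattened event; field names are illustrative assumptions.
    session_id: str
    page_id: str                    # page view ID shared by a SERP and its click events
    action: str                     # "searchResultPage", "click", or "visitPage"
    position: Optional[int] = None  # result position; None when not recorded


def is_valid(e: Event, action: str) -> bool:
    """A 'valid' click/visit is an event of that action with a recorded result position."""
    return e.action == action and e.position is not None


def session_wise_ctr(events: list[Event], action: str = "click") -> float:
    """Share of SERP-bearing sessions containing ANY valid event of `action`."""
    serp_sessions = {e.session_id for e in events if e.action == "searchResultPage"}
    hit = {e.session_id for e in events if is_valid(e, action)} & serp_sessions
    return len(hit) / len(serp_sessions)


def serp_wise_ctr(events: list[Event]) -> float:
    """Share of SERPs (by page view ID) with at least one valid click linked to them."""
    serps = {e.page_id for e in events if e.action == "searchResultPage"}
    clicked = {e.page_id for e in events if is_valid(e, "click")}
    return len(serps & clicked) / len(serps)


# Made-up example: session s1 has two SERPs, one clicked; s2's click lacks a position.
events = [
    Event("s1", "p1", "searchResultPage"),
    Event("s1", "p1", "click", 1),
    Event("s1", "v1", "visitPage", 1),
    Event("s1", "p2", "searchResultPage"),
    Event("s2", "p3", "searchResultPage"),
    Event("s2", "p3", "click"),  # no recorded position, so not a valid click
]
print(session_wise_ctr(events), session_wise_ctr(events, "visitPage"), serp_wise_ctr(events))
```

The same data gives 1/2 session-wise but 1/3 SERP-wise, because session s1 counts once even though only one of its two SERPs was clicked.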

[Image: daily_ctr_mini.png]

| Sessions from TestSearchSatisfaction2 | Proportion of sessions |
| --- | --- |
| sessions with valid clicks only | 89.235% |
| sessions with valid visits only | 16.849% |
| sessions with valid clicks AND visits | 14.327% |
| sessions with more valid clicks than valid visits | 78.398% |
| sessions with more valid visits than valid clicks | 3.371% |
| sessions with valid clicks AND visits, AND clicks match visits 100% | 8.971% |
| sessions with valid clicks AND visits, AND clicks don't match visits at all | 0.756% |
| sessions with clicks but not valid clicks | 8.883% |
| sessions with visits but not valid visits | 0.000% |
| sessions with valid clicks that couldn't be matched with valid visits | 4.668% |
| sessions with valid visits that couldn't be matched with valid clicks | 2.752% |
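A rough sketch of how sessions could be bucketed by comparing their valid clicks with their valid visits. It assumes, hypothetically, that a click "matches" a visit when both record the same result position, and that each session has been reduced to a pair of position sets; the bucket names loosely echo the rows in the table, but the exact rules behind the reported percentages are not spelled out here, so treat this as an illustration only.

```python
from collections import Counter


def classify_session(clicks: set[int], visits: set[int]) -> str:
    """Put one session into a single bucket by comparing the result positions
    of its valid clicks with those of its valid visits.

    Assumption: a click and a visit match when they record the same position.
    """
    if not clicks and not visits:
        return "no valid clicks or visits"
    if not visits:
        return "valid clicks, no valid visits"
    if not clicks:
        return "valid visits, no valid clicks"
    if clicks == visits:
        return "clicks match visits 100%"
    if clicks.isdisjoint(visits):
        return "clicks don't match visits at all"
    return "partial match"


# Hypothetical sessions: session ID -> (valid click positions, valid visit positions)
sessions = {
    "a": ({1, 3}, {1, 3}),
    "b": ({2}, set()),
    "c": ({1}, {4}),
    "d": ({1, 2}, {1}),
    "e": (set(), set()),
}

tally = Counter(classify_session(c, v) for c, v in sessions.values())
for bucket, count in tally.items():
    print(f"{bucket}: {100 * count / len(sessions):.3f}%")

# Sessions with valid clicks that have no visit at the same position
unmatched = {sid: c - v for sid, (c, v) in sessions.items() if c - v}
```

Using sets ignores repeated clicks on the same position; comparing `Counter`s of positions instead would treat two clicks on result 1 followed by one visit as a mismatch.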

After filtering down to sessions where the clicks matched the visits 100%, plus sessions that had 0 clicks/visits, it looks like 52.12% might be our most reliable overall clickthrough rate, with the following daily breakdown:

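| Date | Total valid sessions | Abandoned sessions | Clickthrough'd sessions | Clickthrough rate (%) |
| --- | --- | --- | --- | --- |
| 2016-04-02 | 1631 | 752 | 879 | 53.893 |
| 2016-04-03 | 1946 | 909 | 1037 | 53.289 |
| 2016-04-04 | 2575 | 1255 | 1320 | 51.262 |
| 2016-04-05 | 2586 | 1297 | 1289 | 49.845 |
| 2016-04-06 | 2588 | 1276 | 1312 | 50.696 |
| 2016-04-07 | 2455 | 1187 | 1268 | 51.650 |
| 2016-04-08 | 2322 | 1118 | 1204 | 51.852 |
| 2016-04-09 | 1740 | 841 | 899 | 51.667 |
| 2016-04-10 | 2047 | 976 | 1071 | 52.320 |
| 2016-04-11 | 2798 | 1350 | 1448 | 51.751 |
| 2016-04-12 | 2662 | 1267 | 1395 | 52.404 |
| 2016-04-13 | 2670 | 1293 | 1377 | 51.573 |
| 2016-04-14 | 2555 | 1222 | 1333 | 52.172 |
| 2016-04-15 | 2189 | 1026 | 1163 | 53.129 |
| 2016-04-16 | 1660 | 774 | 886 | 53.373 |
| 2016-04-17 | 1926 | 950 | 976 | 50.675 |
| 2016-04-18 | 2570 | 1201 | 1369 | 53.268 |
| 2016-04-19 | 2544 | 1236 | 1308 | 51.415 |
| 2016-04-20 | 2533 | 1200 | 1333 | 52.625 |
| 2016-04-21 | 2412 | 1158 | 1254 | 51.990 |
| 2016-04-22 | 2184 | 1014 | 1170 | 53.571 |
| 2016-04-23 | 1692 | 765 | 927 | 54.787 |
| 2016-04-24 | 1982 | 960 | 1022 | 51.564 |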
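As a sanity check on the daily breakdown, each day's rate is clickthrough'd sessions divided by total valid sessions, and the overall figure can be recovered by pooling the counts across days (total clickthrough'd sessions over total valid sessions) rather than averaging the daily percentages. The sketch below simply re-derives those numbers from the table; the list `DAILY` and the helper `ctr` are illustrative names.

```python
# (date, total valid sessions, abandoned, clickthrough'd, reported CTR %) from the daily table
DAILY = [
    ("2016-04-02", 1631, 752, 879, 53.893),
    ("2016-04-03", 1946, 909, 1037, 53.289),
    ("2016-04-04", 2575, 1255, 1320, 51.262),
    ("2016-04-05", 2586, 1297, 1289, 49.845),
    ("2016-04-06", 2588, 1276, 1312, 50.696),
    ("2016-04-07", 2455, 1187, 1268, 51.650),
    ("2016-04-08", 2322, 1118, 1204, 51.852),
    ("2016-04-09", 1740, 841, 899, 51.667),
    ("2016-04-10", 2047, 976, 1071, 52.320),
    ("2016-04-11", 2798, 1350, 1448, 51.751),
    ("2016-04-12", 2662, 1267, 1395, 52.404),
    ("2016-04-13", 2670, 1293, 1377, 51.573),
    ("2016-04-14", 2555, 1222, 1333, 52.172),
    ("2016-04-15", 2189, 1026, 1163, 53.129),
    ("2016-04-16", 1660, 774, 886, 53.373),
    ("2016-04-17", 1926, 950, 976, 50.675),
    ("2016-04-18", 2570, 1201, 1369, 53.268),
    ("2016-04-19", 2544, 1236, 1308, 51.415),
    ("2016-04-20", 2533, 1200, 1333, 52.625),
    ("2016-04-21", 2412, 1158, 1254, 51.990),
    ("2016-04-22", 2184, 1014, 1170, 53.571),
    ("2016-04-23", 1692, 765, 927, 54.787),
    ("2016-04-24", 1982, 960, 1022, 51.564),
]


def ctr(clicked: int, total: int) -> float:
    """Clickthrough rate as a percentage of valid sessions."""
    return 100 * clicked / total


for date, total, abandoned, clicked, reported in DAILY:
    # Every valid session is either abandoned or clickthrough'd
    assert abandoned + clicked == total, date
    # Reported rates are rounded to 3 decimals
    assert abs(ctr(clicked, total) - reported) < 1e-3, date

total_sessions = sum(row[1] for row in DAILY)
total_clicked = sum(row[3] for row in DAILY)
pooled = ctr(total_clicked, total_sessions)
print(f"pooled CTR: {pooled:.2f}%")  # prints "pooled CTR: 52.12%"
```

Pooling weights each day by its number of sessions, so busy weekdays count for more than quieter weekends; an unweighted mean of the daily percentages would generally give a slightly different number.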

How does this method compare with the other estimation methods above? It comes out remarkably close to simply looking at "CTR as % of sessions that had at least one click event":

[Image: daily_ctr_2_mini.png]

Done with 1st draft. Sent to Trey for review.

Sent 2nd draft to Trey for review.

debt subscribed.

Looks like this is resolved - closing.