
Portal Event logging: Figure out why Schema:WikipediaPortal has duplicate events & fix it
Closed, Invalid (Public)

Description

@JGirault noticed a discrepancy between the clickthrough rate on the Portal dashboard and the rate reported in the most recent Portal A/B test analysis.

The only difference between the two is that the data used in the A/B test report went through an additional cleaning step that removed duplicate events. The clickthrough rate shown on the Portal dashboard is therefore calculated from faulty data that contains many duplicate events (any session should have at most 1 landing event and 1 clickthrough event).
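To illustrate how much the cleaning step can move the number, here is a minimal sketch that computes a clickthrough rate with and without deduplication. It assumes the rate is defined as clickthrough events divided by landing events, and that deduplication means keeping at most one event of each type per session (the invariant stated above). The function name `clickthrough_rate` and the toy sessions "a" and "b" are hypothetical; neither the dashboard's nor the A/B report's actual code is shown in this task.

```python
from collections import Counter


def clickthrough_rate(events, dedupe=True):
    """Clickthrough rate = clickthrough events / landing events (assumed definition).

    events: list of (session_id, event_type) tuples.
    dedupe: if True, keep at most one landing and one clickthrough per session.
    """
    if dedupe:
        # A set keeps one copy of each (session, type) pair, which enforces
        # "at most 1 landing and 1 clickthrough per session".
        events = set(events)
    counts = Counter(event_type for _, event_type in events)
    landings = counts["landing"]
    return counts["clickthrough"] / landings if landings else 0.0


# Hypothetical toy data: session "a" lands and leaves; session "b" logs
# repeated landing/clickthrough pairs, like the faulty example below.
events = [
    ("a", "landing"),
    ("b", "landing"), ("b", "clickthrough"),
    ("b", "landing"), ("b", "clickthrough"),
    ("b", "landing"), ("b", "clickthrough"),
]

print(clickthrough_rate(events, dedupe=False))  # 0.75
print(clickthrough_rate(events, dedupe=True))   # 0.5
```

With duplicates, session "b" alone contributes 3 clickthroughs and 3 landings, inflating the rate from 0.5 to 0.75.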

An example of faulty data:

| Session Hash | Type of Event | Section Used (if any) | Timestamp | User Agent |
| --- | --- | --- | --- | --- |
| 003e072635bb8367 | landing | no action | 2016-04-18 18:03:18 | "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0" |
| 003e072635bb8367 | clickthrough | search | 2016-04-18 18:03:24 | "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0" |
| 003e072635bb8367 | landing | no action | 2016-04-18 18:10:10 | "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0" |
| 003e072635bb8367 | landing | no action | 2016-04-18 18:14:29 | "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0" |
| 003e072635bb8367 | clickthrough | search | 2016-04-18 18:14:32 | "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0" |
| 003e072635bb8367 | landing | no action | 2016-04-18 18:22:50 | "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0" |
| 003e072635bb8367 | clickthrough | search | 2016-04-18 18:22:56 | "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0" |
| 003e072635bb8367 | landing | no action | 2016-04-18 18:29:24 | "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0" |
| 003e072635bb8367 | landing | no action | 2016-04-18 18:41:27 | "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0" |
| 003e072635bb8367 | landing | no action | 2016-04-18 18:42:48 | "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0" |
| 003e072635bb8367 | clickthrough | search | 2016-04-18 18:42:51 | "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0" |
| 003e072635bb8367 | landing | no action | 2016-04-18 18:53:12 | "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0" |
| 003e072635bb8367 | clickthrough | search | 2016-04-18 18:53:15 | "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0" |
| 003e072635bb8367 | landing | no action | 2016-04-18 18:57:49 | "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0" |
| 003e072635bb8367 | clickthrough | search | 2016-04-18 18:57:56 | "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0" |

Here are the top 10 (session, event type) pairs from 2016-04-25, ranked by number of events:

```sql
USE log;
-- Count events per (session, event type) on 2016-04-25, restricted to
-- events with no cohort or the baseline cohort.
SELECT session, type, COUNT(1) AS n_events FROM (
  SELECT event_session_id AS session, event_event_type AS type
  FROM WikipediaPortal_14377354
  WHERE LEFT(timestamp, 8) = '20160425' AND ((event_cohort IS NULL) OR (event_cohort IN ('null','baseline')))
) AS events
GROUP BY session, type
ORDER BY n_events DESC
LIMIT 10;
```
| session | type | n_events |
| --- | --- | --- |
| bffdb37f71390448 | landing | 54 |
| bffdb37f71390448 | clickthrough | 45 |
| af4f8d39d0938791 | clickthrough | 29 |
| 44cec5cdd7158153 | landing | 28 |
| 8b94560060a388ef | clickthrough | 28 |
| 02d93779e11e09b7 | landing | 21 |
| 22d37669555e82be | landing | 19 |
| 8b94560060a388ef | landing | 17 |
| 48f3fc8f0f5c8bb0 | landing | 17 |
| 4558e753c79b0ce4 | clickthrough | 17 |
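To list only the (session, type) pairs that break the "at most one landing and one clickthrough per session" rule, the query above can be given a `HAVING COUNT(*) > 1` clause. The sketch below tries that offline against the 15 sample rows from the faulty-data table (session 003e072635bb8367, 2016-04-18). It is a minimal sketch under several assumptions: SQLite stands in for the MySQL EventLogging table, only the columns the query touches are recreated, `substr` replaces MySQL's `LEFT`, the YYYYMMDDhhmmss timestamp format is inferred from the `LEFT(timestamp, 8) = '20160425'` filter, and `duplicated_groups` is a hypothetical helper.

```python
import sqlite3

SESSION = "003e072635bb8367"

# Sample rows from the faulty-data table (2016-04-18), with timestamps
# rewritten in the assumed YYYYMMDDhhmmss format.
SAMPLE = [
    ("20160418180318", "landing"), ("20160418180324", "clickthrough"),
    ("20160418181010", "landing"), ("20160418181429", "landing"),
    ("20160418181432", "clickthrough"), ("20160418182250", "landing"),
    ("20160418182256", "clickthrough"), ("20160418182924", "landing"),
    ("20160418184127", "landing"), ("20160418184248", "landing"),
    ("20160418184251", "clickthrough"), ("20160418185312", "landing"),
    ("20160418185315", "clickthrough"), ("20160418185749", "landing"),
    ("20160418185756", "clickthrough"),
]
ROWS = [(SESSION, event_type, ts) for ts, event_type in SAMPLE]

# Same grouping and cohort filter as the query above, plus HAVING to keep
# only (session, type) pairs that occur more than once.
QUERY = """
SELECT event_session_id AS session, event_event_type AS type, COUNT(*) AS n_events
FROM WikipediaPortal_14377354
WHERE substr(timestamp, 1, 8) = ?
  AND (event_cohort IS NULL OR event_cohort IN ('null', 'baseline'))
GROUP BY session, type
HAVING COUNT(*) > 1
ORDER BY n_events DESC
"""


def duplicated_groups(rows, day):
    """Return (session, type, n_events) for pairs logged more than once on `day`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE WikipediaPortal_14377354 ("
            "event_session_id TEXT, event_event_type TEXT, "
            "timestamp TEXT, event_cohort TEXT)"
        )
        # Sample rows carry no cohort, so event_cohort is NULL.
        conn.executemany(
            "INSERT INTO WikipediaPortal_14377354 VALUES (?, ?, ?, NULL)", rows
        )
        return conn.execute(QUERY, (day,)).fetchall()
    finally:
        conn.close()


for row in duplicated_groups(ROWS, "20160418"):
    print(row)
# ('003e072635bb8367', 'landing', 9)
# ('003e072635bb8367', 'clickthrough', 6)
```

On clean data this query would return no rows, so its row count is a cheap check that could be rerun once the logging is fixed.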

This is a serious data-quality problem and should be fixed soon.

Event Timeline

debt renamed this task from Figure out why Schema:WikipediaPortal has duplicate events & fix the EL to Portal Dashboard: Figure out why Schema:WikipediaPortal has duplicate events & fix the EL. (Apr 26 2016, 7:40 PM)
debt triaged this task as High priority.
debt added a project: Discovery-ARCHIVED.
debt updated the task description.
debt added a subscriber: Jdrewniak.
debt renamed this task from Portal Dashboard: Figure out why Schema:WikipediaPortal has duplicate events & fix the EL to Portal Event logging: Figure out why Schema:WikipediaPortal has duplicate events & fix it. (Apr 26 2016, 7:44 PM)
debt edited projects, added Discovery-Analysis; removed Discovery-Portal-Backlog.
debt updated the task description.
debt added a subscriber: JGirault.
debt subscribed.

Closing this as invalid in favor of https://phabricator.wikimedia.org/T134199.