Page MenuHomePhabricator

Investigate if Interactive logging schema makes sense
Closed, ResolvedPublic4 Story Points

Description

As an Analysis team, we should take a look at the new schema that was recently launched for maps T149140 to be sure that it makes sense from data collection point of view.

In T149140 there is a follow-on ticket T149834 to implement some (or all) of the remaining events that weren't already pushed into production.

Event Timeline

debt created this task.Nov 2 2016, 7:44 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptNov 2 2016, 7:44 PM
debt triaged this task as Normal priority.Nov 2 2016, 7:44 PM
mpopov claimed this task.Nov 7 2016, 11:11 PM
mpopov moved this task from Backlog to In progress on the Discovery-Analysis (Current work) board.
mpopov set the point value for this task to 4.

@JGirault Am I interpreting the extra field correctly? It looks like {"route":"/map[frame]/XYZ"} implies (to me) that XYZ is the zoom level? But then the numbers I'm seeing are...weird. What is route? Could you please add an explanation to https://meta.wikimedia.org/wiki/Schema:Kartographer?

SELECT
  event_feature AS feature,
  event_action AS action,
  CAST(REGEXP_SUBSTR(REGEXP_SUBSTR(event_extra, '"/map.*/([0-9]+)"'), "[0-9]+") AS UNSIGNED INTEGER) AS `zoom level???`,
  COUNT(*) AS events
FROM Kartographer_16010805
GROUP BY feature, action, `zoom level???`
ORDER BY feature, action ASC, `zoom level???` ASC
LIMIT 1000;
featureactionzoom level???events
mapframeclose017
mapframehashopen03
mapframeopen022
mapframeopen11
mapframesidebar-hide11
mapframesidebar-show11
mapframesidebar-type12
mapframeview016
maplinkclose07
............
maplinkclose71
maplinkclose81
maplinkclose181
maplinkclose241
maplinkhashopen02
maplinkopen08
maplinkopen11
maplinkopen21
............
maplinkopen481
maplinksidebar-hide11
maplinksidebar-hide71
maplinksidebar-show11
............
maplinksidebar-show481
maplinkview070
maplinkview114
............
maplinkview1012
maplinkview112
maplinkview125
............
maplinkview326
............
maplinkview3041

In future revision, mobile field is unnecessary because event logging automatically inserts webHost information, so whether the event was made on a mobile site can be determined by checking for presence of ".m." in webHost:

SELECT
  INSTR(webHost, '.m.') > 0 AS is_mobile,
  event_mobile,
  COUNT(*) AS events
FROM Kartographer_16010805
GROUP BY is_mobile, event_mobile;
is_mobileevent_mobileevents
00306
11177

Close events seem to be working well! :)

SELECT
  STR_TO_DATE(`date`, '%Y%m%d') AS `date`,
  TIME_FORMAT(SEC_TO_TIME(MIN(min_sec)),'%Hh %im %ss') AS `shortest duration`,
  TIME_FORMAT(SEC_TO_TIME(AVG(avg_sec)),'%Hh %im %ss') AS `average per-user duration`,
  TIME_FORMAT(SEC_TO_TIME(MAX(max_sec)),'%Hh %im %ss') AS `longest duration`,
  COUNT(*) AS `unique user tokens`,
  SUM(close_events) AS `total close events`
FROM (
  SELECT
   LEFT(timestamp, 8) AS `date`,
  	event_userToken as user_id,
  	MIN(event_duration/1000) AS min_sec,
  	MAX(event_duration/1000) AS max_sec,
  	AVG(event_duration/1000) AS avg_sec,
  	COUNT(*) AS close_events
  FROM Kartographer_16010805
  WHERE event_action = 'close'
  GROUP BY `date`, user_id
) AS per_user_summaries
GROUP BY `date`;
dateshortest durationaverage per-user durationlongest durationunique user tokenstotal close events
2016-11-0300h 00m 05s00h 00m 44s00h 03m 05s56
2016-11-0400h 00m 04s00h 00m 34s00h 02m 02s55
2016-11-0500h 00m 06s00h 01m 34s00h 04m 39s55
2016-11-0600h 00m 15s00h 00m 57s00h 01m 46s44
2016-11-0700h 00m 01s00h 02m 13s00h 11m 03s1011

Do the following counts make sense based on how kartographer is implemented & used?

SELECT
  `has 'view' event(s)`, `has 'open' event(s)`, `has 'hashopen' event(s)`,
  COUNT(*) AS `user tokens`
FROM (
  SELECT
    event_userToken AS user_id,
    IF(SUM(IF(event_action = 'view', 1, 0)) > 0, 'Yes', 'No') AS `has 'view' event(s)`,
    IF(SUM(IF(event_action = 'open', 1, 0)) > 0, 'Yes', 'No') AS `has 'open' event(s)`,
    IF(SUM(IF(event_action = 'hashopen', 1, 0)) > 0, 'Yes', 'No') AS `has 'hashopen' event(s)`,
    COUNT(DISTINCT(event_action)) AS `event types recorded`
  FROM Kartographer_16010805
  GROUP BY user_id
) AS sessions
GROUP BY `has 'view' event(s)`, `has 'open' event(s)`, `has 'hashopen' event(s)`;
has 'view' event(s)has 'open' event(s)has 'hashopen' event(s)user tokens
NoNoNo1
NoNoYes4
NoYesNo29
NoYesYes1
YesNoNo256
YesYesNo8
JGirault added a subscriber: MaxSem.Nov 9 2016, 6:22 PM

@JGirault Am I interpreting the extra field correctly? It looks like {"route":"/map[frame]/XYZ"} implies (to me) that XYZ is the zoom level? But then the numbers I'm seeing are...weird. What is route? Could you please add an explanation to https://meta.wikimedia.org/wiki/Schema:Kartographer?

XYZ is not the zoom level. It is the element index as it appears in the article.
Route of first maplink on the page is: #/maplink/0, second is: #/maplink/1
Route of first map on the page is: #/map/0, second is: #/map/1

In future revision, mobile field is unnecessary because event logging automatically inserts webHost information, so whether the event was made on a mobile site can be determined by checking for presence of ".m." in webHost:

Ah! We probably want to remove this field then, @MaxSem.

Close events seem to be working well! :)

SELECT
  STR_TO_DATE(`date`, '%Y%m%d') AS `date`,
  TIME_FORMAT(SEC_TO_TIME(MIN(min_sec)),'%Hh %im %ss') AS `shortest duration`,
  TIME_FORMAT(SEC_TO_TIME(AVG(avg_sec)),'%Hh %im %ss') AS `average per-user duration`,
  TIME_FORMAT(SEC_TO_TIME(MAX(max_sec)),'%Hh %im %ss') AS `longest duration`,
  COUNT(*) AS `unique user tokens`,
  SUM(close_events) AS `total close events`
FROM (
  SELECT
   LEFT(timestamp, 8) AS `date`,
  	event_userToken as user_id,
  	MIN(event_duration/1000) AS min_sec,
  	MAX(event_duration/1000) AS max_sec,
  	AVG(event_duration/1000) AS avg_sec,
  	COUNT(*) AS close_events
  FROM Kartographer_16010805
  WHERE event_action = 'close'
  GROUP BY `date`, user_id
) AS per_user_summaries
GROUP BY `date`;
dateshortest durationaverage per-user durationlongest durationunique user tokenstotal close events
2016-11-0300h 00m 05s00h 00m 44s00h 03m 05s56
2016-11-0400h 00m 04s00h 00m 34s00h 02m 02s55
2016-11-0500h 00m 06s00h 01m 34s00h 04m 39s55
2016-11-0600h 00m 15s00h 00m 57s00h 01m 46s44
2016-11-0700h 00m 01s00h 02m 13s00h 11m 03s1011

Good

Do the following counts make sense based on how kartographer is implemented & used?

has 'view' event(s)has 'open' event(s)has 'hashopen' event(s)user tokenscomments
NoNoNo1I wonder what is the event...
NoNoYes4Good knowing view event is 1:100
NoYesNo29Good knowing view event is 1:100
NoYesYes1Seems good too
YesNoNo256Looks good
YesYesNo8Looks good

^ comments inline.

debt closed this task as Resolved.Nov 11 2016, 12:19 AM

Resolving - let us know if there are any questions!