
Various sanity checks for data from the rewritten Popups instrumentation
Closed, Resolved · Public

Description

Some further preliminary sanity checks to run while we are still waiting for a full week's worth of data from Schema:Popups to become available (T164256, T164256#3264140, T161769):

  • Histogram of event type frequency over time (like in T139319#2481986 etc.)
  • Frequency of editCountBucket values
  • Check the histogram (and, more generally, the presence) of perceivedWait
  • --> T165461#3292164

Perform sanity checks for Chrome browsers during the week of 7/10 (restricted to data from after ca. June 27, when T167273 ("pageLoaded events are logged infrequently, if at all") was fixed), comparing results for...

Begin analyzing data from all browsers on 7/19 (T158172).

Event Timeline

Restricted Application added a subscriber: Aklapper. May 16 2017, 9:04 AM

Here are histograms of event type frequency over time (data from May 12-15, restricted to ruwiki as the project with the most events), for control and test:



Zooming in vertically and leaving out the peak below 100ms (which we didn't see in the analogous charts for the old instrumentation at T139319#2481986 and T139319#2475143, because dwelledButAbandoned events were not logged for such short times back then):


The graphs look quite noisy because we don't have much data yet, but one can already make some interesting observations (I may discuss some of them with @phuedx and @ovasileva today). I'm leaving them here without further comment for now, but I would say they look relatively sane at first glance: no obvious anomalies.

Data source: See SWAP notebook (cp ~tbayer/pagepreviews/Popups\ link\ interaction\ timing\ histograms.ipynb .)

The distribution of edit counts among logged-in users: apparently, editors with 1000 or more edits account for more than half of the events. Interesting.

SELECT event_editCountBucket AS editCountBucket, COUNT(*)
FROM log.Popups_16364296
WHERE LEFT(timestamp,6) = '201705'
AND event_isAnon = 0
GROUP BY editCountBucket
ORDER BY editCountBucket;

+-----------------+----------+
| editCountBucket | COUNT(*) |
+-----------------+----------+
| 0 edits         |      758 |
| 1-4 edits       |      604 |
| 100-999 edits   |     1861 |
| 1000+ edits     |     6793 |
| 5-99 edits      |     2022 |
+-----------------+----------+
5 rows in set (7.92 sec)
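Because the query sorts editCountBucket as a string, the buckets come out in lexicographic order (100-999 before 1000+ before 5-99). As a quick cross-check of the "more than half" remark, here is a minimal Python sketch that takes the counts from the table above, puts the buckets in numeric order and computes each bucket's share. The BUCKET_ORDER list is an assumption read off the bucket labels, not something defined by the schema.

```python
# Counts copied from the query output above (logged-in users, May 2017).
counts = {
    "0 edits": 758,
    "1-4 edits": 604,
    "100-999 edits": 1861,
    "1000+ edits": 6793,
    "5-99 edits": 2022,
}

# Numeric order of the buckets; assumed from the bucket labels themselves.
BUCKET_ORDER = ["0 edits", "1-4 edits", "5-99 edits", "100-999 edits", "1000+ edits"]


def bucket_shares(counts, order):
    """Return (bucket, count, share of all events) tuples in the given order."""
    total = sum(counts.values())
    return [(b, counts[b], counts[b] / total) for b in order]


for bucket, n, share in bucket_shares(counts, BUCKET_ORDER):
    print(f"{bucket:>14} {n:>6} {share:6.1%}")
```

With these counts the 1000+ bucket comes to about 56% of the 12,038 logged-in events.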

Tbayer updated the task description. May 16 2017, 9:41 AM
Jdlrobson moved this task from Incoming to 2014-15 Q4 on the Readers-Web-Backlog board.
Jdlrobson added a subscriber: Jdlrobson.

Assuming this is a Tilman-specific task.

Tbayer added a comment (edited). May 25 2017, 2:56 PM

A preliminary histogram of the perceivedWait values (four days of data, ruwiki, anons only, 'dismissed' actions):

This looks plausible so far (the graph starts at 700ms because there were no values below that). There were also no NULL values.

Data via:

SELECT 10*FLOOR(event_perceivedWait/10) AS bucket, COUNT(*) AS frequency
FROM log.Popups_16364296
WHERE wiki ='ruwiki' # the wiki with the most events. TODO: extend this to all WPs
AND event_isAnon 
AND DATE(timestamp) >= '2017-05-20'
AND DATE(timestamp) <= '2017-05-24'
AND event_popupEnabled
AND event_linkInteractionToken IS NOT NULL
AND event_action = 'dismissed'
GROUP BY bucket
ORDER BY bucket;

...but it turns out that the perceivedWait value is missing for other kinds of events; now filed as T166323.
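For anyone reproducing the perceivedWait histogram outside MySQL, and keeping an eye on missing values, here is a minimal Python sketch of the same 10 ms bucketing used in the query above. The sample values are hypothetical and only illustrate the rounding; they are not data from the query.

```python
import math
from collections import Counter


def bucket_10ms(value_ms):
    """Same rounding as 10*FLOOR(event_perceivedWait/10): down to a 10 ms bucket."""
    return 10 * math.floor(value_ms / 10)


def perceived_wait_histogram(values_ms):
    """Return (sorted (bucket, frequency) pairs, number of missing values).

    In MySQL, NULL perceivedWait values would form their own NULL bucket;
    here they are counted separately so that they stay visible.
    """
    present = [v for v in values_ms if v is not None]
    counts = Counter(bucket_10ms(v) for v in present)
    return sorted(counts.items()), len(values_ms) - len(present)


# Hypothetical perceivedWait values in milliseconds, for illustration only.
sample = [703, 709, 712, 745, 748, 1020, None]
buckets, missing = perceived_wait_histogram(sample)
print(buckets, missing)  # [(700, 2), (710, 1), (740, 2), (1020, 1)] 1
```

Counting missing values explicitly makes a check like "there were also no NULL values" a one-line assertion, and would surface the missing perceivedWait values for other event types.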

Tbayer updated the task description. May 31 2017, 5:32 PM

The number of link interactions per pageview (>3 in both test and control) seems much higher than the corresponding result from the old instrumentation (around 1 in T139319#2476014). This uses the version restricted to interactions longer than 300ms, because we changed how shorter dwelledButAbandoned events are registered.

(Still looking into this, but leaving it here for the record.)

SELECT event_popupEnabled, 
count(DISTINCT event_linkInteractionToken)/count(DISTINCT event_pageToken) AS link_interactions_per_page
FROM log.Popups_16364296
WHERE wiki ='ruwiki'
AND event_isAnon = 1
AND LEFT(timestamp, 8) >= '20170522'
AND LEFT(timestamp, 8) < '20170530'
AND (event_action = 'pageLoaded' OR event_totalInteractionTime > 300)
GROUP BY event_popupEnabled;

+--------------------+----------------------------+
| event_popupEnabled | link_interactions_per_page |
+--------------------+----------------------------+
|                  0 |                     3.2286 |
|                  1 |                     3.5196 |
+--------------------+----------------------------+
2 rows in set (3.36 sec)
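A minimal Python sketch of what the query above computes, to make the role of the WHERE clause explicit. The event rows and the ev() helper are hypothetical; they only borrow the column names from the query, and the wiki, date and anon filters are omitted.

```python
from collections import defaultdict


def ev(group, action, page, link=None, time_ms=None):
    """Build a hypothetical event row; keys borrow the query's column names."""
    return {"popupEnabled": group, "action": action, "pageToken": page,
            "linkInteractionToken": link, "totalInteractionTime": time_ms}


def link_interactions_per_page(events, min_time_ms=300):
    """Distinct link-interaction tokens per distinct page token, per test group.

    Keeps pageLoaded rows plus rows with totalInteractionTime > min_time_ms,
    like the WHERE clause; None tokens are ignored, like COUNT(DISTINCT ...).
    """
    links, pages = defaultdict(set), defaultdict(set)
    for e in events:
        t = e["totalInteractionTime"]
        if e["action"] != "pageLoaded" and not (t is not None and t > min_time_ms):
            continue
        group = e["popupEnabled"]
        if e["linkInteractionToken"] is not None:
            links[group].add(e["linkInteractionToken"])
        if e["pageToken"] is not None:
            pages[group].add(e["pageToken"])
    # MySQL returns NULL for x/0; groups without any page token are omitted here.
    return {g: len(links[g]) / len(pages[g]) for g in pages if pages[g]}


events = [
    ev(0, "pageLoaded", "p1"),
    ev(0, "dwelledButAbandoned", "p1", "l1", 400),
    ev(0, "dwelledButAbandoned", "p1", "l2", 200),  # <= 300 ms: filtered out
    ev(0, "pageLoaded", "p2"),                      # page without interactions
    ev(1, "pageLoaded", "p3"),
    ev(1, "dismissed", "p3", "l3", 500),
    ev(1, "opened", "p3", "l4", 1200),
    ev(1, "opened", "p4", "l5", 800),  # p4 still counts via its interaction row
]
print(link_interactions_per_page(events))  # {0: 0.5, 1: 1.5}
```

Because the denominator is built from all matching rows, a page with no qualifying interaction only counts if its pageLoaded event was logged. Dropping p2's pageLoaded row above raises group 0's ratio from 0.5 to 1.0, which is one plausible way a pageLoaded logging bug could inflate this metric.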

> The number of link interactions per pageview (>3 in both test and control) seems much higher than the corresponding result from old instrumentation (around 1 in T139319#2476014).

Probably has to do with T167273: pageLoaded events are logged infrequently, if at all.

ovasileva moved this task from Backlog to Next Up on the Page-Previews board. Jul 5 2017, 12:38 PM
ovasileva triaged this task as High priority. Jul 12 2017, 3:23 PM
ovasileva updated the task description.
ovasileva added a subscriber: MBinder_WMF.

@Tbayer - I added the dates and sanity checks we discussed yesterday. Hope they make sense. @MBinder_WMF - tagging this for the next sprint for visibility. We can discuss during kickoff whether that makes sense.

Tbayer moved this task from Triage to Next Up on the Reading-analysis board. Jul 12 2017, 10:42 PM
Tbayer updated the task description. Jul 15 2017, 2:00 AM
Tbayer updated the task description. Jul 15 2017, 4:30 AM

Posting below the results from four checks that were recently added to the task description (for Chrome only, and for ruwiki as the project with the most events). As already discussed with @ovasileva, there are no red flags so far, although there appear to be some interesting differences from the analogous data in the initial 2016 analysis. There are more observations to be made, but they might fit better in the analysis of the full data later this week. (As a reminder, the final 2016 analysis went a bit further and also looked at time series for some of these metrics, which we'll want to do this time too.)

  1. Link interactions per pageview (cf. the result from the old instrumentation, which was for huwiki instead of ruwiki):
SELECT event_popupEnabled, 
COUNT(DISTINCT event_linkInteractionToken)/count(DISTINCT event_pageToken) AS link_interactions_per_page
FROM log.Popups_16364296
WHERE wiki ='ruwiki'
AND event_isAnon = 1
AND LEFT(timestamp, 8) >= '20170701'
AND LEFT(timestamp, 8) < '20170715'
AND SUBSTRING( userAgent, INSTR(userAgent,'"browser_family": "')+19, INSTR(SUBSTRING(userAgent, INSTR(userAgent,'"browser_family": "')+19), '"' )-1 ) = 'Chrome'
AND (event_action = 'pageLoaded' OR event_totalInteractionTime > 300)
GROUP BY event_popupEnabled;

+--------------------+----------------------------+
| event_popupEnabled | link_interactions_per_page |
+--------------------+----------------------------+
|                  0 |                     1.3412 |
|                  1 |                     1.5315 |
+--------------------+----------------------------+
2 rows in set (3.90 sec)

  2. Link opens per pageview (cf. the result from the old instrumentation, which was for huwiki instead of ruwiki):
SELECT event_popupEnabled, 
count(DISTINCT event_linkInteractionToken)/count(DISTINCT event_pageToken) AS link_opens_per_page
FROM log.Popups_16364296
WHERE wiki ='ruwiki'
AND event_isAnon = 1
AND LEFT(timestamp, 8) >= '20170701'
AND LEFT(timestamp, 8) < '20170715'
AND SUBSTRING( userAgent, INSTR(userAgent,'"browser_family": "')+19, INSTR(SUBSTRING(userAgent, INSTR(userAgent,'"browser_family": "')+19), '"' )-1 ) = 'Chrome'
AND (event_action = 'pageLoaded' OR 
  event_action = 'opened')
GROUP BY event_popupEnabled;

+--------------------+---------------------+
| event_popupEnabled | link_opens_per_page |
+--------------------+---------------------+
|                  0 |              0.1775 |
|                  1 |              0.1555 |
+--------------------+---------------------+
2 rows in set (4.81 sec)

  3. Ratio of link opens from seen hovercards (cf. the result from the old instrumentation, which was for huwiki instead of ruwiki):
SELECT SUM(IF(event_action = 'opened',1,0))/SUM(1)
AS clickthrough_ratio
FROM log.Popups_16364296
WHERE wiki ='ruwiki'
AND event_isAnon = 1
AND LEFT(timestamp, 8) >= '20170701'
AND LEFT(timestamp, 8) < '20170715'
AND SUBSTRING( userAgent, INSTR(userAgent,'"browser_family": "')+19, INSTR(SUBSTRING(userAgent, INSTR(userAgent,'"browser_family": "')+19), '"' )-1 ) = 'Chrome'
AND event_totalInteractionTime > event_perceivedWait + 1000 # i.e. card was shown for more than one second
AND event_linkInteractionToken IS NOT NULL;
+--------------------+
| clickthrough_ratio |
+--------------------+
|             0.0647 |
+--------------------+
1 row in set (3.68 sec)
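All three queries share the same Chrome filter, which pulls browser_family out of the userAgent field with SUBSTRING/INSTR; the +19 is the length of the '"browser_family": "' marker. Below is a minimal Python sketch of that string logic and of the clickthrough-ratio filter from the last query. The JSON layout of userAgent and all sample events are assumptions for illustration, and the wiki, date and anon filters are omitted. Where userAgent is valid JSON, a real parser (json.loads here) gives the same answer more robustly.

```python
import json

# Marker used by the SQL; it is 19 characters long, hence the "+19" offset.
MARKER = '"browser_family": "'


def browser_family_sql_style(user_agent):
    """Same string logic as the SUBSTRING/INSTR expression: the text after
    MARKER up to the next double quote. Returns None if the marker or the
    closing quote is missing (the SQL expression instead compares some other
    substring, which is assumed not to equal 'Chrome')."""
    start = user_agent.find(MARKER)
    if start == -1:
        return None
    start += len(MARKER)
    end = user_agent.find('"', start)
    return user_agent[start:end] if end != -1 else None


def clickthrough_ratio(events):
    """Share of 'opened' events among link-interaction events whose card was
    shown for more than one second (totalInteractionTime > perceivedWait + 1000)."""
    shown = [
        e for e in events
        if e["linkInteractionToken"] is not None
        and e["totalInteractionTime"] is not None
        and e["perceivedWait"] is not None
        and e["totalInteractionTime"] > e["perceivedWait"] + 1000
    ]
    if not shown:
        return None  # MySQL would return NULL here
    return sum(e["action"] == "opened" for e in shown) / len(shown)


# Hypothetical userAgent values; the real field layout is an assumption.
ua_chrome = json.dumps({"os_family": "Windows", "browser_family": "Chrome", "browser_major": "59"})
ua_firefox = json.dumps({"os_family": "Linux", "browser_family": "Firefox", "browser_major": "54"})


def ev(action, token, total_ms, wait_ms, ua):
    """Build a hypothetical event row with the columns the query uses."""
    return {"action": action, "linkInteractionToken": token,
            "totalInteractionTime": total_ms, "perceivedWait": wait_ms, "userAgent": ua}


events = [
    ev("opened", "a", 2500, 700, ua_chrome),
    ev("dismissed", "b", 3000, 800, ua_chrome),
    ev("dismissed", "c", 1200, 700, ua_chrome),  # card shown for 500 ms only: excluded
    ev("opened", "d", 2500, 700, ua_firefox),    # not Chrome: excluded by the filter
]
chrome = [e for e in events if browser_family_sql_style(e["userAgent"]) == "Chrome"]
print(clickthrough_ratio(chrome))  # 0.5
```

Like the SQL, this ratio counts events rather than distinct link interactions.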
ovasileva closed this task as Resolved. Jul 19 2017, 5:39 PM

Closing this based on the above. If we notice something as part of T158172, we can open separate bugs. @Tbayer - feel free to reopen if you think we're missing something.