
Set up metrics for Time on Site
Open, Needs Triage, Public

Description

The way users interact with our site is increasingly diverging from the classical "page view" model. For example, hovercards and similar functionality in the apps show a good part of a linked page in a preview, but using this preview does not result in navigation away from the original page. Similarly, MediaViewer shows images and information very similar to image pages without loading the image description page. Videos and interactive content can engage users deeply, but again do not cause navigation events that would count as a "page view".

To side-step fruitless discussions about which events should count as a "view", I think we should add a new metric that better captures both page view and non-pageview interactions: Time on Site.

I think we can collect such information in anonymous form, without a need for session tracking. A recent article by Ilya Grigorik shows how to use the W3C Page Visibility API to fire events when a page is hidden by navigating away or by switching tabs or apps. By hooking into those events, we should be able to send a small beacon request containing the time spent between page load and page hide.
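As a rough illustration of the idea above, here is a minimal sketch that reports the time between page load and the first time the page is hidden. The function name `createTimeOnPageReporter`, the payload field `timeOnPageMs`, and the `/beacon/time-on-site` endpoint are all hypothetical and not part of any existing instrumentation. The clock and the sender are passed in so the timing logic can be exercised outside a browser.

```javascript
// Sketch only: measure time between page load and the first page hide.
// `now` returns milliseconds; `send` delivers the payload (both injected).
function createTimeOnPageReporter( now, send ) {
	var loadedAt = now();
	var reported = false;

	return function onHide() {
		if ( reported ) {
			return; // report only the first hide after load
		}
		reported = true;
		send( { timeOnPageMs: Math.round( now() - loadedAt ) } );
	};
}

// Browser wiring; skipped where `document` does not exist (e.g. Node).
if ( typeof document !== 'undefined' && typeof navigator !== 'undefined' ) {
	var onHide = createTimeOnPageReporter(
		function () { return performance.now(); },
		function ( payload ) {
			// Hypothetical endpoint; sendBeacon is used so the request can
			// outlive the page.
			if ( navigator.sendBeacon ) {
				navigator.sendBeacon( '/beacon/time-on-site', JSON.stringify( payload ) );
			}
		}
	);
	document.addEventListener( 'visibilitychange', function () {
		if ( document.visibilityState === 'hidden' ) {
			onHide();
		}
	} );
}
```

No session identifier is involved: each beacon carries only a duration, which is in line with the anonymous collection suggested above.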

Event Timeline

GWicke updated the task description.
GWicke raised the priority of this task to Needs Triage.
GWicke added subscribers: GWicke, ori, Krinkle and 5 others.
Restricted Application added subscribers: StudiesWorld, Aklapper. Nov 23 2015, 12:13 AM
GWicke set Security to None. Nov 23 2015, 12:13 AM
GWicke added a subscriber: dr0ptp4kt.
Deskana added a subscriber: Deskana. Dec 3 2015, 7:24 PM

Discovery is already recording some survival time data: http://discovery.wmflabs.org/metrics/#survival

@Deskana oooh! this is excellent. Is this sitewide or limited to specific pages/actions? Is it session length or how long a user stays on the page?

@Deskana oooh! this is excellent. Is this sitewide or limited to specific pages/actions?

The population that this data is recorded for is everyone who arrives at an article from an internal search, and then there's random sampling on top of that. So, no, it's not limited to specific pages, but it is limited to people that arrive at pages in a particular way. Whether or not that biases the application of this data for other purposes will depend on what you use it for. This should really be written on the dashboard itself.

Is it session length or how long a user stays on the page?

How long the user stays on that particular page.

leila added a subscriber: leila. Jul 1 2016, 4:42 PM

FWIW, Mikhail helpfully gave me some background recently on how the instrumentation for http://discovery.wmflabs.org/metrics/#survival works
(it is based on JavaScript-generated checkin events fired at certain intervals).
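To make the check-in approach concrete, here is a small sketch of how interval-based check-ins could work. The checkpoint values and the function names are hypothetical; the actual schedule used for the Discovery dashboard is not stated in this thread.

```javascript
// Hypothetical checkpoints in seconds; the real schedule is not given here.
var CHECKPOINTS = [ 10, 20, 30, 60, 120, 300 ];

// Schedule one check-in per checkpoint.
// `schedule` has the same shape as setTimeout( fn, delayMs ).
function scheduleCheckins( checkpoints, schedule, sendCheckin ) {
	checkpoints.forEach( function ( seconds ) {
		schedule( function () {
			sendCheckin( seconds );
		}, seconds * 1000 );
	} );
}

// Analysis side: the last checkpoint reached is a lower bound on the time
// the user spent on the page. Returns 0 if no checkpoint was reached.
function lastCheckinReached( checkpoints, elapsedSeconds ) {
	var reached = checkpoints.filter( function ( s ) {
		return s <= elapsedSeconds;
	} );
	return reached.length ? Math.max.apply( null, reached ) : 0;
}
```

In a browser, `setTimeout` would be passed as `schedule`. With this design, a user who leaves between two checkpoints is only known to have stayed at least as long as the last checkpoint that fired.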

@Deskana/ @Tbayer / @EBernhardson

Idea is good and similar to our suggestions about computing retention before.

Some issues that I see:

  1. Note that the code is sending events even when the tab is not visible, correct? That means you are counting as "visible-time" time that might not be. You cannot tell on all browsers, but in the majority you actually can tell whether the UI is on display: http://caniuse.com/#feat=pagevisibility
     This is kind of an important point for this metric.

  2. function randomToken() {
         return mw.user.generateRandomSessionId() + new Date().getTime().toString( 36 );
     }

     I do not think this is needed as the random session is sufficiently unique in our user space.

  3. Local storage is not encrypted, so it is best not to store private information there. Also, in the case of several tabs (like Erik noted), results are not going to be obvious. I agree that code/locking is overkill though.
  4. Local storage is synchronous, so any puts/gets affect your UI thread. On desktop this is likely not an issue, but I bet it might be noticeable on mobile.
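Regarding the first point, here is a minimal sketch of counting only the time during which the page is actually visible, using the Page Visibility API. The name `createVisibleTimeCounter` is hypothetical and this is not the existing instrumentation. On browsers without the API, `document.visibilityState` is undefined, so the sketch treats the page as always visible and falls back to counting all time.

```javascript
// Accumulates only the time during which the page is visible.
// `now` returns milliseconds; `initiallyVisible` is the state at creation.
function createVisibleTimeCounter( now, initiallyVisible ) {
	var visibleSince = initiallyVisible ? now() : null;
	var totalMs = 0;

	return {
		// Call on every visibility change with the new state.
		setVisible: function ( visible ) {
			if ( visible && visibleSince === null ) {
				visibleSince = now();
			} else if ( !visible && visibleSince !== null ) {
				totalMs += now() - visibleSince;
				visibleSince = null;
			}
		},
		// Visible time so far, including the current visible stretch.
		getVisibleMs: function () {
			return totalMs + ( visibleSince !== null ? now() - visibleSince : 0 );
		}
	};
}

// Browser wiring; skipped where `document` does not exist (e.g. Node).
if ( typeof document !== 'undefined' ) {
	var counter = createVisibleTimeCounter(
		Date.now,
		document.visibilityState !== 'hidden'
	);
	document.addEventListener( 'visibilitychange', function () {
		counter.setVisible( document.visibilityState !== 'hidden' );
	} );
}
```

A check-in could then be fired based on `getVisibleMs()` rather than on wall-clock time since page load.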

@Deskana/ @Tbayer / @EBernhardson

Idea is good and similar to our suggestions about computing retention before.

Some issues that I see:

  1. note that code is sending events even when the tab is not visible, correct? That means you are counting as "visible-time" time that might not be. You cannot tell on all browsers but in the majority you actually can tell whether the ui is on display: http://caniuse.com/#feat=pagevisibility This is kind of an important point for this metric.

Interesting, I didn't know about this (although I suppose I have seen sites like YouTube not start a video until you tab in). Created a new ticket, T145102, to implement this handling.

  2. function randomToken() { return mw.user.generateRandomSessionId() + new Date().getTime().toString( 36 ); }

    I do not think this is needed as the random session is sufficiently unique in our user space.

This was added quite a while ago; the issue at the time was some duplication of data. In a check of the last week, though, distinct(event_searchSessionId) and distinct(left(event_searchSessionId, 16)) have the same counts, so perhaps this is no longer an issue and the timestamp can be removed.
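The same comparison can be sketched outside the database. This assumes, as the query above implies, that the first 16 characters of the token are the random session ID and the rest is the base-36 timestamp. The function names are hypothetical.

```javascript
// Number of distinct values in an array.
function countDistinct( values ) {
	return new Set( values ).size;
}

// True if truncating each token to `prefixLength` characters does not merge
// any two distinct tokens, i.e. the suffix adds no uniqueness in this sample.
function suffixIsRedundant( tokens, prefixLength ) {
	var prefixes = tokens.map( function ( token ) {
		return token.slice( 0, prefixLength );
	} );
	return countDistinct( tokens ) === countDistinct( prefixes );
}
```

If `suffixIsRedundant( tokens, 16 )` holds for a representative sample, the random prefix alone distinguished every session in that sample. It does not prove that collisions cannot occur.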

  3. Local storage is not encrypted, so it is best not to store private information there. Also, in the case of several tabs (like Erik noted), results are not going to be obvious. I agree that code/locking is overkill though.

Unfortunately, local storage is the only available option here, and the loss of events was noticeable for those fired close to page unload, specifically events fired from the autocomplete where someone types and presses enter (which is quite common). Because events are only added to the queue in a limited timeframe (only those active on unload), and given the most likely cause of unload when events are fired, these should have an incredibly short lifetime, but it is indeed still worth keeping in mind.

  4. Local storage is synchronous, so any puts/gets affect your UI thread. On desktop this is likely not an issue, but I bet it might be noticeable on mobile.

Indeed, but there aren't many better options. We could perhaps spend some time looking into how often we hit localStorage and optimize where possible.
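One way to find out how often local storage is hit would be a thin counting wrapper around the storage object. This is only a sketch; `instrumentStorage` is a hypothetical helper, not something that exists in the current code.

```javascript
// Wraps a Storage-like object (getItem/setItem/removeItem) and counts
// synchronous reads and writes that pass through it.
function instrumentStorage( storage ) {
	var counts = { reads: 0, writes: 0 };
	return {
		counts: counts,
		getItem: function ( key ) {
			counts.reads++;
			return storage.getItem( key );
		},
		setItem: function ( key, value ) {
			counts.writes++;
			storage.setItem( key, value );
		},
		removeItem: function ( key ) {
			counts.writes++;
			storage.removeItem( key );
		}
	};
}
```

In a browser this could be used as `var store = instrumentStorage( window.localStorage );`, with the event queue going through `store` and `store.counts` inspected after a typical search session.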

dr0ptp4kt moved this task from Backlog to Tracking on the Reading-Admin board. Jul 20 2017, 9:42 PM

@Tbayer Should this task remain open, and if so, what are the criteria for its completion? It seems related to T174512, T145388 and others. Some of this might be a duplicate, sub-task or parent task.