Page MenuHomePhabricator

Augment ReadingDepth schema with data from Page Lifecycle API
Open, NormalPublic5 Story Points

Description

Add a new field lifecycleActiveLength field to the ReadingDepth schema that measures how long a page was in the "active" state as per the Page Lifecyle API (https://developers.google.com/web/updates/2018/07/page-lifecycle-api#overview_of_page_lifecycle_states_and_events ).

This should probably be sent with pageUnloaded events (in addition to visibleLength, which will still be based on the older, more limited Page Visibility API, and totalLength), on browsers that support the Page Lifecyle API; on Chrome that's version 68 and newer. Or maybe a new, separate pageTerminated event (tied to the corresponding state transition per the Lifecycle API) would be preferable - developer input on this would be valuable.

Background

A limitation of the current ReadingDepth data is that the pageUnloaded event may not be fired if the page is terminated by the browser/system before being unloaded. On mobile, this actually happens in a majority of cases and biases our data especially when comparing reading times on desktop and mobile ( https://meta.wikimedia.org/wiki/Research_talk:Reading_time/Work_log/2018-11-02 ).

@Krinkle pointed out that the Page Lifecycle API should help with this. See also the spec at https://wicg.github.io/page-lifecycle/spec.html

AC

  • The pageUnloaded EL event is logged when the pagehide event fires
  • When the page becomes hidden (see the Page Visibility API), the state of the ReadingDepth instrumentation is persisted to session storage
  • If a ReadingDepth event is logged during the lifecycle of the page, session storage is cleared
  • When the page loads, any state persisted is reconstructed into a ReadingDepth pageUnloaded event and the event is logged
  • We time how long the page is in the ACTIVE state
  • The time is sent as part of a ReadingDepth pageUnloaded event

Notes

  1. Logging the pageUnloaded EL event when the pagehide event fires won't address the holes in the data that we're seeing (see Background). It's a minor performance improvement, that'll allow pages with the ReadingDepth enabled to be persisted in the browser's back/forward cache
  2. The HIDDEN -> freeze -> FROZEN -> DISCARDED state transition (see https://developers.google.com/web/updates/images/2018/07/page-lifecycle-api-state-event-flow.png) appears to be causing the holes in the data that we're seeing. Since we can't detect when a page is discarded, only when it has been discarded and then loaded at some future time, it makes sense to persist state until we can reliably send it.
  3. There's a well-defined pattern that we use to time how long the page is visible defined in https://github.com/wikimedia/mediawiki-extensions-WikimediaEvents/blob/f39438dfb9163cf72ca82dd9fb345b575dfcf373/modules/all/ext.wikimediaEvents.readingDepth.js. We can re-use this pattern and exchange document.addEventListener( 'focus', ... ) and .addEventListener( 'blur', ... ) where we would use document.addEventListener( 'visibilitychange', ... )

Event Timeline

ovasileva triaged this task as High priority.Mar 25 2019, 7:03 PM
ovasileva moved this task from To Triage to Needs Analysis on the Readers-Web-Backlog board.
Tbayer renamed this task from Augment ReadingDepth schema with to Augment ReadingDepth schema with data from Page Lifecycle API.Mar 25 2019, 7:41 PM
  • Can lifecycleactiveLength be lifecycleActiveLength?
  • It's my understanding that this change is anticipated to be supported by Chrome only initially. I think that means we have to maintain the existing implementation but I wasn't quite clear from the meeting discussion.
  • Does ResourceLoader instrumentation JavaScript loading order matter for these metrics?
  • Is this just adding a single event listener? What is the scope of the actual changes we want to make here?
Tbayer updated the task description. (Show Details)Mar 26 2019, 6:53 PM
  • Can lifecycleactiveLength be lifecycleActiveLength?

Sounds good, I renamed it in the task description.

  • It's my understanding that this change is anticipated to be supported by Chrome only initially. I think that means we have to maintain the existing implementation but I wasn't quite clear from the meeting discussion.

Yes, that's why the task decription said "in addition to visibleLength" (which will remain based on the Visibility API).

phuedx added a subscriber: phuedx.EditedMar 28 2019, 4:28 PM

Background/motivation: A limitation of the current ReadingDepth data is that the pageUnloaded event may not be fired if the page is terminated by the browser/system before unload. On mobile, this actually happens in a majority of cases and biases our data especially when comparing reading times on desktop and mobile ( https://meta.wikimedia.org/wiki/Research_talk:Reading_time/Work_log/2018-11-02 ).

Indeed, Google's documentation cautions against using the "legacy" unload event (see https://developers.google.com/web/updates/2018/07/page-lifecycle-api#legacy-lifecycle-apis-to-avoid):

Many developers treat the unload event as a guaranteed callback and use it as an end-of-session signal to save state and send analytics data, but doing this is extremely unreliable, especially on mobile! The unload event does not fire in many typical unload situations, including closing a tab from the tab switcher on mobile or closing the browser app from the app switcher.

Furthermore, the mere presence of a registered unload event handler (via either onunload or addEventListener()) can prevent browsers from being able to put pages in the page navigation cache for faster back and forward loads.

The documentation recommends that we use the pagehide event wherever possible:


function onUnload( /* ... */ ) {
}

if ( window.hasOwnProperty( 'onpagehide' ) ) {
    $( window ).on( 'pagehide', ( e ) => {
      // The page isn't going to be persisted to the browser's BFCache and is about to be unloaded.
      if ( !e.persisted ) {
        onUnload( e );
      }
    } );
} else {
    $( window ).on( 'unload', onUnload );
}

This change should fix the issue identified in that work log.

☝️ If necessary, we could break out migrating from using the unload event to using the pagehide event into another task.

phuedx claimed this task.Mar 28 2019, 5:42 PM

☝️ If necessary, we could break out migrating from using the unload event to using the pagehide event into another task.

Yes, great point! I'm still slightly wondering about compatibility/comparability though - is it correct to interpret the documentation as saying that in cases where the unload event fires correctly, it should coincide with the pagehide event (with terminated state)?

@phuedx feel free to move this to triaged but future once some developer notes, AC and QA steps have been added!

Yes, great point! I'm still slightly wondering about compatibility/comparability though - is it correct to interpret the documentation as saying that in cases where the unload event fires correctly, it should coincide with the pagehide event (with terminated state)?

AIUI yes (except I believe a pagehide event with persisted = false would mean that an unload event should follow).

phuedx updated the task description. (Show Details)Apr 4 2019, 2:50 PM
phuedx removed phuedx as the assignee of this task.Apr 4 2019, 2:58 PM
phuedx updated the task description. (Show Details)
phuedx moved this task from Needs Analysis to Triaged but Future on the Readers-Web-Backlog board.
Jdlrobson lowered the priority of this task from High to Low.Apr 9 2019, 4:14 PM
This comment was removed by Jdlrobson.
Jdlrobson raised the priority of this task from Low to High.Apr 9 2019, 4:14 PM
ovasileva set the point value for this task to 5.Apr 9 2019, 4:30 PM
phuedx added a comment.Apr 9 2019, 4:32 PM

Additional notes that I raised during today's Backlog Grooming meeting:

  • Page Lifecycle API - HTTP203: https://www.youtube.com/watch?v=UlLQPguE7UQ
  • According to the above, session storage is persisted when the page is discarded
  • If the browser supports the frozen event (i.e. onfreeze in document), then it supports session storage
phuedx lowered the priority of this task from High to Normal.Apr 24 2019, 4:27 PM

I'll follow up with Kate Zimmerman and Megan Neisler about how the ReadingDepth instrumentation fits in with their priorities.

This doesn't need analysis it just needs prioritization