Page MenuHomePhabricator

Augment ReadingDepth schema with data from Page Lifecycle API
Closed, DeclinedPublic5 Estimated Story Points

Description

Add a new field lifecycleActiveLength field to the ReadingDepth schema that measures how long a page was in the "active" state as per the Page Lifecyle API (https://developers.google.com/web/updates/2018/07/page-lifecycle-api#overview_of_page_lifecycle_states_and_events ).

This should probably be sent with pageUnloaded events (in addition to visibleLength, which will still be based on the older, more limited Page Visibility API, and totalLength), on browsers that support the Page Lifecyle API; on Chrome that's version 68 and newer. Or maybe a new, separate pageTerminated event (tied to the corresponding state transition per the Lifecycle API) would be preferable - developer input on this would be valuable.

Background

A limitation of the current ReadingDepth data is that the pageUnloaded event may not be fired if the page is terminated by the browser/system before being unloaded. On mobile, this actually happens in a majority of cases and biases our data especially when comparing reading times on desktop and mobile ( https://meta.wikimedia.org/wiki/Research_talk:Reading_time/Work_log/2018-11-02 ).

@Krinkle pointed out that the Page Lifecycle API should help with this. See also the spec at https://wicg.github.io/page-lifecycle/spec.html

AC

  • The pageUnloaded EL event is logged when the pagehide event fires
  • When the page becomes hidden (see the Page Visibility API), the state of the ReadingDepth instrumentation is persisted to session storage
  • If a ReadingDepth event is logged during the lifecycle of the page, session storage is cleared
  • When the page loads, any state persisted is reconstructed into a ReadingDepth pageUnloaded event and the event is logged
  • We time how long the page is in the ACTIVE state
  • The time is sent as part of a ReadingDepth pageUnloaded event

Notes

  1. Logging the pageUnloaded EL event when the pagehide event fires won't address the holes in the data that we're seeing (see Background). It's a minor performance improvement, that'll allow pages with the ReadingDepth enabled to be persisted in the browser's back/forward cache
  2. The HIDDEN -> freeze -> FROZEN -> DISCARDED state transition (see https://developers.google.com/web/updates/images/2018/07/page-lifecycle-api-state-event-flow.png) appears to be causing the holes in the data that we're seeing. Since we can't detect when a page is discarded, only when it has been discarded and then loaded at some future time, it makes sense to persist state until we can reliably send it.
  3. There's a well-defined pattern that we use to time how long the page is visible defined in https://github.com/wikimedia/mediawiki-extensions-WikimediaEvents/blob/f39438dfb9163cf72ca82dd9fb345b575dfcf373/modules/all/ext.wikimediaEvents.readingDepth.js. We can re-use this pattern and exchange document.addEventListener( 'focus', ... ) and .addEventListener( 'blur', ... ) where we would use document.addEventListener( 'visibilitychange', ... )

Event Timeline

ovasileva moved this task from Incoming to Needs Prioritization on the Web-Team-Backlog board.
Tbayer renamed this task from Augment ReadingDepth schema with to Augment ReadingDepth schema with data from Page Lifecycle API.Mar 25 2019, 7:41 PM
  • Can lifecycleactiveLength be lifecycleActiveLength?
  • It's my understanding that this change is anticipated to be supported by Chrome only initially. I think that means we have to maintain the existing implementation but I wasn't quite clear from the meeting discussion.
  • Does ResourceLoader instrumentation JavaScript loading order matter for these metrics?
  • Is this just adding a single event listener? What is the scope of the actual changes we want to make here?
  • Can lifecycleactiveLength be lifecycleActiveLength?

Sounds good, I renamed it in the task description.

  • It's my understanding that this change is anticipated to be supported by Chrome only initially. I think that means we have to maintain the existing implementation but I wasn't quite clear from the meeting discussion.

Yes, that's why the task decription said "in addition to visibleLength" (which will remain based on the Visibility API).

Background/motivation: A limitation of the current ReadingDepth data is that the pageUnloaded event may not be fired if the page is terminated by the browser/system before unload. On mobile, this actually happens in a majority of cases and biases our data especially when comparing reading times on desktop and mobile ( https://meta.wikimedia.org/wiki/Research_talk:Reading_time/Work_log/2018-11-02 ).

Indeed, Google's documentation cautions against using the "legacy" unload event (see https://developers.google.com/web/updates/2018/07/page-lifecycle-api#legacy-lifecycle-apis-to-avoid):

Many developers treat the unload event as a guaranteed callback and use it as an end-of-session signal to save state and send analytics data, but doing this is extremely unreliable, especially on mobile! The unload event does not fire in many typical unload situations, including closing a tab from the tab switcher on mobile or closing the browser app from the app switcher.

Furthermore, the mere presence of a registered unload event handler (via either onunload or addEventListener()) can prevent browsers from being able to put pages in the page navigation cache for faster back and forward loads.

The documentation recommends that we use the pagehide event wherever possible:


function onUnload( /* ... */ ) {
}

if ( window.hasOwnProperty( 'onpagehide' ) ) {
    $( window ).on( 'pagehide', ( e ) => {
      // The page isn't going to be persisted to the browser's BFCache and is about to be unloaded.
      if ( !e.persisted ) {
        onUnload( e );
      }
    } );
} else {
    $( window ).on( 'unload', onUnload );
}

This change should fix the issue identified in that work log.

☝️ If necessary, we could break out migrating from using the unload event to using the pagehide event into another task.

☝️ If necessary, we could break out migrating from using the unload event to using the pagehide event into another task.

Yes, great point! I'm still slightly wondering about compatibility/comparability though - is it correct to interpret the documentation as saying that in cases where the unload event fires correctly, it should coincide with the pagehide event (with terminated state)?

@phuedx feel free to move this to triaged but future once some developer notes, AC and QA steps have been added!

Yes, great point! I'm still slightly wondering about compatibility/comparability though - is it correct to interpret the documentation as saying that in cases where the unload event fires correctly, it should coincide with the pagehide event (with terminated state)?

AIUI yes (except I believe a pagehide event with persisted = false would mean that an unload event should follow).

phuedx removed phuedx as the assignee of this task.Apr 4 2019, 2:58 PM
phuedx updated the task description. (Show Details)
phuedx moved this task from Needs Prioritization to Triaged but Future on the Web-Team-Backlog board.
Jdlrobson lowered the priority of this task from High to Low.Apr 9 2019, 4:14 PM
This comment was removed by Jdlrobson.
Jdlrobson raised the priority of this task from Low to High.Apr 9 2019, 4:14 PM
ovasileva set the point value for this task to 5.Apr 9 2019, 4:30 PM

Additional notes that I raised during today's Backlog Grooming meeting:

  • Page Lifecycle API - HTTP203: https://www.youtube.com/watch?v=UlLQPguE7UQ
  • According to the above, session storage is persisted when the page is discarded
  • If the browser supports the frozen event (i.e. onfreeze in document), then it supports session storage
phuedx lowered the priority of this task from High to Medium.Apr 24 2019, 4:27 PM

I'll follow up with Kate Zimmerman and Megan Neisler about how the ReadingDepth instrumentation fits in with their priorities.

This doesn't need analysis it just needs prioritization