Page MenuHomePhabricator

EventLogging background queue beforeunload event handler blocks Back-Forward cache
Closed, ResolvedPublic

Description

The Back-Forward cache is a caching system in firefox that runs when the back button is clicked. It will then simply use the DOM from the previous page that is in it's cache instead of reloading the entire page (and re-requesting files).

Back-Forward cache for Chrome is currently in the works and while testing it on Wikipedia, Google's Kenji Baheux reported to us that his team discovered that Wikipedia has a beforeunload event handler on every page (from EventLogging). Having such unload event handlers prevents Back-Forward cache from working. Here's a PDF of the report with repro steps: https://drive.google.com/file/d/1zCoKYwGjM_JN80bzFmU9VwFayyuy5lQo/view?usp=sharing

This handler was introduced by @DLynch in T246382: New EventLogging queue doesn't log events in window.unload as a way to ensure that EventLogging events emitted by existing beforeunload handlers would be reported back as expected by the new EventLogging BackgroundQueue.

Back-Forward cache is a powerful browser feature that our users would probably benefit a lot from in common Wikipedia browsing patterns. Since beforeunload handlers are a bad practice in general, we should probably phase them out.

@DLynch is the instrumentation that prompted you to add this handler in EventLogging's BackgroundQueue still being collected? If so, is there any alternative to them using a beforeunload handler?

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

That instrumentation is still being collected -- my specific case to be supported was logging aborted edits, which has ongoing analysis.

Removing the global unload handler entirely in EventLogging *should* only affect browsers that don't support pagehide. Compatibility tables suggest this means IE9-10, from our "basic" browser-support tier. I think that's all probably fine (being a tiny part of our traffic), but we might want to run it by product/analytics to make sure there aren't objections? (Or at least to verify my belief it's a tiny part of the analyzed traffic.)

If we did care about those older browsers, we could do some browser-sniffing for them and either trip the BackgroundQueue into instant-logging mode for them 100% of the time, or only register an unload handler for them.

There's then unload handlers that're sending the events EventLogging is supporting. They're situational, and so not blocking every page indiscriminately, but we should really migrate them to something else like pagehide. Same compatibility issues apply. Main reason we didn't do this at the time was that EventLogging had implemented its new BackgroundQueue which broke existing onunload logging, and we wanted a quick fix for that.

I think that only registering the unload handler for browsers that don't support pagehide makes sense. Those older browsers don't support bfcache anyway.

I think the long tail of other handlers can be dealt with a lower priority. EventLogging's is on every page.

I think (?) it should be as simple as

if ( !( 'onpagehide' in window ) ) {
    window.addEventListener( 'unload', discardPage );
}

Granted, that'd fail to register it if someone has outright done window.onpagehide = ... somewhere. But I doubt we'd do that rather than addeventlistener -- I checked a regular article page and it's window.onpagehide is null, as it should be.

Indeed, it's that simple. However, @Krinkle reminded me that our JS-enabled experience now excludes IE9-10: https://www.mediawiki.org/wiki/Compatibility#Browser_support_matrix Meaning that we don't need an unload event handler at all. All grade A browsers support pagehide/pageshow according to https://caniuse.com/page-transition-events

Oh yeah, if we're already not even loading that code because of the sniffing, remove away!

You just need to make sure that any code you rely on that creates EventLogging events at the end of the page lifecycle do so with pagehide/pageshow and not unload/onbeforeunload. Then we should be able to safely remove the unload handler in EventLogging's BackgroundQueue.

Since we can 100% count on the pagehide hook running, we can just remove that handler right now.

The handler bound to pagehide and unload right now does two things: flushes the queue, and sets a state flag that makes all future attempts to add to the queue get immediately-flushed (because even with the unload handler, we can't rely on BackgroundQueue's running last). As pagehide is always fired before unload, any browser that's currently being served that file should be basically doing a no-op when it gets to the unload handler, since the queue should have been flushed and nothing more added.

Change 700886 had a related patch set uploaded (by Gilles; author: Gilles):

[mediawiki/extensions/EventLogging@master] Remove unload handler

https://gerrit.wikimedia.org/r/700886

[…] As pagehide is always fired before unload, any browser that's currently being served that file should be basically doing a no-op when it gets to the unload handler, […]

Yes – unless there is code adding events from an unload handler itself, and/or otherwise after or asynchronously from a pagehide event. That may be worth doing a codeseach for.

You just need to make sure that any code you rely on that creates EventLogging events at the end of the page lifecycle do so with pagehide/pageshow and not unload/onbeforeunload. […]

Yes – unless there is code adding events from an unload handler itself, and/or otherwise after or asynchronously from a pagehide event. That may be worth doing a codeseach for.

No, the BackgroundQueue pagehide handler having run will mean there's literally nothing for the BackgroundQueue unload handler to do. Nothing further can be queued, so anything in post-BackgroundQueue-pagehide / other unload handlers will just have been instantly dispatched.

I believe we're referring to scenarios like VisualEditor or NavigationTiming having instrumentation that listens to onbeforeunload and queues an event from there (e.g. something for abandoned edits, load time of pages closed before it completely loading). If such instrumentation (still/wrongly) queues its event from unbeforeunload (instead of pagehide) then, afaik, it can't have been submitted to the server by EventLogging's pagehide handler because in the browser's Page Lifecycle, pagehide fires before unload. And as such, removing unload handling from EventLogging before such code is migrated first, would result in us (further) breaking that instrumentation, in addition to it already not working well on mobile.

@Krinkle I think we're talking in circles here, as that's the scenario I'm referencing as well? Anyway, it won't break anything, because anything added to the queue after pagehide gets immediately submitted rather than being queued -- see line 83 of BackgroundQueue.js where it sets some state that then changes how queue.add behaves. Thus removing the unload handler shouldn't actually change anything, since nothing should have still been in the queue at the time the unload handler ran.

odimitrijevic subscribed.

Olja to follow up on work related to event-logging extension

Change 700886 merged by jenkins-bot:

[mediawiki/extensions/EventLogging@master] Remove unload handler

https://gerrit.wikimedia.org/r/700886

DLynch reassigned this task from odimitrijevic to Gilles.