Page MenuHomePhabricator

New EventLogging queue doesn't log events in window.unload
Closed, ResolvedPublic

Description

T225578 implemented a queue of events for EventLogging. It also flushes that queue on pagehide. However, unload happens after pagehide, and there's existing logging being performed in unload handlers that's now suppressed by the new queue behavior.

(Arguably, for any browser which supports pagehide we shouldn't be doing logging from unload, given its general unreliability. But not-breaking the current cases seems simpler than finding and rewriting existing logging handlers.)

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Change 575314 had a related patch set uploaded (by DLynch; owner: DLynch):
[mediawiki/extensions/EventLogging@master] Make BackgroundQueue more aware of page unload flow

https://gerrit.wikimedia.org/r/575314

Based on my discussion with @DLynch , I checked the data from the past few months and it looks like the abort_mechanism='navigate' has significantly dropped this month in comparison to Jan 2020 or Dec 2019.
This indicates that the issue may have started during the end of Jan 2020.

Screen Shot 2020-02-27 at 4.33.25 PM.png (422×1 px, 49 KB)

If EventLogging is inconsistent, unreliable, or opaque in its behavior, researchers will lose trust in it, and its net contribution to the MediaWiki ecosystem will be less than zero.

To be trustworthy, its behavior should be consistent and obvious. It wasn't, and now the data is inconsistent.

To fix it, I think it's necessary to find ways to make the code simpler, and I think introducing a non-local state variable is moving in the wrong direction.

I have some vague idea that this could be made better by using a promise instead..

I have some vague idea that this could be made better by using a promise instead..

I don’t think adding something inherently involving an asynchronous resolution phase is going to be simpler in the middle of the unload event. I'm genuinely unsure whether it will allow the next tick to happen so the promise can resolve — and I’d want to subject that to cross-browser testing before I had any sort of confidence in the outcome either way.

I’d not be averse to rolling back the patches that added the entire queue system. It’s very recent, and apparently exists only for performance reasons on mobile — if it introduced problems we can rip it out and design a better solution while it’s not breaking things. 🤷🏻‍♂️

(I mean, obviously I think the proposed patch is a simple and acceptable fix to the problem, but...)

It’s very recent, and apparently exists only for performance reasons on mobile — if it introduced problems we can rip it out and design a better solution while it’s not breaking things.

The perf issues are more prevalent in mobile but really exists anywhere that events are sent in close succession.

Let me try to understand what is happening here (cause we normally advice against adding any instrumentation on the 'unload' event) Is your instrumentation mean to run in desktop and mobile cause it seems that is, in mobile it wouldn't work as unload does not fire reliably.

The existing instrumentation is from VisualEditor, mostly applies to desktop, and has been in place since ~2016.

As I said in the ticket description, I do agree that unload isn’t ideal... but the queue patch did summarily break existing uses of it, and I have no idea how many of those there may be.

The existing instrumentation is from VisualEditor, mostly applies to desktop, and has been in place since ~2016.

And, given that times have changed since 2016 could the events possibly be attached to a more "modern" event from the page visibility API?

I have no idea how many of those there may be.

Probably not many, as most instrumentation needs to work in both mobile and desktop and it is been years that we have been advising against instrumenting code against the 'unload' event.

And, given that times have changed since 2016 could the events possibly be attached to a more "modern" event from the page visibility API?

Plausibly! I’m not opposed, outside of needing to do some research to double check what changes would be needed. We recently dropped IE10 support, which I gather opens up the visibility events to us...

That said, this queue code still needs to be fixed in a very similar way to this patch, since if we moved to pagehide then it’d still leave us at the mercy of event-registration order as to whether or not the queue got flushed before the unload occurred.

If EventLogging takes a stance of unload events being officially not-supported, we could just duplicate the existing binding of queue.flush to pagehide onto unload as well, to catch any stragglers. It’d resolve Ori’s complaint about needing a global variable, also.

(On the global variable front, I did consider document.unloaded, which apparently has the same meaning... but I found spotty reports on browser support and didn’t want to do extensive testing.)

The perf issues are more prevalent in mobile but really exists anywhere that events are sent in close succession.

Are there any measurements to support this?

Are there any measurements to support this?

It is not perf issues with our code but rather issues with waking up radio as 5 events are sent every second instead of sending 25 every 5 seconds, hopefully that makes sese, The motivation to want this case to be as performant as possible is to be able to use methodology like this one to calculate a privacy sensitive session length metric that does not require identifiers: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/SessionLength

Milimetric triaged this task as High priority.
Milimetric moved this task from Incoming to Event Platform on the Analytics board.
Milimetric added a project: Analytics-Kanban.
Milimetric moved this task from Next Up to In Progress on the Analytics-Kanban board.

Change 575314 merged by jenkins-bot:
[mediawiki/extensions/EventLogging@master] Make BackgroundQueue more aware of page unload flow

https://gerrit.wikimedia.org/r/575314

Change 578317 had a related patch set uploaded (by Bartosz Dziewoński; owner: DLynch):
[mediawiki/extensions/EventLogging@wmf/1.35.0-wmf.22] Make BackgroundQueue more aware of page unload flow

https://gerrit.wikimedia.org/r/578317

Change 578317 merged by jenkins-bot:
[mediawiki/extensions/EventLogging@wmf/1.35.0-wmf.22] Make BackgroundQueue more aware of page unload flow

https://gerrit.wikimedia.org/r/578317

Mentioned in SAL (#wikimedia-operations) [2020-03-10T11:40:35Z] <lucaswerkmeister-wmde@deploy1001> Synchronized php-1.35.0-wmf.22/extensions/EventLogging/: SWAT: [[gerrit:578317|Make BackgroundQueue more aware of page unload flow (T246382, T244874)]] (duration: 00m 58s)