Page MenuHomePhabricator

EventLogging needs to enque events to avoid draining users' battery on mobile
Open, HighPublic8 Story Points

Description

Similar to how it is done in the statsv client we should enque EL events so we are not sending frequent beacons. See statsv code: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/WikimediaEvents/+/500046/

This change is important in the light of baseline metrics like session length that might be sending pings every N seconds.
https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/SessionLength

Note that we are not talking about batching events but rather enqueuing them, they are not batched into a single network call, but rather the network calls happen around the same time. This improves battery usage on mobile as there is no need to wake up the radio (expensive) every few seconds to send analytics pings.

Event Timeline

Nuria created this task.Jun 11 2019, 10:42 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJun 11 2019, 10:42 PM
fdans triaged this task as High priority.Jun 13 2019, 5:00 PM
fdans moved this task from Incoming to Smart Tools for Better Data on the Analytics board.
Nuria updated the task description. (Show Details)Jun 13 2019, 5:20 PM
Gilles moved this task from Inbox to Radar on the Performance-Team board.Jun 17 2019, 8:15 PM
Gilles edited projects, added Performance-Team (Radar); removed Performance-Team.
Nuria assigned this task to Milimetric.Jul 5 2019, 5:22 PM
Nuria added a project: Analytics-Kanban.
Nuria added a subscriber: Gilles.

Ping @Gilles: work on this to start second week of July

@Krinkle so I'm looking at this and feeling a little uncomfortable about basically duplicating the logic from WikimediaEvents.

What do you think about an EventQueue class in the eventLogging core module? It would do common things (managing the timer, schedule, and handle pagehide/visibilitychange). You would pass logic to it for dispatch. So we'd refactor WikimediaEvents to use it and also use it for Event Logging to address this task here.

Yes, a common interface for this would make sense. I'd go further and actually also make it a singleton so as to not have multiple instances tracking and reflecting the same in-browser state. Probably something generic callback-based, like (mw.)requestIdleCallback and (window.)requestAnimationCallback do already.

Hm, if it's a singleton then clients would have to namespace events. Because the statsv client wants to batch events and send them together while the EventLogging client just wants to send them one at a time. So dispatch would be:

dispatch ( namespace ) {
    // get events in { namespace } as a list
    // call a specific callback like { callbacks[ namespace ] } with the list
    // get back a list of URLs
    // loop over the list and call sendBeacon
}

That would allow the statsv client to squish the list of events into a list of one URL, and the EventLogging client to $.map it into another list.

I think it would be cleaner to have separate instances than have namespaces.

Nuria added a comment.Jul 17 2019, 6:06 PM

If it is a singleton there would be 1 queue but in this case the statsd queue and EL queue are different (and so is the dispatch method for either) so the queue management and dispatch would need to happen outside the singleton so it ends up being a bit of fake-singleton no? (totally could be missing some well -stablished mw conventions here)

Change 524575 had a related patch set uploaded (by Milimetric; owner: Milimetric):
[mediawiki/extensions/EventLogging@master] [WIP] Refactor queue/batching of events from statsv to eventLog module.

https://gerrit.wikimedia.org/r/524575

Change 524576 had a related patch set uploaded (by Milimetric; owner: Milimetric):
[mediawiki/extensions/WikimediaEvents@master] [WIP] Refactoring queue/batching of events to eventLog module

https://gerrit.wikimedia.org/r/524576

If it is a singleton there would be 1 queue but in this case the statsd queue and EL queue are different (and so is the dispatch method for either) [..]

My thinking was that this mechanism is more low level than that. Purely callback based, like setTimeout or requestIdleCallback. E.g. a callback that (based on current logic) is debounced by 2 seconds, but resolves earlier based on certain events. What you then do with the callback is the caller's decision. So statsd would indeed still have its own array of metrics, and EventLogging core its array of beacon urls. The callback is then to prompt the dispatching call in whatever each each handles that.

Nuria set the point value for this task to 8.Jul 22 2019, 8:48 PM
phuedx added a subscriber: phuedx.Mon, Sep 2, 2:51 PM

Is there any production instrumentation that relies on the time at which the beacon request arrived at the edge (the dt field in the Hive event.* tables IIRC)? My apologies if you've already looked into this.

Nuria added a comment.Mon, Sep 2, 3:00 PM

@phuedx
no, there shouldn't be anything relying on that field

Nuria moved this task from In Code Review to Paused on the Analytics-Kanban board.Tue, Sep 17, 4:05 PM