Send batches of events from EPC app libraries (Java, Swift)
Closed, Resolved · Public

Description

Is /v1/events plural with the intention that eventually EventGate will support batches of events in the same request?

It does support that, just POST an array.

It's really cool that this capability technically exists! This would be particularly useful for

  • mobile clients where it could mean waking up the radio fewer times, and
  • onUnload where we might want to flush the queue of logged events in a single go to make the page unload faster, rather than navigator.sendBeacon one at a time which negatively impacts next page load

We should look deeper into it, make sure we have an OK from Analytics to utilize it and that the pipeline is fully equipped to handle batches, and figure out an optimal batch size. If we end up doing this we want to get the benefit of bundling up several events into one request, but we wouldn't want to make the payload too heavy for cellular networks and slow connections in general.
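As a rough illustration of the "just POST an array" answer above, here is a minimal Java sketch that joins already-serialized events into one JSON array and builds a single POST request for it. The class name `BatchRequest`, its methods, and the endpoint URL in the usage note are hypothetical and not taken from any EPC library; the sketch assumes Java 11+ for `java.net.http` and assumes each event is already a valid JSON object string.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

/** Hypothetical helper for illustration; not part of any EPC client library. */
final class BatchRequest {
    private BatchRequest() {}

    /** Joins pre-serialized JSON events into a single JSON array string. */
    static String toJsonArray(List<String> events) {
        return "[" + String.join(",", events) + "]";
    }

    /** Builds (but does not send) one POST request carrying the whole batch. */
    static HttpRequest build(URI endpoint, List<String> events) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(toJsonArray(events)))
                .build();
    }
}
```

A caller would pass something like `URI.create("https://eventgate.example/v1/events")` (a placeholder host) and hand the resulting request to whatever HTTP client the app already uses.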

Event Timeline

Restricted Application added a subscriber: Aklapper.
mpopov added a subscriber: Nuria.

@Nuria: would love to get your thoughts on this

@mpopov there are two things here:

  • eventgate adds the ability to put events in an array. That, per se, has no perf benefit unless a mechanism to enqueue the events exists in the client, because you could send an array of events 3 times in one second, which would overuse the radio. Makes sense?

In the eventgate pipeline the events in the array are unbundled and the pipeline can handle that w/o issues.

@Nuria: By batching I meant an array of events sent in a single request as one payload – as one batch of events. You're referring to a "burst" behavior where events are enqueued and then the queue is flushed, rapidly sending one event after another as separate requests. The patch you linked to – as I understand it and the way we've done it with EPC – is that rapid-fire, "burst" behavior.

In some situations, there might be a benefit to POSTing 1 big request containing an array (batch) of, say, 10 events, rather than individually POSTing 10 small requests in rapid succession.
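To make the batch/burst distinction concrete, here is a minimal sketch of a client-side buffer that hands queued events to a sender as one list per request, instead of one request per event. Everything here is an assumption for illustration: the `EventBuffer` name, the count-based flush threshold, and the `sender` callback are hypothetical and do not describe how the EPC libraries are actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical client-side buffer; names are illustrative, not the EPC API. */
final class EventBuffer {
    private final int maxBatchSize;
    private final Consumer<List<String>> sender; // called once per batch (one request)
    private final List<String> pending = new ArrayList<>();

    EventBuffer(int maxBatchSize, Consumer<List<String>> sender) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        this.maxBatchSize = maxBatchSize;
        this.sender = sender;
    }

    /** Queues one pre-serialized event; flushes once the batch is full. */
    synchronized void submit(String eventJson) {
        pending.add(eventJson);
        if (pending.size() >= maxBatchSize) {
            flush();
        }
    }

    /** Hands everything queued so far to the sender as ONE batch. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<String> batch = List.copyOf(pending); // immutable snapshot
        pending.clear();
        sender.accept(batch);
    }
}
```

With a threshold of 10, submitting 10 events results in a single call to `sender` rather than 10. An app would presumably also call `flush()` on lifecycle events such as going to the background; a time-based flush interval could be layered on top to limit how often the radio is woken.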

In some situations, there might be a benefit to POSTing 1 big request containing an array (batch) of, say, 10 events, rather than individually POSTing 10 small requests in rapid succession.

In some, maybe; in the majority of them, I doubt it. The bulk of your time and battery consumption on mobile goes into waking up the radio (as you mentioned earlier), which you have to do in either case. So the critical perf improvements (when it comes to the battery) are gained from waking up the radio less frequently.

rather than navigator.sendBeacon one at a time which negatively impacts next page load

This statement is, I think, incorrect. The sendBeacon API was designed with this use case in mind: to not affect the next page navigation. It exists to solve that very problem:
"By using the sendBeacon() method, the data is transmitted asynchronously to the web server when the User Agent has an opportunity to do so, without delaying the unload or affecting the performance of the next navigation". See: https://developer.mozilla.org/en-US/docs/Web/API/Navigator/sendBeacon

an optimal batch size

The current (adjustable) limits are independent of the batch array length; they apply instead to the byte length of individual messages and to the total byte length of the POST body.

From https://github.com/wikimedia/operations-deployment-charts/blob/master/helmfile.d/services/eqiad/eventgate-logging-external/values.yaml#L3-L8

    # The request body is accepted as an array of events, each of which
    # will be an individual message in Kafka.  Each individual
    # message must be smaller than message.max.bytes, but EventGate
    # can accept multiple events at once in the request body.
    # Limit this to a smaller size for this externally accessible eventgate instance.
    max_body_size: 4mb
    # ...
        # Enforce a smaller message size limit for this externally accessible eventgate instance.
        message.max.bytes: 1048576

So with current settings, each message must be smaller than 1MB, and the entire POST body must be smaller than 4MB.
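Since the limits are byte-based rather than count-based, a client could size its batches by bytes. The sketch below greedily groups pre-serialized events so that each JSON-array body stays under a body limit, and skips any single event that exceeds the per-message limit. The `BatchSplitter` class is hypothetical; treating both limits as exclusive follows the "must be smaller than" wording above, and the sketch ignores any Kafka-side message overhead, so real thresholds should probably leave some headroom.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of EventGate or any client library. */
final class BatchSplitter {
    private BatchSplitter() {}

    /**
     * Greedily groups pre-serialized JSON events into batches whose JSON-array
     * encoding stays strictly below maxBodyBytes. Events that could never be
     * accepted (at or above maxMessageBytes, or too big for any body) are skipped.
     */
    static List<List<String>> split(List<String> events, int maxMessageBytes, int maxBodyBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentBytes = 2; // the enclosing "[" and "]"
        for (String event : events) {
            int size = event.getBytes(StandardCharsets.UTF_8).length;
            if (size >= maxMessageBytes || size + 2 >= maxBodyBytes) {
                continue; // oversized event: the server would reject it
            }
            int separator = current.isEmpty() ? 0 : 1; // "," between elements
            if (currentBytes + separator + size >= maxBodyBytes) {
                batches.add(current); // close the full batch, start a new one
                current = new ArrayList<>();
                currentBytes = 2;
                separator = 0;
            }
            current.add(event);
            currentBytes += separator + size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

For the settings quoted above, a call might look like `BatchSplitter.split(events, 1048576, 4 * 1024 * 1024)`, assuming `4mb` is interpreted as 4 × 1024 × 1024 bytes. A real client would likely want to log or report skipped events rather than drop them silently.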

So the critical perf improvements (when it comes to the battery) are gained from waking up the radio less frequently.

@Nuria even if this is true, is there a reason not to batch events together?

fdans moved this task from Event Platform to Radar on the Analytics board.

@Nuria even if this is true, is there a reason not to batch events together?

That is fine and dandy but the main argument to be made is, I think, not a performance one.

mpopov triaged this task as Lowest priority. · Dec 11 2019, 5:02 PM

Marking as lowest priority since this is not critical to any work and is not blocking anything. It does feel like there would be some benefit, but proper research and experimentation are needed.

Restricted Application edited projects, added Analytics; removed Analytics-Radar. · Jun 10 2020, 6:33 AM
Restricted Application edited projects, added Analytics; removed Analytics-Radar. · Jun 10 2020, 6:36 AM
DAbad raised the priority of this task from Lowest to Medium.
DAbad moved this task from Backlog to Ready/Groomed on the Metrics Platform Backlog board.
DAbad subscribed.

This is work that we can do. Prioritizing to complete prior to release of MVP.

TODO: Verify that this is indeed happening in all consolidated libraries.

Mholloway renamed this task from Explore sending batches of events from EPC libraries to Explore sending batches of events from EPC app libraries (Java, Swift). · Aug 2 2021, 4:31 PM

This task applies specifically to the Android and iOS (Java and Swift) libraries, so I've updated the title accordingly. It doesn't apply to PHP, and is handled in MediaWiki JS by the existing EventLogging modules.

This will depend on the integration work, so I'll add the Android and iOS MP client integration tasks as subtasks here.

For the record, I believe that the Android app is already sending batched events, but iOS is not.

Mholloway renamed this task from Explore sending batches of events from EPC app libraries (Java, Swift) to Send batches of events from EPC app libraries (Java, Swift). · Aug 2 2021, 4:35 PM

This is core functionality of the client libraries for the v1 release. I'm going to be bold and close it as I don't think it needs its own ticket.