
Explore sending batches of events from EPC libraries
Open, Lowest, Public

Description

Is /v1/events plural with the intention that eventually EventGate will support batches of events in the same request?

It does support that, just POST an array.

It's really cool that this capability technically exists! This would be particularly useful for

  • mobile clients where it could mean waking up the radio fewer times, and
  • onUnload where we might want to flush the queue of logged events in a single go to make the page unload faster, rather than navigator.sendBeacon one at a time which negatively impacts next page load

We should look deeper into it, make sure we have an OK from Analytics to utilize it and that the pipeline is fully equipped to handle batches, and figure out an optimal batch size. If we end up doing this we want to get the benefit of bundling up several events into one request, but we wouldn't want to make the payload too heavy for cellular networks and slow connections in general.
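As a rough illustration of the "flush the queue in a single go" idea, here is a minimal sketch. The helper name `flushQueueWithBeacon`, the in-memory `queue` array, and the injected `nav` object are assumptions made for this example and are not part of any EPC library; sending the array as a plain JSON string is also an assumption about what the endpoint accepts.

```javascript
// Hypothetical helper: send every queued event in ONE request.
// `nav` is injected so the sketch also runs outside a browser;
// in a page you would pass window.navigator.
function flushQueueWithBeacon(queue, url, nav) {
  if (queue.length === 0) {
    return true; // nothing to send
  }
  // One JSON array in one request body, as described above.
  const body = JSON.stringify(queue);
  const accepted = nav.sendBeacon(url, body);
  if (accepted) {
    queue.length = 0; // clear only if the browser accepted the request
  }
  return accepted;
}

// Possible browser wiring (assumed, untested against EventGate):
// document.addEventListener('visibilitychange', () => {
//   if (document.visibilityState === 'hidden') {
//     flushQueueWithBeacon(queue, '/v1/events', navigator);
//   }
// });
```

If `sendBeacon` returns false, the events stay in the queue, so a later attempt can retry them.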

Event Timeline

Restricted Application added a subscriber: Aklapper.
mpopov added a subscriber: Nuria.

@Nuria: would love to get your thoughts on this

@mpopov there are two things here:

  • eventgate adds the ability to put events in an array; that, per se, has no perf benefit unless a mechanism to enqueue the events exists in the client, because you could still send an array of events 3 times in one second, which would overuse the radio. Makes sense?

In the eventgate pipeline the events in the array are unbundled and the pipeline can handle that w/o issues.
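To make the enqueue point concrete, here is a minimal sketch of a client-side queue that sends one array per time window instead of one request per event. The names `createEventQueue`, `send`, `intervalMs`, and `schedule` are hypothetical; this is not how EPC does it, only one possible shape.

```javascript
// Hypothetical client-side queue: events logged within one window of
// `intervalMs` are buffered and handed to `send` as a single array.
// `schedule` defaults to setTimeout and is injectable for testing.
function createEventQueue(send, intervalMs, schedule = setTimeout) {
  let pending = [];
  let timerArmed = false;

  function flush() {
    timerArmed = false;
    if (pending.length === 0) {
      return;
    }
    const batch = pending;
    pending = [];
    send(batch); // one request carrying the whole array
  }

  return {
    log(event) {
      pending.push(event);
      if (!timerArmed) {
        timerArmed = true;
        schedule(flush, intervalMs); // the first event in a window arms the timer
      }
    },
    pendingCount() {
      return pending.length;
    },
  };
}
```

With a window of, say, a few seconds, three `log()` calls within one second would result in one request rather than three. The right window length is an open question and would need measurement.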

@Nuria: By batching I meant an array of events sent in a single request as one payload – as one batch of events. You're referring to a "burst" behavior where events are enqueued and then the queue is flushed, rapidly sending one event after another as separate requests. The patch you linked to – as I understand it and the way we've done it with EPC – is that rapid-fire, "burst" behavior.

In some situations, there might be a benefit to POSTing 1 big request containing an array (batch) of, say, 10 events, rather than individually POSTing 10 small requests in rapid succession.

In some situations, there might be a benefit to POSTing 1 big request containing an array (batch) of, say, 10 events, rather than individually POSTing 10 small requests in rapid succession.

In some, maybe; in the majority of them, I doubt it. The bulk of your time and battery consumption on mobile goes into waking up the radio (as you mentioned earlier), which you have to do in either case. So the critical perf improvements (when it comes to the battery) are gained from waking up the radio less frequently.

rather than navigator.sendBeacon one at a time which negatively impacts next page load

This statement is, I think, incorrect. The sendBeacon API was designed with this use case in mind: to not affect the next page navigation. It exists to solve that very problem:
"By using the sendBeacon() method, the data is transmitted asynchronously to the web server when the User Agent has an opportunity to do so, without delaying the unload or affecting the performance of the next navigation". See: https://developer.mozilla.org/en-US/docs/Web/API/Navigator/sendBeacon

an optimal batch size

The current (adjustable) limits are independent of the batch array length; they apply instead to the byte length of individual messages and to the total POST body byte length.

From https://github.com/wikimedia/operations-deployment-charts/blob/master/helmfile.d/services/eqiad/eventgate-logging-external/values.yaml#L3-L8

    # The request body is accepted as an array of events, each of which
    # will be an individual message in Kafka.  Each individual
    # message must be smaller than message.max.bytes, but EventGate
    # can accept multiple events at once in the request body.
    # Limit this to a smaller size for this externally accessible eventgate instance.
    max_body_size: 4mb
# ...
        # Enforce a smaller message size limit for this externally accessible eventgate instance.
        message.max.bytes: 1048576

So with current settings, each message must be smaller than 1MB, and the entire POST body must be smaller than 4MB.
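A client that batches would need to respect both limits. Below is a minimal sketch of splitting a list of events into request bodies that stay under them. The function name `splitIntoBatches` is hypothetical, and reading `4mb` as 4 × 1024 × 1024 bytes is an assumption about how the config value is parsed.

```javascript
// Limits taken from the config excerpt above.
const MAX_MESSAGE_BYTES = 1048576;      // message.max.bytes
const MAX_BODY_BYTES = 4 * 1024 * 1024; // ASSUMPTION: "4mb" means 4 MiB

const encoder = new TextEncoder();

// Size of a string in bytes once encoded as UTF-8.
function byteLength(text) {
  return encoder.encode(text).length;
}

// Hypothetical helper: returns an array of JSON strings, each a JSON array
// of events whose UTF-8 size is smaller than maxBodyBytes.
function splitIntoBatches(events, maxMessageBytes = MAX_MESSAGE_BYTES,
  maxBodyBytes = MAX_BODY_BYTES) {
  const batches = [];
  let current = [];
  let currentBytes = 2; // the enclosing "[" and "]"

  for (const event of events) {
    const json = JSON.stringify(event);
    const size = byteLength(json);
    // An event that cannot fit on its own is rejected rather than sent.
    if (size >= maxMessageBytes || size + 2 >= maxBodyBytes) {
      throw new RangeError(`event of ${size} bytes exceeds the size limits`);
    }
    const separator = current.length > 0 ? 1 : 0; // "," between elements
    if (current.length > 0 && currentBytes + separator + size >= maxBodyBytes) {
      batches.push('[' + current.join(',') + ']');
      current = [];
      currentBytes = 2;
    }
    currentBytes += size + (current.length > 0 ? 1 : 0);
    current.push(json);
  }
  if (current.length > 0) {
    batches.push('[' + current.join(',') + ']');
  }
  return batches;
}
```

These are only the server-side ceilings. A batch size that is sensible for cellular networks would likely be far smaller and still has to be determined by experiment.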

So the critical perf improvements (when it comes to the battery) are gained from waking up the radio less frequently.

@Nuria even if this is true, is there a reason not to batch events together?

fdans moved this task from Event Platform to Radar on the Analytics board.

@Nuria even if this is true, is there a reason not to batch events together?

That is fine and dandy but the main argument to be made is, I think, not a performance one.

mpopov triaged this task as Lowest priority. · Dec 11 2019, 5:02 PM

Marking as lowest priority since this is not critical to any work or blocking anything. It does feel like there would be some benefit, but proper research and experimentation is needed.

Restricted Application edited projects, added Analytics; removed Analytics-Radar. · Jun 10 2020, 6:33 AM
Restricted Application edited projects, added Analytics; removed Analytics-Radar. · Jun 10 2020, 6:36 AM