
Bulk/Batch event endpoint
Closed, Duplicate · Public

Description

In mobile apps, it's often a good idea to locally spool events instead of sending them in real time. Doing so can help reduce battery usage and allow for event collection when there isn't an active internet connection.

The current API is designed to post a single event in real time. In mobile apps we can simply call the single-event endpoint multiple times, but an endpoint that accepts multiple events would be more efficient.

For example, if we had 3 events queued up we currently have to:

GET https://meta.wikimedia.org/beacon/event?<event-json-1>
GET https://meta.wikimedia.org/beacon/event?<event-json-2>
GET https://meta.wikimedia.org/beacon/event?<event-json-3>

Preferably we could do something like:

POST https://meta.wikimedia.org/beacon/batch-event

with body something like:

{ 
  "events": [<event-json-1>, 
             <event-json-2>,
             <event-json-3>] 
}
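
A minimal client-side sketch of the proposal above, in Python for brevity. The `/beacon/batch-event` endpoint is only the one suggested in this task and does not exist today; `build_batch_request` and the sample events are hypothetical names made up for illustration.

```python
import json
import urllib.request

# Proposed in this task; an assumption, not an existing endpoint.
BATCH_URL = "https://meta.wikimedia.org/beacon/batch-event"


def build_batch_request(events, url=BATCH_URL):
    """Build (but do not send) one POST request carrying all queued events."""
    body = json.dumps({"events": events}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical spooled events; real events would follow their own schemas.
queued = [
    {"schema": "ExampleSchema", "event": {"action": "open"}},
    {"schema": "ExampleSchema", "event": {"action": "scroll"}},
    {"schema": "ExampleSchema", "event": {"action": "close"}},
]

request = build_batch_request(queued)
# urllib.request.urlopen(request) would send it once the endpoint exists.
```

The point is that three spooled events cost one request instead of three, and the body is not subject to URL length limits.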

Related Objects

Status      Subtype   Assigned   Task
Resolved              Ottomata
Duplicate             None

Event Timeline

Restricted Application added a subscriber: Aklapper.

+1 to this idea.

At our recent offsite, we talked a bit about having a more scalable and less fragmented event intake infrastructure, but it was mostly talk. Not sure what the priority on it would be.

> Doing so can help reduce battery usage and allow for event collection when there isn't an active internet connection.

Right, this is the 'traditional' way to support offline events. Since our event endpoint is fire-and-forget (it does not return any valid/invalid feedback), it can be extended to do this pretty well from the point of view of the client.

Now, note that events are verbose, so a limit of, say, 2000 characters per URL severely restricts the number of events you can send in one request. You need to code a client that is aware of this limitation. Batches cannot be of unbounded size if they are to be processed via GET.
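
To make the GET limitation concrete, here is a rough sketch of what a length-aware client would have to do. It assumes a batch is sent as a URL-encoded JSON array in the query string, which is not something the current endpoint supports, and it uses the "say 2000" figure from above rather than a verified limit. `batch_url` and `split_into_urls` are hypothetical helpers.

```python
import json
from urllib.parse import quote

BASE_URL = "https://meta.wikimedia.org/beacon/event"
MAX_URL_LENGTH = 2000  # illustrative figure only, not a verified limit


def batch_url(events, base_url=BASE_URL):
    """Encode a list of events as one URL (assumed batch format)."""
    payload = json.dumps(events, separators=(",", ":"))
    return base_url + "?" + quote(payload, safe="")


def split_into_urls(events, max_len=MAX_URL_LENGTH):
    """Greedily pack events into as few URLs as fit under max_len."""
    urls, current = [], []
    for event in events:
        if len(batch_url([event])) > max_len:
            raise ValueError("a single event exceeds the URL length limit")
        # Flush the current batch if adding this event would overflow it.
        if current and len(batch_url(current + [event])) > max_len:
            urls.append(batch_url(current))
            current = []
        current.append(event)
    if current:
        urls.append(batch_url(current))
    return urls


# Ten made-up events of roughly 300 characters each.
sample_events = [{"id": i, "payload": "x" * 300} for i in range(10)]
sample_urls = split_into_urls(sample_events)
```

Percent-encoding inflates the JSON noticeably, so only a handful of events fit per URL, and an event that is too large on its own cannot be sent at all.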

Which is why it would be cool to have a public API with a POST endpoint :)

Overall ideas:

  • How do we handle POSTs? Does varnishlog have access to the POST body? (Likely not.)
  • Do we need a new endpoint outside Varnish to consume this? For example, a Node consumer that sends data to Kafka with no schema validation, only basic non-garbage validation. From there on, our actual EventLogging system could do the validation (batches would need to be split into individual events to be processed by the regular EventLogging processor).
  • How about Node hosts (the scb cluster?)
  • This system should also exist in beta, so people can test.
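
A sketch of the "non-garbage validation, then split" step from the ideas above. It is written in Python purely for illustration (the idea above mentions a Node consumer), it assumes the `{"events": [...]}` body shape from the task description, and `split_batch` and `max_events` are hypothetical. Producing to Kafka and schema validation are left out.

```python
import json


def split_batch(raw_body, max_events=100):
    """Split a batch body into individual events, dropping obvious garbage.

    No schema validation happens here; that is left to the downstream
    processor. max_events is an assumed cap, not an agreed value.
    """
    try:
        parsed = json.loads(raw_body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return []
    events = parsed.get("events") if isinstance(parsed, dict) else None
    if not isinstance(events, list):
        return []
    # Keep only non-empty JSON objects; anything else is treated as garbage.
    return [e for e in events[:max_events] if isinstance(e, dict) and e]


body = b'{"events": [{"a": 1}, "junk", {}, {"b": 2}]}'
accepted = split_batch(body)
# Each accepted event would then be forwarded as its own message.
messages = [json.dumps(event) for event in accepted]
```

Since the endpoint is fire-and-forget, rejected entries are silently dropped here rather than reported back to the client.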

Nuria triaged this task as Medium priority. · Jun 8 2017, 4:18 PM

@Ottomata that was me you were talking with at the offsite earlier this year…

And yeah, sounds like a POST would be ideal here.

@Nuria let me know if I can be of any assistance on this.

@Fjalapeno : we think we might be able to start working on this possibly by end of Q2

I think we should lump this task in as a feature of the Event Data Platform program next FY

Nuria moved this task from Geowiki to Event Platform on the Analytics board.