
Increase EventLogging limit from 2K to 5K
Closed, Declined · Public

Description

In 2015, we increased the limit from 1000 to 2000 (T112002).

I think it's time we look ahead at potentially increasing this to 5000 sometime soon. This is not yet a priority, given that our current schemas (mostly) stay within the limit, but a few use cases are slowly emerging that might benefit from a higher limit.

This can sit in the backlog as a thinking deposit until we need it.

Relevant bits for implementation

  • The storing, validating and processing of events in Kafka and in EventLogging's Python code have no known size limitations.
  • The Nginx TLS proxy limits the URI to 16K (large_client_header_buffers).
  • The ingestion point in Varnish has a limit of 2048 bytes (vsl_reclen).
  • The client side has a self-imposed limit of 2000 characters, and logs to StatsD when an event exceeds that limit (ext.eventLogging).

Potential concerns

Browsers are believed to support beacons with URLs of 5000 bytes without issue. This means an increase from 2K to 4K shouldn't cause any problems (e.g. a beacon that is internally seen as dispatched but is ultimately not sent, or is sent corrupted, without our client knowing).

In 2016, the Performance Team researched this in the context of load.php URLs ($wgResourceLoaderMaxQueryLength), which we have raised from 2000 to 5000 in WMF production. The bottleneck there was IE 9, which supported only up to 5000 characters in the query string. Before that, the bottleneck was IE 8 (limited to 2000 characters), for which JavaScript support was dropped (MediaWiki actively disables JS in IE 8).

Given that IE 9 was the bottleneck at the time, and we have since discontinued JavaScript support for IE 9 and IE 10, we may be able to go beyond 5000, at least from the browsers' perspective.

Most web servers and proxies are believed to support URLs of 5000 bytes or longer without issue. At WMF at least, we know that load.php URLs of 5000 characters work fine (Nginx, Varnish, Apache, HHVM fcgi).

Lastly, EventLogging itself currently receives beacons via varnishkafka from Varnish-SHM, which has a configurable limit that we currently set to 2048 bytes. This would need to be raised.
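To see which layer is the bottleneck for a given target size, the limits mentioned in this task can be put side by side. This is a rough sketch: the dictionary keys are informal labels, the 5000-byte browser figure is the belief stated above rather than a measured limit, and treating every limit as applying directly to the URL length is a simplification (nginx's buffer holds the whole request line, and a Varnish log record carries more than the URL).

```python
# Approximate per-layer limits in bytes, taken from the task description.
# Labels are informal; the browser figure is a belief, not a measurement.
LAYER_LIMITS = {
    "client (ext.eventLogging)": 2000,
    "varnish vsl_reclen": 2048,
    "browser beacon (believed)": 5000,
    "nginx large_client_header_buffers": 16 * 1024,
}


def blocking_layers(url_length, limits=LAYER_LIMITS):
    """Return the layers whose limit a URL of `url_length` bytes would exceed,
    ordered from the tightest limit to the loosest."""
    return [
        name
        for name, limit in sorted(limits.items(), key=lambda item: item[1])
        if url_length > limit
    ]
```

For a 5000-byte beacon, `blocking_layers(5000)` returns the client limit and vsl_reclen, which matches the conclusion above that both would need raising. A hypothetical raised configuration such as `{**LAYER_LIMITS, "client (ext.eventLogging)": 5000, "varnish vsl_reclen": 5000}` leaves no blocking layer at that size.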

Event Timeline

Krinkle created this task. Oct 30 2018, 12:56 AM
Restricted Application added a project: SRE. Oct 30 2018, 12:56 AM
Restricted Application added a subscriber: Aklapper.
Krinkle renamed this task from "Increase EventLogging limit from 2K to 4K" to "Increase EventLogging limit from 2K to 5K". Oct 30 2018, 12:58 AM
Krinkle updated the task description.
Tgr added a subscriber: Tgr. Oct 30 2018, 11:11 PM
ema triaged this task as Medium priority. Oct 31 2018, 7:49 AM

If this is possible, we don't see any reason not to! I don't think anything on the Analytics backend side would need to change; just the VSL settings and the EL extension. If the Traffic team is fine with it, please proceed!

Ottomata moved this task from Incoming to Radar on the Analytics board. Nov 1 2018, 4:29 PM

*I think* we have tried to change this limit before, to no avail: as far as I understood from past reading, the current limit stems from a tricky balance of Varnish in-memory buffers. That was some years back, though, and Varnish may have changed enough that we can now increase those limits; I guess @BBlack might be able to speak to that. It does not seem like a trivial change.
See: https://phabricator.wikimedia.org/T91347

If the intent is to be able to send more data via events, this is probably better done using POST. This is something that is being looked into as part of the Modern Data Platform work.
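For comparison, a sketch of sending an event in a POST body instead of the query string, using only the Python standard library. The endpoint URL and the JSON body shape are placeholders, not EventGate's actual interface. The request is built but not sent; the point is that the payload travels in the body, so the URL length limits discussed above no longer apply to it (any server-side body size limit still would).

```python
import json
import urllib.request

# Placeholder endpoint; not a real intake URL.
INTAKE_URL = "https://intake.example.org/v1/events"


def build_event_request(event):
    """Build (but do not send) a POST request carrying `event` as a JSON body."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        INTAKE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# An event far larger than the 2000-character URL limit.
request = build_event_request({"schema": "Test", "event": {"text": "x" * 10000}})
# The URL stays short regardless of event size; the payload is in request.data.
# Actually sending it would be: urllib.request.urlopen(request)
```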

ema moved this task from Triage to General on the Traffic board. Nov 15 2018, 10:07 AM
Ottomata closed this task as Declined. Jun 26 2019, 3:23 PM

Modern Event Platform's EventGate will support larger events in POST bodies.

Aklapper edited projects, added Analytics-Radar; removed Analytics. Jun 10 2020, 6:44 AM