
Implement a throttling strategy for events that have too high a throughput.
Open, Needs Triage, Public

Description

Implement a throttling strategy for events that have too high a throughput.

In the last quarter we have had a couple of incidents in which clients of EL logged too much data due to a sampling misconfiguration or a client-side bug. We detect these incidents with the alarms, but we still process the data and insert it into the database. In most instances the sampling misconfiguration produces tables so large that they cannot even be used to extract data.

We should determine the rate at which schemas can produce events, i.e. the overall rate of event creation that we can sustain optimally from the database storage perspective. If the throughput goes over this rate we should throttle all schemas.
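One possible way to enforce such an overall rate is a token bucket. The sketch below is only an illustration, not existing EL code: the class name `TokenBucket`, its parameters, and the example numbers are assumptions, and the sustainable rate itself would still have to be measured against the database.

```python
import time


class TokenBucket:
    """Allow at most `rate` events per second, with bursts up to `capacity`.

    Hypothetical helper for illustration; not part of EL.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens refilled per second
        self.capacity = float(capacity)  # maximum burst size
        self.clock = clock               # injectable so the logic can be tested
        self.tokens = float(capacity)    # start with a full bucket
        self.last = clock()

    def allow(self):
        """Return True if one more event may be processed now."""
        now = self.clock()
        # Refill in proportion to the elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Assumed numbers: 500 events/s sustained, bursts of up to 1000 events.
bucket = TokenBucket(rate=500, capacity=1000)
```

A consumer would call `bucket.allow()` once per incoming event and skip the database insert whenever it returns False.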

Also, if a single schema suddenly overpowers the rest, we should throttle that schema accordingly.
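A rough sketch of per-schema throttling, assuming we count events per schema over a window and drop events from any schema whose share of the window exceeds a threshold. The class `SchemaShareThrottle` and the values of `max_share` and `min_events` are hypothetical and would need tuning; note that a lone active schema would also trip this check, so a real implementation would need more care.

```python
from collections import Counter


class SchemaShareThrottle:
    """Drop events from a schema that dominates the current window.

    Hypothetical helper for illustration; not part of EL.
    """

    def __init__(self, max_share=0.5, min_events=1000):
        self.max_share = max_share    # largest allowed fraction per schema
        self.min_events = min_events  # do not judge shares on tiny samples
        self.counts = Counter()
        self.total = 0

    def allow(self, schema):
        """Count the event and return True if it should be processed."""
        # Dropped events are counted too, so shares reflect incoming traffic.
        self.counts[schema] += 1
        self.total += 1
        if self.total < self.min_events:
            return True
        return self.counts[schema] / self.total <= self.max_share

    def reset(self):
        """Start a new window, e.g. once per minute from a timer."""
        self.counts.clear()
        self.total = 0
```

The caller is expected to invoke `reset()` periodically so that a schema is throttled only while it is actually flooding.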

This script can be used to get a breakdown of events in vanadium:
https://gist.github.com/atdt/8deed4bc2d311ba0122f#file-el-status-py


Version: unspecified
Severity: normal

Details

Reference
bz67470