Four monthly jobs shuffle large amounts of data:
- cassandra mediarequests top files
- cassandra pageview top articles
- unique devices per project family
- unique devices per domain

When these jobs run on their usual schedule, they share cluster resources with other jobs, which naturally throttles them. However, this month we had to run one of them at a different time, and it saturated the network, impacting production traffic.
We need to throttle these jobs so that they cannot saturate the network. For the moment, we have no QoS setup that would prioritize production traffic over analytics traffic.