Steps to replicate the issue / What happens?:
The v2c* backend currently has 6 VMs, on which there are 4 Celery workers. The setup works fine for a few weeks or months, then the workers begin to fail one by one until only one or two VMs are still working. Those VMs then receive all the requests, become overloaded, and crash the entire system.
What should have happened instead?:
The setup should stay stable for far longer than a few weeks or months. Workers should not fail one by one and leave the few remaining ones to become overloaded.
