The current concept of groups in the rdbms load balancer is... bespoke. I have not seen it anywhere else in the industry, and it doesn't really isolate load, since DBAs have always pooled api group replicas in the general group. One section (s3) has actually been operating without an api group for years now. The whole concept causes all sorts of confusion and headaches for us (DBAs), and it can hurt the accuracy of the section-wide circuit breakers. It is also not clear what a group should refer to: the api group is for API requests, but the vslow group is for slow queries in any type of request.
My proposal is to use a more standard approach and simplify configuration:
- Make the rdbms library take a unique identifier: for logged-in users it can be something based on the session ID; for logged-out users it can be their IP.
- Use that identifier and Amazon's shuffle sharding idea to pick the replica:
  - Use the identifier to pick two replicas out of the pool.
  - Based on weights, pick one of the two.
- Keep the vslow/dump group, but use it as a flag, with no need to explicitly define a replica in the config. Let rdbms, with some magic, consistently designate a replica as vslow.
- Keep LoadMonitor, and make sure the weights are properly reflected in replica selection.
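
To make the selection steps above concrete, here is a minimal sketch in Python (not the rdbms code itself). It assumes rendezvous hashing as one possible way to turn the identifier into a stable two-replica shard; the proposal does not prescribe a hashing scheme. The function names (`pick_candidates`, `pick_replica`, `designate_vslow`), the `db100x` hostnames, and the weight values are all hypothetical, and the weights are assumed to be whatever LoadMonitor currently reports.

```python
import hashlib
import random


def _score(identifier: str, replica: str) -> int:
    """Stable pseudo-random score for an (identifier, replica) pair."""
    digest = hashlib.sha256(f"{identifier}|{replica}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_candidates(identifier: str, replicas: list[str], k: int = 2) -> list[str]:
    """Pick k replicas for an identifier, independent of input order.

    Assumption: rendezvous (highest-random-weight) hashing is used here;
    any stable mapping from identifier to a small subset would do.
    """
    ranked = sorted(replicas, key=lambda name: _score(identifier, name), reverse=True)
    return ranked[:k]


def pick_replica(identifier: str, weights: dict[str, float], rng=random) -> str:
    """Pick two candidates by identifier, then one of them by weight."""
    # Replicas with zero weight (e.g. depooled or lagged) are never candidates.
    eligible = [name for name, weight in weights.items() if weight > 0]
    if not eligible:
        raise ValueError("no replica with a positive weight")
    candidates = pick_candidates(identifier, eligible, k=2)
    return rng.choices(candidates, weights=[weights[c] for c in candidates], k=1)[0]


def designate_vslow(section: str, replicas: list[str]) -> str:
    """Consistently designate one replica of a section as vslow, without config."""
    return pick_candidates(f"vslow:{section}", replicas, k=1)[0]


if __name__ == "__main__":
    # Hypothetical pool and weights for illustration only.
    weights = {"db1001": 100, "db1002": 200, "db1003": 0, "db1004": 50}
    print(pick_candidates("session:abc123", list(weights)))
    print(pick_replica("ip:203.0.113.7", weights))
    print(designate_vslow("s3", list(weights)))
```

In this sketch a given identifier always maps to the same pair while the pool is unchanged, so a misbehaving client can only affect its own two replicas, and the vslow flag resolves to the same host on every app server without being listed in the config. Filtering by weight before hashing means a depooled replica shifts some identifiers to a different pair; whether that is acceptable is a design choice left open here.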