Conclusions
The easiest solution from a development perspective would be to use Redis Cluster, which is available in Redis >= 3.0.0 and is recommended by the Redis authors as the best current practice. It transparently supports sharding, automatic failover, and node migration. Unfortunately, this has not yet landed in Ubuntu's stable distro, and I can't find any examples of successful production deployment, so it's out of the question for now.
Instead, we're going to do a manual sharding thing and simple master-slave replication. The topology will be like,
- Payments 1001-1003: Each run a Redis master.
- Payments 1004: Runs three Redis slaves, one for each master.
The following changes will need to be made to client code:
- The DonationInterface frontends will read and write to Redis on localhost, each box maintaining its own limbo. If the completed transaction hook cannot find a limbo entry for the donation being processed, check whether the data is available on any of the three slaves. Failure to pop from the FIFO queue during transaction completion is fine, the orphan slayer is already robust against that.
- The orphan slayer will pop from each master, going round-robin and taking one message from each queue.
- The slayer should probably only use slaves for actual read operations (getting limbo data), and not substitute a slave read for a master pop (pull and dequeue most recent record) like we do from the frontend, otherwise we'll have to come up with a mechanism to prevent repeatedly chasing our tail.