We have three dedicated rabbitmq nodes, and they're configured as a shared cluster with all the recommended settings for HA.
Nevertheless, HA only barely works. Often when a single rabbitmq node goes down a bunch of nova services get stuck forever pining for the missing node rather than failing over. If we restart all three nodes gradually in series, a bunch of messages get lost even though the whole idea of a rabbitmq cluster is that the messages are mirrored persistent.
This needs a deep dive with test cases and research to figure out what's going on and how to stabilize things.