payments-listener server high availability
The payments-listener service is hosted on a single machine in eqiad, with a standby machine in codfw. Switching to the backup in the event of a failure is a two-step process: the codfw machine needs to be flipped out of 'maintenance mode' in puppet, and the service's DNS record needs to be changed. The payments-listener service can tolerate outages: it's an API for callbacks from payment providers, and the providers will keep retrying if it's down.

We host a second site on the same webserver. The site is very simple: all it does is serve redirects from nginx. It is user-facing, though, because it hosts the redirects for URLs sent to potential donors to sign up for donor events. So we should raise our reliability standards for this webserver.

The way we've handled this in other cases (payments) is to add one or more webservers and put a pair of pybal/LVS servers in front of them in an LVS-DR configuration. That's a large investment considering how little these services do. We would also need to look at how the payments-listener behaves if LVS shifts traffic across webservers mid-session.

Historically we would use pybal/LVS-DR for this, but I think it would be simpler and more efficient to use something like BIRD to make a pair of webservers advertise themselves over BGP as routes to a VIP bound to loopback. The end effect would be similar to what we do with LVS-DR, but without the separate load balancers. If we can make this work, I would also like to use it to deprecate the pay-lvs servers.
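
One thing this approach would probably need is a way for each webserver to stop advertising the VIP when its local service is unhealthy, since there would be no load balancer to depool it. Below is a minimal Python sketch of that decision logic only, under stated assumptions: the class name VipAdvertiser and the rise/fall thresholds are hypothetical and not taken from any existing tooling, and the step that acts on the result (for example adding or removing the VIP on loopback so the BGP daemon announces or withdraws the route) is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class VipAdvertiser:
    """Hypothetical helper: decides whether this host should announce the VIP.

    Requires `rise` consecutive healthy checks before announcing and
    `fall` consecutive failed checks before withdrawing, so that a single
    flapping check does not cause route churn.
    """

    rise: int = 2            # assumed threshold, tune to taste
    fall: int = 3            # assumed threshold, tune to taste
    announced: bool = False  # start withdrawn until proven healthy
    # Positive values count consecutive successes, negative values failures.
    streak: int = field(default=0, repr=False)

    def record(self, healthy: bool) -> bool:
        """Record one health-check result and return the desired state."""
        if healthy:
            self.streak = self.streak + 1 if self.streak > 0 else 1
            if not self.announced and self.streak >= self.rise:
                self.announced = True
        else:
            self.streak = self.streak - 1 if self.streak < 0 else -1
            if self.announced and -self.streak >= self.fall:
                self.announced = False
        return self.announced


if __name__ == "__main__":
    adv = VipAdvertiser()
    # Simulated check results: two successes, then three failures.
    for result in (True, True, False, False, False):
        state = "announce" if adv.record(result) else "withdraw"
        print(f"check={'ok' if result else 'fail'} -> {state}")
```

Starting in the withdrawn state means a freshly booted or still-in-maintenance host would not attract traffic until its checks pass, which is roughly the behaviour we get from depooling in LVS today.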

Arzhel has already started implementing this strategy for things like DNS servers, and has a bunch of work awaiting code review.

(12:45:13 PM) XioNoX: We're now using it in prod, with doc on

