Set up HA endpoints for keystone, glance, nova, designate apis
Open, NormalPublic

Description

The openstack REST endpoints are stateless, so it should be simple to set up load-balanced pairs. As per T223902, we'll name these endpoints openstack.eqiad1.wikimediacloud.org. HAproxy should probably run on cloudcontrol1003/1004.

Event Timeline

Andrew created this task. May 20 2019, 2:15 PM
Andrew renamed this task from "Set up LVS for keystone, glance, nova apis" to "Set up HA for keystone, glance, nova, designate apis". Jun 17 2019, 12:23 PM
Andrew reassigned this task from Andrew to JHedden.
Andrew triaged this task as Normal priority.
Andrew updated the task description.
Andrew renamed this task from "Set up HA for keystone, glance, nova, designate apis" to "Set up HA endpoints for keystone, glance, nova, designate apis". Jun 17 2019, 12:26 PM
Andrew updated the task description.

@Andrew How many hosts will support the OpenStack APIs in your HA architecture design?

Andrew added a comment. Jul 1 2019, 7:03 PM

For the short term I've been assuming we'd just put a proxy in front of the existing endpoints. That means two each:

keystone, nova, glance, neutron: cloudcontrol1003, 1004
designate: cloudservices1001, 1002

I'm not sure if we can support HA for the dynamic proxy without some more software work there, but it's worth investigating.
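
For illustration, here is a minimal haproxy.cfg sketch of the "proxy in front of the existing endpoints" layout listed above, for one service only. The keystone public API port (5000), the alternate frontend port, and the health-check path are assumptions for the sketch, not the eventual Puppet profile:

```
# Hypothetical haproxy.cfg fragment (sketch only, not the real profile).
# The frontend uses an alternate port because, without a dedicated VIP,
# haproxy on cloudcontrol1003/1004 can't bind the same port the local
# keystone process already listens on (see the port-map discussion later
# in this thread).
frontend keystone_api
    bind *:25000
    mode http
    default_backend keystone_api_backends

backend keystone_api_backends
    mode http
    balance roundrobin
    option httpchk GET /          # keystone answers version discovery on /
    server cloudcontrol1003 cloudcontrol1003:5000 check   # FQDNs omitted
    server cloudcontrol1004 cloudcontrol1004:5000 check
```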

What's the plan for managing the address HAproxy will use?

Typically this would be a VIP address managed by corosync/pacemaker that can fail over between controllers. That doesn't play well with a 2-node cluster, though; without a quorum we'd likely run into split-brain issues.

Andrew added a comment. Jul 1 2019, 7:25 PM

I have never used HAproxy, so there is no plan as yet -- that's up to you :)

Getting a third cloudcontrol node is possible but will take a while for hardware requisition. We could probably just pick three arbitrary servers to use if necessary. I'm not sure I immediately understand the split-brain danger though... there's no dynamic state being managed here, is there? Just a single front-end IP and an almost-never-changing pair of backend IPs for each service?

Bstorm added a subscriber: Bstorm. Jul 1 2019, 7:30 PM

@JHedden why not just use round-robin DNS for the haproxy instances? That's how I've managed several HA webapps in the past, unless something is bound to a specific IP. I *thought* the OpenStack services are all OK with DNS even when the IP changes, because they are RESTful. If I'm misunderstanding and you mean the haproxy backends, they'd be active/active, right (haproxy caches IP addresses)? ...unless one of those services can't be (designate?).

@Bstorm Using round robin DNS alone does not provide HA. All it does is rotate the order of the host addresses returned to the client. If the address given to the client is not reachable, the client will not ask for another address or try the other addresses in the record. This also leads to DNS caching issues when one of the addresses in the record is offline. If we were to use round robin DNS entries we'd need to ensure that all the addresses defined are reachable at all times (using something like keepalived/corosync/pacemaker).

Typically OpenStack HA has haproxy running on each controller and a floating virtual IP address managed by corosync/pacemaker. That virtual IP can only be active on one controller at a time. Without a quorum (n+1) there's no tie-breaker to tell the cluster which host is healthy and which is not.

I'm referring to the IP address the clients will use to communicate with the OpenStack services.
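
For reference, a sketch of what that floating VIP could look like in pacemaker, in pcs syntax. The address is from the documentation range and the resource names, netmask and NIC are placeholders, since no real VIP has been chosen yet:

```
# Hypothetical pacemaker resources (pcs CLI). IP, netmask and NIC are
# placeholders -- no real address has been allocated for this yet.
pcs resource create openstack-api-vip ocf:heartbeat:IPaddr2 \
    ip=198.51.100.10 cidr_netmask=24 nic=eth0 op monitor interval=10s
pcs resource create haproxy systemd:haproxy op monitor interval=10s
# Keep haproxy on whichever node currently holds the VIP, and start the
# VIP before haproxy so the bind succeeds.
pcs constraint colocation add haproxy with openstack-api-vip INFINITY
pcs constraint order openstack-api-vip then haproxy
```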

Bstorm added a comment (edited). Jul 1 2019, 7:55 PM

This is strictly true, yes, but that's how AWS provides load balancer HA and how lots of production web services work. Right now, we are just down until we manually move the service. If the DNS TTL is set short, it would only be down for that long (we also handle most manual failovers entirely via DNS already; again, this would be superior to that method). It's not 100% foolproof, but it is a gigantic leap forward from where we are. Haproxy servers would just need to be manually "depooled" from DNS before maintenance.

I will also add that corosync is often used in two-node clusters for NFS. It's not great. It can cause split brain. Honestly, I think manual pool management is better when it works well, but there's that.
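
For illustration, the round-robin/short-TTL idea as a zone-file sketch. The IPs are placeholders from the documentation range, and the record name follows the openstack.eqiad1.wikimediacloud.org naming from the task description:

```
; Hypothetical round-robin records with a short TTL (sketch only);
; the real records would live under wikimediacloud.org.
openstack.eqiad1    300  IN  A  198.51.100.11   ; haproxy on cloudcontrol1003
openstack.eqiad1    300  IN  A  198.51.100.12   ; haproxy on cloudcontrol1004
; A 300s TTL bounds the manual "depool" window, but clients that already
; cached a dead address stay broken until the TTL expires, as noted in
; the replies below.
```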

Bstorm added a comment. Jul 1 2019, 8:02 PM

Note: one difference is that a lot of the DNS services I used when I did this (including AWS) can do health checking down the stack internally. I doubt we get that with ours (without looking, I'm just assuming our templates don't allow it, and potentially gdns as well).

> This is strictly true, yes, but that's how AWS provides load balancer HA and how lots of production web services work. Right now, we are just down until we manually move the service. If the DNS TTL is set short, it would only be down for that long (we also handle most manual failovers entirely via DNS already; again, this would be superior to that method). It's not 100% foolproof, but it is a gigantic leap forward from where we are. Haproxy servers would just need to be manually "depooled" from DNS before maintenance.

The AWS load balancer (ELB) layer is what HAproxy will be providing here. DNS round robin, while better than no HA at all, is not a good step forward in my opinion. DNS caching issues are hard to debug and can cause services to time out or error.

> I will also add that corosync is often used in two-node clusters for NFS. It's not great. It can cause split brain. Honestly, I think manual pool management is better when it works well, but there's that.

This is exactly why I'm recommending a 3-node cluster. Corosync/pacemaker works very well with 3 or more nodes.

Bstorm added a comment. Jul 1 2019, 8:04 PM

> The AWS load balancer (ELB) layer is what HAproxy will be providing here. DNS round robin, while better than no HA at all, is not a good step forward in my opinion. DNS caching issues are hard to debug and can cause services to time out or error.

I'm referring to the fact that you get a different IP address on every call to an AWS load balancer because it has several servers behind its name (which function just like haproxy). They basically provide a round-robin DNS on a group of haproxy servers...but they do health check them internally. When that process fails, you see it with a "mysterious" 500 on the ELB. It used to happen more often than it does now.

Bstorm added a comment. Jul 1 2019, 8:11 PM

We also literally just move DNS for lots of services because we only have pairs, not trios of hardware (such as dumps.wikimedia.org). I don't think there's a lack of desire there, just a lack of hardware.

When I did RR DNS for this with haproxy, we were using CloudFlare DNS, which *might* healthcheck and drop servers. I'm not sure it does though. It works quite well unless a server actually fails or you "depool" not so gracefully. Right now, we fail over via DNS (or puppet, which is even sloppier) already, and it's entirely manual and sloppy.

I'm saying this doesn't block the benefits of being able to manage the API servers in a load balanced way. It's just not great. If we only had one single haproxy, it would be a huge improvement over what it is now.

I will say that, if we have to use cloudcontrol1003/4 for haproxy, that does recouple things, doesn't it? We cannot reboot 1003 and continue to use it as a load balancer. I'm almost more concerned about that. The thing is that haproxy has to run on something.

One thing we lack is trios outside of the ceph PoC (that still isn't racked and is not a good fit for this). Most things have been designed so far with manual failover of active/passive servers in mind.

Andrew added a comment. Jul 1 2019, 8:13 PM

fwiw I'm totally down with deciding we need a third cloudcontrol. As I understand it, it's only the front-end proxy that needs three nodes, not each API backend, right?

Bstorm added a comment (edited). Jul 1 2019, 8:14 PM

For a stable frontend, that's the idea. Yeah. Do we have a way to do that? That'd be cool.

Andrew added a comment. Jul 1 2019, 8:18 PM

We can try to buy more hardware, or we can just declare that our HA cluster is [cloudcontrol1003, cloudcontrol1004, cloudservices1003]. The Puppet would be a bit ugly, but they're all on public IPs.

(I'm assuming four is worse than three in this case, if not we can add cloudservices1004 as well).

Bstorm added a comment. Jul 1 2019, 8:20 PM

If we had three haproxies, like @JHedden is saying, it would be vastly superior because, like he said, you could do the recommended configs for genuine HA. They usually recommend some kind of keepalived/VRRP setup, but there are plenty of ways to sort it out if you can make sure there's some concept of quorum. I'm just saying even the easiest junk like RR DNS is better than now, and it would be something I'd be happy about. If we *can* do better, I'd love to.

Bstorm added a comment. Jul 1 2019, 8:24 PM

I'd call 4 worse than 3, yes. Five is right out (because it's not necessary at our size).
I kind of hate putting haproxy on servers that do other things because of the entanglements that generates in maintenance, but with this setup, it could probably handle it ok. It's not like we generate the kind of traffic that would break them, and with a genuine HA service, it should be able to work it out.

bd808 added a subscriber: bd808. Jul 1 2019, 8:35 PM

I'm not sure I understand the split-brain question at all, since we are really trying to balance active-active endpoints. Split-brain is generally only a concern for stateful services, such as a database, where uncoordinated state changes (i.e. writes) are harmful. For the OpenStack APIs all the state is in an external mysql instance (which is itself a SPOF today).

Haproxy+Corosync+pacemaker is a nice setup, but wouldn't haproxy+heartbeat be almost as good in our environment? It is at least a simpler stack to reason about.

Bstorm added a comment. Jul 1 2019, 8:39 PM

I was just thinking that. The supported solution for HAproxy is usually keepalived, with the haproxy servers as active/passive using VRRP. The split-brain scenario there is a gratuitous ARP that gets rejected. There have been bugs where higher-priority ARPs cause NOBODY to have the cluster IP, but for the most part, it's not common.

I do think that I've probably cheated wrt RR DNS in the past by using fancy DNS services that did healthchecking, though.
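
A minimal keepalived.conf sketch of that active/passive VRRP setup, for the active node. The VIP, interface, router ID and priorities are all placeholders, and the check script is just one common way to tie the VIP to haproxy's health:

```
# Hypothetical keepalived.conf (sketch only).
vrrp_script chk_haproxy {
    script "/usr/bin/killall -0 haproxy"   # succeeds only if haproxy is running
    interval 2
}

vrrp_instance openstack_api {
    state MASTER               # BACKUP on the second node
    interface eth0             # placeholder NIC
    virtual_router_id 51
    priority 150               # lower (e.g. 100) on the backup
    advert_int 1
    virtual_ipaddress {
        198.51.100.10/24       # placeholder VIP
    }
    track_script {
        chk_haproxy            # give up the VIP if haproxy dies
    }
}
```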

There's technically nothing stopping us from using 4 nodes in corosync. I'd vote for using all 4 and keeping the configuration in sync, avoiding any one-off special configuration.

Another benefit of using a VIP (load-balanced virtual IP) is that haproxy can bind to that address and expose the services on their native ports. Without a VIP we'd need to manage a custom port map for either the OpenStack backends or the haproxy frontends.

@bd808 the split-brain risk in this design is that both hosts could bring the VIP interface online at the same time. A third node provides quorum, acting as a tie-breaker when both hosts want to be promoted to active.
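
As a sketch of the quorum side, a hypothetical corosync.conf fragment for the four hosts floated in this thread (names only; addresses, rings and the rest of the file are omitted). votequorum's auto_tie_breaker option is what makes an even node count workable, per the later comment that corosync voting is configurable:

```
# Hypothetical corosync.conf fragment (sketch only).
quorum {
    provider: corosync_votequorum
    auto_tie_breaker: 1        # lets a 4-node cluster resolve a 2-2 split
}

nodelist {
    node {
        ring0_addr: cloudcontrol1003
        nodeid: 1
    }
    node {
        ring0_addr: cloudcontrol1004
        nodeid: 2
    }
    node {
        ring0_addr: cloudservices1003
        nodeid: 3
    }
    node {
        ring0_addr: cloudservices1004
        nodeid: 4
    }
}
```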

Andrew added a comment. Jul 1 2019, 8:42 PM

If we want single-purpose proxies we can create ganeti VMs for that. I'm still trying to determine if we can have proper three-server redundancy that way...

Bstorm added a comment. Jul 1 2019, 8:45 PM

4 nodes are vulnerable to split brain in a less likely scenario than a 2-node setup. Odd numbers allow a stricter quorum and are thus better.

Keepalived actually has built-in handling around ARP for two-node setups.

Bstorm added a comment. Jul 1 2019, 8:49 PM

It doesn't always work perfectly, but neither does corosync/pacemaker. :)

Duh, corosync is highly configurable for voting. 4 nodes is just dandy! Disregard my nonsense about 4 nodes not being good. I was thinking of scenarios where more automated algorithms are used.

Paladox added a subscriber: Paladox. Aug 7 2019, 3:44 PM

Change 529436 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] openstack: initial haproxy profile

https://gerrit.wikimedia.org/r/529436

Change 529436 merged by Jhedden:
[operations/puppet@production] openstack: initial haproxy profile

https://gerrit.wikimedia.org/r/529436

Change 530164 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] openstack: add glance image sync to codfw

https://gerrit.wikimedia.org/r/530164

Change 530164 merged by Jhedden:
[operations/puppet@production] openstack: add glance image sync to codfw

https://gerrit.wikimedia.org/r/530164

Change 530580 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] openstack: change codfw nova api and metadata port

https://gerrit.wikimedia.org/r/530580

Change 530580 merged by Jhedden:
[operations/puppet@production] openstack: Add codfw1dev nova API and metadata to haproxy

https://gerrit.wikimedia.org/r/530580

Change 533552 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] openstack: Add codfw1dev neutron server to haproxy

https://gerrit.wikimedia.org/r/533552

Change 533552 merged by Jhedden:
[operations/puppet@production] openstack: Add codfw1dev neutron server to haproxy

https://gerrit.wikimedia.org/r/533552

Change 534680 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] openstack: Add codfw1dev glance API to haproxy

https://gerrit.wikimedia.org/r/534680

Change 534680 merged by Jhedden:
[operations/puppet@production] openstack: Add codfw1dev glance API to haproxy

https://gerrit.wikimedia.org/r/534680

Change 534832 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] openstack: add haproxy health check path support

https://gerrit.wikimedia.org/r/534832

Change 534832 merged by Jhedden:
[operations/puppet@production] openstack: add haproxy health check path support

https://gerrit.wikimedia.org/r/534832

Change 534839 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] openstack: add haproxy health check path support

https://gerrit.wikimedia.org/r/534839

Change 534839 merged by Jhedden:
[operations/puppet@production] openstack: add haproxy health check path support

https://gerrit.wikimedia.org/r/534839

Change 536664 had a related patch set uploaded (by Jhedden; owner: Jhedden):
[operations/puppet@production] openstack: configure apache wsgi for keystone api

https://gerrit.wikimedia.org/r/536664