
Write a simple script that handles failovering proxies (or move behind HA proxy!)
Closed, Resolved, Public

Description

Right now, the way to failover tools / nova proxy is:

  1. Get on Horizon
  2. De-allocate IP from active
  3. Re-allocate to non-active, hope it gets the same

This is untenable. Instead we should have a script (on labcontrol1001, unfortunately) that when run does the following:

  1. Prints the active proxy's name, asks for confirmation of failover
  2. When confirmed, uses the nova api to switch the floating IP from active to passive
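The script described above could look roughly like the following bash sketch. It assumes admin credentials are already loaded in the environment on labcontrol1001, that the two candidate proxy instances are passed in explicitly by name or ID, and that the active one can be identified by grepping the output of openstack server show -f value -c addresses for the floating IP. The function names instance_has_ip and failover_proxy are hypothetical, not an existing tool.

```shell
#!/usr/bin/env bash
# Sketch of a proxy failover helper; all function names are hypothetical.
# Assumes admin OpenStack credentials are already loaded in the environment.

# instance_has_ip INSTANCE IP
# Succeeds if IP appears in the instance's address list.
instance_has_ip() {
    openstack server show "$1" -f value -c addresses | grep -qwF -- "$2"
}

# failover_proxy TENANT FLOATING_IP PROXY_A PROXY_B
# Finds which proxy holds FLOATING_IP, prints it, asks for confirmation,
# then associates the IP with the other (passive) proxy.
failover_proxy() {
    if [ $# -ne 4 ]; then
        echo "usage: failover_proxy TENANT FLOATING_IP PROXY_A PROXY_B" >&2
        return 2
    fi
    local tenant=$1 ip=$2 a=$3 b=$4 active passive answer
    # Scoped to this function; the add fails without the tenant name.
    local -x OS_TENANT_NAME=$tenant

    if instance_has_ip "$a" "$ip"; then
        active=$a passive=$b
    elif instance_has_ip "$b" "$ip"; then
        active=$b passive=$a
    else
        echo "error: $ip is not attached to $a or $b" >&2
        return 1
    fi

    echo "Active proxy: $active"
    read -r -p "Move $ip from $active to $passive? [y/N] " answer
    case $answer in
        [yY] | [yY][eE][sS]) ;;
        *) echo "Aborted." >&2; return 1 ;;
    esac

    # No prior 'remove': nova was reported in this task to re-associate directly.
    openstack ip floating add "$ip" "$passive"
}
```

Example: failover_proxy tools 208.80.155.166 <proxy-a> <proxy-b>. Passing both candidates explicitly keeps the sketch from having to guess which instances are proxies.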

Event Timeline

Restricted Application added a subscriber: Aklapper.

When switching floating IPs with openstack ip floating add, remember to prefix the command with OS_TENANT_NAME=<tenant-name>; it fails without it.

Adding without OS_TENANT_NAME fails:

root@labcontrol1001:~# openstack ip floating add 208.80.155.166 d160b598-df1e-4ffa-b9e4-7e38d0a27439
Unable to associate floating ip 208.80.155.166 to fixed ip 10.68.17.208 for instance d160b598-df1e-4ffa-b9e4-7e38d0a27439. Error: Floating IP 208.80.155.166 association has failed.
Traceback (most recent call last):

  File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 447, in _object_dispatch
    return getattr(target, method)(*args, **kwargs)

  File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 171, in wrapper
    result = fn(cls, context, *args, **kwargs)

  File "/usr/lib/python2.7/dist-packages/nova/objects/floating_ip.py", line 115, in associate
    host)

  File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 379, in floating_ip_fixed_ip_associate
    host)

  File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 216, in wrapper
    return f(*args, **kwargs)

  File "/usr/lib/python2.7/dist-packages/oslo_db/api.py", line 146, in wrapper
    ectxt.value = e.inner_exc

  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 195, in __exit__
    six.reraise(self.type_, self.value, self.tb)

  File "/usr/lib/python2.7/dist-packages/oslo_db/api.py", line 136, in wrapper
    return f(*args, **kwargs)

  File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 884, in floating_ip_fixed_ip_associate
    raise exception.FloatingIpAssociateFailed(address=floating_address)

FloatingIpAssociateFailed: Floating IP 208.80.155.166 association has failed.

Traceback (most recent call last):

  File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
    executor_callback))

  File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
    executor_callback)

  File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 129, in _do_dispatch
    result = func(ctxt, **new_args)

  File "/usr/lib/python2.7/dist-packages/nova/network/floating_ips.py", line 406, in _associate_floating_ip
    do_associate()

  File "/usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 254, in inner
    return f(*args, **kwargs)

  File "/usr/lib/python2.7/dist-packages/nova/network/floating_ips.py", line 375, in do_associate
    fixed_address, self.host)

  File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 169, in wrapper
    args, kwargs)

  File "/usr/lib/python2.7/dist-packages/nova/conductor/rpcapi.py", line 229, in object_class_action
    args, kwargs)

  File "/usr/lib/python2.7/dist-packages/nova/conductor/rpcapi.py", line 237, in object_class_action_versions
    args=args, kwargs=kwargs)

  File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
    retry=self.retry)

  File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
    timeout=timeout, retry=retry)

  File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 462, in send
    retry=retry)

  File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 453, in _send
    raise result

FloatingIpAssociateFailed_Remote: Floating IP 208.80.155.166 association has failed.
Traceback (most recent call last):

  File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 447, in _object_dispatch
    return getattr(target, method)(*args, **kwargs)

  File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 171, in wrapper
    result = fn(cls, context, *args, **kwargs)

  File "/usr/lib/python2.7/dist-packages/nova/objects/floating_ip.py", line 115, in associate
    host)

  File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 379, in floating_ip_fixed_ip_associate
    host)

  File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 216, in wrapper
    return f(*args, **kwargs)

  File "/usr/lib/python2.7/dist-packages/oslo_db/api.py", line 146, in wrapper
    ectxt.value = e.inner_exc

  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 195, in __exit__
    six.reraise(self.type_, self.value, self.tb)

  File "/usr/lib/python2.7/dist-packages/oslo_db/api.py", line 136, in wrapper
    return f(*args, **kwargs)

  File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 884, in floating_ip_fixed_ip_associate
    raise exception.FloatingIpAssociateFailed(address=floating_address)

FloatingIpAssociateFailed: Floating IP 208.80.155.166 association has failed.
 (HTTP 400) (Request-ID: req-f37f0590-af4d-407c-9cb8-bfb8a77a318f)

Adding with OS_TENANT_NAME succeeds:

root@labcontrol1001:~# OS_TENANT_NAME=tools openstack ip floating add 208.80.155.166 d160b598-df1e-4ffa-b9e4-7e38d0a27439

Removing without OS_TENANT_NAME succeeds (??!!):

root@labcontrol1001:~# openstack ip floating remove 208.80.155.166 d160b598-df1e-4ffa-b9e4-7e38d0a27439
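To make the tenant prefix hard to forget, the call can go through a tiny wrapper. This is a sketch with a hypothetical helper name (os_tenant); it sets OS_TENANT_NAME only for that one openstack invocation and refuses to run without a tenant.

```shell
#!/usr/bin/env bash
# os_tenant TENANT OPENSTACK-ARGS...   (hypothetical helper)
# Runs "openstack OPENSTACK-ARGS..." with OS_TENANT_NAME set only for that
# one call, the same as typing "OS_TENANT_NAME=tools openstack ...".
os_tenant() {
    if [ $# -lt 2 ] || [ -z "$1" ]; then
        echo "usage: os_tenant TENANT OPENSTACK-ARGS..." >&2
        return 2
    fi
    local tenant=$1
    shift
    OS_TENANT_NAME=$tenant openstack "$@"
}
```

For example: os_tenant tools ip floating add 208.80.155.166 d160b598-df1e-4ffa-b9e4-7e38d0a27439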

> Right now, the way to failover tools / nova proxy is:
>   1. Get on Horizon
>   2. De-allocate IP from active
>   3. Re-allocate to non-active, hope it gets the same

Project -> Compute -> Access & Security -> Floating IPs should let you disassociate/re-associate a specific IP

scfc triaged this task as Low priority. Feb 16 2017, 8:57 PM
scfc moved this task from Backlog to Ready to be worked on on the Toolforge board.

This only barely warrants a script, since I just now did it with a single command:

OS_TENANT_NAME=project-proxy openstack ip floating add 208.80.155.156 9930528e-c824-48e0-932e-862ffe582412

Apparently you don't need to do a 'remove' beforehand, nova just copes.

I would see the benefit of a script if this weren't a two-click process in Horizon, but running a script is still a manual process (and a script also needs unit/integration tests, packaging, CI, etc.). As the saying goes, the best code is no code at all.

A better solution would be an elastic load-balancing layer that adjusts automatically, but we're still some way from having that (future work on Kubernetes and ingress controllers, or even mesh networks, might get us there).

Since it seems we're fine with the manual process, I'll close this task. As always, feel free to reopen it if I misunderstood the situation.

> This only barely warrants a script, since I just now did it with a single command:
>
> OS_TENANT_NAME=project-proxy openstack ip floating add 208.80.155.156 9930528e-c824-48e0-932e-862ffe582412
>
> Apparently you don't need to do a 'remove' beforehand, nova just copes.

We should make sure this is documented in the runbooks for admin tasks on Wikitech.

In the past, I've found that not removing beforehand can break things; it didn't always work reliably. I also wonder whether Neutron handles this better. I think we should adjust that edit slightly to include the remove, just in case. I believe I discovered this during the static server failovers.

I think the main concern with remove + add is getting the same floating IP back from the pool. It would be great to have more clarity on this, as the reports on this task alone are mixed. I'm going to reopen this for now as a result.
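A sketch of the remove + add variant that sidesteps the pool question. It assumes (not verified in this task) that openstack ip floating remove only disassociates the address from the instance and leaves it allocated to the project, so re-adding that same address never goes back to the pool; the de-allocate/re-allocate steps in the task description are what risk getting a different IP. move_floating_ip is a hypothetical name, and the final check greps openstack server show -f value -c addresses for the IP.

```shell
#!/usr/bin/env bash
# move_floating_ip TENANT IP FROM TO   (hypothetical helper)
# Disassociates IP from FROM, associates the same address with TO, then
# checks that TO now lists it.
# Assumption: "ip floating remove" only disassociates; the address stays
# allocated to the project, so re-adding it never goes back to the pool.
move_floating_ip() {
    if [ $# -ne 4 ]; then
        echo "usage: move_floating_ip TENANT IP FROM TO" >&2
        return 2
    fi
    local tenant=$1 ip=$2 from=$3 to=$4
    local -x OS_TENANT_NAME=$tenant

    # Remove first, in case a bare "add" misbehaves.
    if ! openstack ip floating remove "$ip" "$from"; then
        echo "error: could not remove $ip from $from" >&2
        return 1
    fi
    # If this fails, the IP is attached to neither instance.
    if ! openstack ip floating add "$ip" "$to"; then
        echo "error: could not add $ip to $to (now unattached)" >&2
        return 1
    fi
    if ! openstack server show "$to" -f value -c addresses | grep -qwF -- "$ip"; then
        echo "error: $ip is not listed on $to after the add" >&2
        return 1
    fi
    echo "$ip moved from $from to $to"
}
```

The trade-off compared with a bare add is a short window in which the IP is attached to neither proxy, so a failed add needs immediate attention.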

Andrew renamed this task from Write a simple script that handles failovering proxies to Write a simple script that handles failovering proxies (or move between HA proxy!). Mar 10 2020, 4:56 PM
Andrew renamed this task from Write a simple script that handles failovering proxies (or move between HA proxy!) to Write a simple script that handles failovering proxies (or move behind HA proxy!). Mar 10 2020, 4:59 PM

At this point, we have a PoC over in the PAWS project. If we can demonstrate good behavior in a quick failover test, we could start to deploy the functionality more widely. There's a small routing issue to handle first, raised in T257534.

aborrero claimed this task.
aborrero subscribed.

> At this point, we have a PoC over in the PAWS project. If we can demonstrate good behavior in a quick failover test, we could start to deploy the functionality more widely. There's a small routing issue to handle first, raised in T257534.

We deployed haproxy in front of the PAWS ingress. It works great (so far). We decided to close this task and open a new one when we are ready to start working on a more general solution for other front proxies.