Currently this project is puppetized on wikitech via https://wikitech.wikimedia.org/wiki/Hiera:Project-proxy -- I'm going to fix that /after/ this fail-over is done.
The API service mentioned below is a uwsgi service called 'invisible_unicorn'.
These steps will not result in downtime:
- Create new eqiad1 proxy nodes, proxy-01 and proxy-02
- Copy certs over by hand from novaproxy-01
- Add proxy-01 and proxy-02 to $all_proxies, let puppet update
- Ensure that redis is syncing properly between regions
- Update proxy DNS record for a test proxy, ensure that proxy-01 handles it correctly
- Update proxy DNS records to point to the eqiad1 proxy (proxy-01)
- Test some more
- Update hieradata/eqiad/profile/openstack/main/nova/network.yaml with the new active proxy IP
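One way to check the redis sync step above is to look at the output of `INFO replication` on the new node. The sketch below is only illustrative: it assumes redis on the new proxy is configured as a replica of the currently active one (this plan does not say how replication is set up), and the sample text and its 192.0.2.10 address are made up for the example.

```python
def parse_replication_info(text):
    """Parse the key:value lines of redis `INFO replication` output into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Replication".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key] = value
    return info


def replica_in_sync(text):
    """Return True if the output describes a replica with a healthy master link."""
    info = parse_replication_info(text)
    return (
        info.get("role") == "slave"
        and info.get("master_link_status") == "up"
        and info.get("master_sync_in_progress") == "0"
    )


# Hypothetical sample output; the master address is a documentation-range IP.
SAMPLE = """# Replication
role:slave
master_host:192.0.2.10
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
"""

if __name__ == "__main__":
    print("in sync" if replica_in_sync(SAMPLE) else "NOT in sync")
```

In practice the text would come from running `redis-cli info replication` on the new node rather than from a hard-coded sample.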
These steps will cause partial downtime for creating/deleting proxies:
- Set $active_proxy to point to proxy-01, let puppet update
- Stop puppet and the API on novaproxy-01
- Stop the API on proxy-01, restore the database (it's on NFS, available to all nodes), restart the API there
- Update proxy endpoints in keystone to point to the new proxy
- Test!
Cleanup:
- Move project-wide puppet off of wikitech and into horizon
- Wait 24 hours for DNS caches to update
- Shut down novaproxy-01 and novaproxy-02
- Wait another few days before deleting novaproxy-01 and novaproxy-02
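Before shutting down the old nodes, it may help to confirm that proxy hostnames resolve only to the new proxy. The following is a rough sketch, not part of the existing tooling: the function names, the hostname, and the IP are placeholders. It only reflects the resolver of the machine it runs on, so it cannot prove that every downstream DNS cache has expired.

```python
import socket


def resolved_ips(hostname, resolver=socket.getaddrinfo):
    """Return the set of IP addresses that `hostname` currently resolves to."""
    results = resolver(hostname, None)
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
    # the address string is the first element of sockaddr.
    return {entry[4][0] for entry in results}


def points_only_at(hostname, expected_ip, resolver=socket.getaddrinfo):
    """Return True if every address returned for `hostname` is `expected_ip`."""
    return resolved_ips(hostname, resolver) == {expected_ip}


if __name__ == "__main__":
    # Placeholder values: substitute a real proxied hostname and the IP of proxy-01.
    host, new_ip = "some-proxy.example.org", "192.0.2.20"
    try:
        ok = points_only_at(host, new_ip)
    except socket.gaierror as exc:
        print(f"{host}: lookup failed ({exc})")
    else:
        print(f"{host}: {'OK' if ok else 'still resolving elsewhere'}")
```

The `resolver` parameter exists only so the check can be exercised without network access; by default it uses the system resolver.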